MySQL Partitioning – A Quick Look at Partitioning – Separate Your Data for Faster Searches

In MySQL, partitioning is a way to separate the data in one table into smaller “sub-tables” for better query performance and data management.

For example, let’s say that you have a database containing numerous accounting transactions. You could just store all of these transactions in one table, but you only need to keep seven years’ worth of data for tax purposes. Instead of placing all of the data in one table and then deleting the old data from that table, you could split the table into partitions, with each partition representing one year’s worth of data.

Then, after seven years, you could drop the oldest partition. Partitions are flexible: you can add, drop, redefine, merge, or split existing partitions (there are other options for what you could do with this data as well). Also, if you have a table that is going to contain a lot of rows, partitioning your data can make your searches much faster whenever a search can be limited to a single partition. As of MySQL 5.6, you can split a table into as many as 8192 partitions.
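
As a rough sketch of that last step, assume a hypothetical table named transactions that is partitioned by RANGE on the year, with a partition named p2008 holding the oldest year (both names are made up for illustration). Removing that year’s rows is then a single statement:

ALTER TABLE transactions DROP PARTITION p2008;

Dropping a partition discards all of the rows stored in it, so it is usually much quicker than a DELETE over the same rows, but it cannot be undone.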

Here is the MySQL website’s explanation about partitions:

The SQL standard does not provide much in the way of guidance regarding the physical aspects of data storage. The SQL language itself is intended to work independently of any data structures or media underlying the schemas, tables, rows, or columns with which it works. Nonetheless, most advanced database management systems have evolved some means of determining the physical location to be used for storing specific pieces of data in terms of the file system, hardware or even both. In MySQL, the InnoDB storage engine has long supported the notion of a tablespace, and the MySQL Server, even prior to the introduction of partitioning, could be configured to employ different physical directories for storing different databases (see Section 8.11.3.1, “Using Symbolic Links“, for an explanation of how this is done).

Partitioning takes this notion a step further, by enabling you to distribute portions of individual tables across a file system according to rules which you can set largely as needed. In effect, different portions of a table are stored as separate tables in different locations. The user-selected rule by which the division of data is accomplished is known as a partitioning function, which in MySQL can be the modulus, simple matching against a set of ranges or value lists, an internal hashing function, or a linear hashing function. The function is selected according to the partitioning type specified by the user, and takes as its parameter the value of a user-supplied expression. This expression can be a column value, a function acting on one or more column values, or a set of one or more column values, depending on the type of partitioning that is used.

(From: https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html)


There are four types of partition options for your data:

RANGE – This type of partitioning assigns rows to partitions based on column values falling within a given range.

LIST – Similar to partitioning by RANGE, except that the partition is selected based on columns matching one of a set of discrete values.

HASH – With this type of partitioning, a partition is selected based on the value returned by a user-defined expression that operates on column values in rows to be inserted into the table. The function may consist of any expression valid in MySQL that yields a nonnegative integer value. An extension to this type, LINEAR HASH, is also available.

KEY – This type of partitioning is similar to partitioning by HASH, except that only one or more columns to be evaluated are supplied, and the MySQL server provides its own hashing function. These columns can contain other than integer values, since the hashing function supplied by MySQL guarantees an integer result regardless of the column data type. An extension to this type, LINEAR KEY, is also available.

(From: https://dev.mysql.com/doc/refman/5.6/en/partitioning-types.html)
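
To give a feel for the syntax of the last two types, here is a minimal sketch. The table names events_hash and events_key and their columns are hypothetical, chosen only for illustration. In both cases you only say how many partitions you want, and MySQL decides which partition each row goes to:

-- HASH: you supply an expression that returns a nonnegative integer
CREATE TABLE events_hash (
  id INT NOT NULL,
  created DATE NOT NULL,
  PRIMARY KEY (id, created)
)
PARTITION BY HASH (YEAR(created))
PARTITIONS 4;

-- KEY: you supply only the column(s); MySQL applies its own hashing function
CREATE TABLE events_key (
  id INT NOT NULL,
  lastname VARCHAR(25) NOT NULL,
  PRIMARY KEY (id, lastname)
)
PARTITION BY KEY (lastname)
PARTITIONS 4;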


This post will just give you one example of how to partition your data, and then how to verify that your query is searching only the correct partition. It doesn’t do you any good if you partition your data but then write queries that perform a table scan to get your results. In this example, I am going to be separating the table data by the year.

We are going to create a simple membership table and partition it by RANGE. We will separate the partitions by the year that the person joined, and we will add one member to each year. Our members table will be very simple, with an ID, the date the person joined, and their first and last name. We will create the partitions by using just the YEAR that they joined, while we keep the full date they joined in the joined column. We are also making the columns id and joined a composite primary key, because MySQL requires every column used in the partitioning expression to be part of every unique key on the table, including the primary key. Here is the CREATE TABLE statement:

CREATE TABLE `members` (
  `id` int(5) NOT NULL AUTO_INCREMENT,
  `joined` date NOT NULL,
  `lastname` varchar(25) NOT NULL,
  `firstname` varchar(25) NOT NULL,
  PRIMARY KEY (`id`,`joined`)
) ENGINE=InnoDB AUTO_INCREMENT=10000 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE ( YEAR(joined))
(PARTITION p0 VALUES LESS THAN (2011) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (2012) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (2013) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (2014) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;

Each partition will contain rows whose joined year is less than the value shown in its VALUES LESS THAN clause, and not less than the value for the partition before it. In other words, partition p0 will contain dates earlier than 01/01/2011 (i.e. dates in 2010 or earlier). Partition p1 will contain dates earlier than 01/01/2012 but later than 12/31/2010 (i.e. dates in 2011). Partition p2 will contain dates for 2012, partition p3 will contain dates for 2013, and p4 will contain dates for 2014 and greater. Because p4 is defined with MAXVALUE, rows for 2015 and later will also land in p4, so before the year 2015 arrives you will want to split p4 so that 2015 gets its own partition. Of course, you could go ahead and define partitions for the next several years.
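
Since the last partition is defined with MAXVALUE, a new partition cannot simply be appended after it. One way to make room for 2015 is to reorganize p4 into two partitions. This is a sketch; the name p5 is my own choice and does not appear in the table above:

ALTER TABLE members REORGANIZE PARTITION p4 INTO (
  PARTITION p4 VALUES LESS THAN (2015),
  PARTITION p5 VALUES LESS THAN MAXVALUE
);

Any rows already in p4 are redistributed between the two new partitions according to their joined year.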

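If you want a partition to contain only the dates in 2011 (instead of those dates LESS THAN 2011), note that you cannot simply change VALUES LESS THAN (2011) to VALUES IN (2011) in this table. VALUES IN belongs to LIST partitioning, so the whole table would need to be defined with PARTITION BY LIST. In that case, if 2011 is the lowest year you list, it will be the earliest year that the table is able to contain. Any row with a year that does not appear in one of the lists would be rejected with an error and would not be inserted into the table.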
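
Because VALUES IN is part of LIST partitioning, a table with exactly one year per partition might look like the following sketch. The table name members_by_list and the partition names are hypothetical:

CREATE TABLE members_by_list (
  id INT NOT NULL AUTO_INCREMENT,
  joined DATE NOT NULL,
  lastname VARCHAR(25) NOT NULL,
  firstname VARCHAR(25) NOT NULL,
  PRIMARY KEY (id, joined)
)
PARTITION BY LIST (YEAR(joined)) (
  PARTITION p2011 VALUES IN (2011),
  PARTITION p2012 VALUES IN (2012),
  PARTITION p2013 VALUES IN (2013)
);

With this definition, an insert with a joined date in 2010 or 2014 would fail, because no partition lists those years.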

Now, let’s insert some data. We will insert one row into each partition, and then do a:

select id, joined, lastname, firstname from members;

to see what our data looks like:

mysql> insert into members (firstname, lastname, joined) values ("Mary", "Davis", "2010-01-14");
Query OK, 1 row affected (0.64 sec)

mysql> insert into members (firstname, lastname, joined) values ("John", "Hill", "2011-02-12");
Query OK, 1 row affected (0.01 sec)

mysql> insert into members (firstname, lastname, joined) values ("Steve", "Johnson", "2012-03-18");
Query OK, 1 row affected (0.01 sec)

mysql> insert into members (firstname, lastname, joined) values ("Beth", "Daniels", "2013-04-22");
Query OK, 1 row affected (0.03 sec)

mysql> insert into members (firstname, lastname, joined) values ("Bob", "Smith", "2014-05-29");
Query OK, 1 row affected (0.01 sec)

mysql> select id, joined, lastname, firstname from members;
+-------+------------+----------+-----------+
| id    | joined     | lastname | firstname |
+-------+------------+----------+-----------+
| 10000 | 2010-01-14 | Davis    | Mary      |
| 10001 | 2011-02-12 | Hill     | John      |
| 10002 | 2012-03-18 | Johnson  | Steve     |
| 10003 | 2013-04-22 | Daniels  | Beth      |
| 10004 | 2014-05-29 | Smith    | Bob       |
+-------+------------+----------+-----------+
5 rows in set (0.00 sec)

When you start building your queries, you want to make sure that the query is using the partitions. You can do this by including the EXPLAIN PARTITIONS statement before your select statement. See the section “Obtaining Information About Partitions” in the MySQL Reference Manual if you want to learn more.
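
A note for newer servers: in MySQL 5.7 the PARTITIONS keyword is deprecated, because a plain EXPLAIN already includes the partitions column, and in MySQL 8.0 the keyword was removed. On those versions, the equivalent of the statements used in this post would look something like this sketch:

EXPLAIN select id, firstname, lastname, joined from members where id = '10000';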

Since we made the id column part of the primary key, let’s look at what happens when we do a search by id alone. We will use the EXPLAIN PARTITIONS statement to see what partitions are being used in the search. Let’s look for Mary’s information. She has the ID of 10000.

mysql> EXPLAIN PARTITIONS select id, firstname, lastname, joined from members where id = '10000';
+----+-------------+---------+----------------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table   | partitions     | type | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+---------+----------------+------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | members | p0,p1,p2,p3,p4 | ref  | PRIMARY       | PRIMARY | 4       | const |    5 | NULL  |
+----+-------------+---------+----------------+------+---------------+---------+---------+-------+------+-------+
1 row in set (0.05 sec)

As you can see under the partitions column, all five partitions (p0,p1,p2,p3,p4) were searched for this information because the partitions were separated by the year, and not the id. So this query would not take advantage of our partitions.

Look at what happens when we also include Mary’s joined date along with the id column:

mysql> EXPLAIN PARTITIONS select id, firstname, lastname, joined from members where id = '10000' and joined = '2010-01-14';
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref         | rows | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------+------+-------+
|  1 | SIMPLE      | members | p0         | const | PRIMARY       | PRIMARY | 7       | const,const |    1 | NULL  |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------+------+-------+
1 row in set (0.00 sec)

As you can see, MySQL only had to search in partition p0. Since the joined column was included in the query, MySQL can go to that partition and use the PRIMARY key of id and quickly find the record it needs.

Let’s see what we would need to do if we wanted to find all of the members who joined in the year 2010 (like Mary). You would think that you could just use the YEAR function on the joined column. But when the WHERE clause wraps the joined column in a function, MySQL has to evaluate that function for every row before it can compare the result, and it won’t be able to limit the search to one partition:

mysql> EXPLAIN PARTITIONS select id, firstname, lastname, joined from members where YEAR(joined) = '2010';
+----+-------------+---------+----------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table   | partitions     | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+---------+----------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | members | p0,p1,p2,p3,p4 | ALL  | NULL          | NULL | NULL    | NULL |    5 | Using where |
+----+-------------+---------+----------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.03 sec)

In this case, you are still having to go through all partitions because of the YEAR function. It would be better to use a range in the WHERE clause to find the members from 2010:

mysql> EXPLAIN PARTITIONS select id, firstname, lastname, joined from members where joined < '2011-01-01' and joined > '2009-12-31';
+----+-------------+---------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | members | p0         | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
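
The same 2010 search can also be written as a half-open range, which saves you from having to work out the last day of the previous year. This is only a sketch against the members table above. I would expect the partitions column to show only p0 again, but check the output on your own server:

EXPLAIN PARTITIONS
select id, firstname, lastname, joined
from members
where joined >= '2010-01-01' and joined < '2011-01-01';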

But what happens when you need to change the partitioned value of the joined date? What if Mary’s date was incorrect, and she really joined in 2011? What happens to the data? When you change the value of the partitioned column, MySQL will move that data to the appropriate partition. Let’s look at Mary’s information again, and also look at the EXPLAIN PARTITIONS statement for the same query.

mysql> select id, firstname, lastname, joined from members where id = '10000' and joined = '2010-01-14';
+-------+-----------+----------+------------+
| id    | firstname | lastname | joined     |
+-------+-----------+----------+------------+
| 10000 | Mary      | Davis    | 2010-01-14 |
+-------+-----------+----------+------------+
1 row in set (0.00 sec)

mysql> EXPLAIN PARTITIONS select id, firstname, lastname, joined from members where id = '10000' and joined = '2010-01-14';
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref         | rows | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------+------+-------+
|  1 | SIMPLE      | members | p0         | const | PRIMARY       | PRIMARY | 7       | const,const |    1 | NULL  |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------+------+-------+
1 row in set (0.00 sec)

We can see that Mary’s data is in partition p0. Now let’s change Mary’s joined date from 2010-01-14 to 2011-05-30, and then run both of the above statements again (but in the query we need to change Mary’s joined date to reflect the new date):

mysql> update members set joined = '2011-05-30' where id = '10000';
Query OK, 1 row affected (0.06 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select id, firstname, lastname, joined from members where id = '10000' and joined = '2011-05-30';
+-------+-----------+----------+------------+
| id    | firstname | lastname | joined     |
+-------+-----------+----------+------------+
| 10000 | Mary      | Davis    | 2011-05-30 |
+-------+-----------+----------+------------+
1 row in set (0.00 sec)

mysql> EXPLAIN PARTITIONS select id, firstname, lastname, joined from members where id = '10000' and joined = '2011-05-30';
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref         | rows | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------+------+-------+
|  1 | SIMPLE      | members | p1         | const | PRIMARY       | PRIMARY | 7       | const,const |    1 | NULL  |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------+------+-------+
1 row in set (0.00 sec)

We can see that Mary’s data is now in partition p1.
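
If you want to check where the rows are without running EXPLAIN for each one, two other options are sketched here. The first reads the per-partition row counts from INFORMATION_SCHEMA.PARTITIONS (for InnoDB, TABLE_ROWS is an estimate and not an exact count). The second uses explicit partition selection, which is available as of MySQL 5.6:

-- Approximate number of rows in each partition of the members table
select PARTITION_NAME, TABLE_ROWS
from INFORMATION_SCHEMA.PARTITIONS
where TABLE_SCHEMA = DATABASE() and TABLE_NAME = 'members';

-- Read only the rows stored in partition p1
select id, firstname, lastname, joined from members PARTITION (p1);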

Partitioning data can really add performance to your queries, but only if you know how to write the proper queries to take advantage of the partitioning. Using the EXPLAIN PARTITIONS statement can really help you figure out whether your queries are working properly. You can also store separate partitions on separate storage devices (by using innodb_file_per_table), and in MySQL 5.7.4 (or greater), you can even move partitioned tables to another server.

 


Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn.
Tony is the author of Twenty Forty-Four: The League of Patriots

 

Visit http://2044thebook.com for more information.
