Robin Schumacher is MySQL's Director of Product Management and has over 13 years of database experience in DB2, MySQL, Oracle, SQL Server and other database engines. Before joining MySQL, Robin was Vice President of Product Management at Embarcadero Technologies.
Many of you have responded very positively on the first and second partitioning articles that I wrote some time back. It’s clear that a lot of you are excited about the upcoming MySQL 5.1 release, and in particular, the new data partitioning feature that’s being offered. That’s no surprise because, as we covered in the previous two articles, partitioning has some excellent performance and management benefits for anyone who’s building data warehouses or other large databases with MySQL.
However, I’ve gotten some feedback indicating a disappointment that MySQL 5.1 doesn’t support partitioning on a native date/time column. Given that many data warehouses and other large databases try and segment data by some date-based algorithm – especially via the RANGE partitioning style – it’s understandable that a number of you are wondering if partitioning in 5.1 will meet the needs of your particular business intelligence application that relies on time-based functions.
It’s true that in MySQL 5.1 the actual partitioning key (the element used by the MySQL partition engine to horizontally divide the rows across the various partitions defined on a table or index) must be an integer value. But that doesn’t mean you can’t partition by date in MySQL 5.1 – it just requires the proper selection between a couple of MySQL date functions and you’ll be in business.
Let’s take a look at how to properly partition by dates in MySQL 5.1 and also showcase how not to try and partition by date, so you won’t waste time and effort on something that doesn’t give you the results that you’re looking for.
I always appreciate it when someone shows me where not to step, so let’s first see what you don’t want to try when it comes to partitioning by date in MySQL 5.1. If you try and define a table with a partition key that references a date datatype, MySQL will quickly kick back an error:
mysql> create table rms (d date) -> partition by range (d) -> (partition p0 values less than ('1995-01-01'), -> partition p1 VALUES LESS THAN ('2010-01-01')); ERROR 1064 (42000): VALUES value must be of same type as partition function near '), partition p1 VALUES LESS THAN ('2010-01-01'))' at line 3
Because MySQL DBAs and developers are very clever and innovative, the first thing they'll do is try a workaround for the integer-only restriction in MySQL 5.1 regarding the partition key. For example, here's one way to get around the restriction:
mysql> CREATE TABLE part_date1 -> ( c1 int default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL) engine=myisam -> partition by range (cast(date_format(c3,'%Y%m%d') as signed)) -> (PARTITION p0 VALUES LESS THAN (19950101), -> PARTITION p1 VALUES LESS THAN (19960101) , -> PARTITION p2 VALUES LESS THAN (19970101) , -> PARTITION p3 VALUES LESS THAN (19980101) , -> PARTITION p4 VALUES LESS THAN (19990101) , -> PARTITION p5 VALUES LESS THAN (20000101) , -> PARTITION p6 VALUES LESS THAN (20010101) , -> PARTITION p7 VALUES LESS THAN (20020101) , -> PARTITION p8 VALUES LESS THAN (20030101) , -> PARTITION p9 VALUES LESS THAN (20040101) , -> PARTITION p10 VALUES LESS THAN (20100101), -> PARTITION p11 VALUES LESS THAN MAXVALUE ); Query OK, 0 rows affected (0.01 sec)
Bingo! MySQL took it. Here’s another possible workaround:
mysql> CREATE TABLE part_date2 -> ( c1 int default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL) engine=myisam -> partition by range (YEAR(C3)*100+MONTH(C3)) -> (PARTITION p0 VALUES LESS THAN (199501), -> PARTITION p1 VALUES LESS THAN (199601) , -> PARTITION p2 VALUES LESS THAN (199701) , -> PARTITION p3 VALUES LESS THAN (199801) , -> PARTITION p4 VALUES LESS THAN (199901) , -> PARTITION p5 VALUES LESS THAN (200001) , -> PARTITION p6 VALUES LESS THAN (200101) , -> PARTITION p7 VALUES LESS THAN (200201) , -> PARTITION p8 VALUES LESS THAN (200301) , -> PARTITION p9 VALUES LESS THAN (200401) , -> PARTITION p10 VALUES LESS THAN (201001), -> PARTITION p11 VALUES LESS THAN MAXVALUE ); Query OK, 0 rows affected (0.02 sec)
MySQL took that too. At this point, you might be feeling pretty crafty as you’ve outwitted MySQL into partitioning by a date column. But watch what happens when we fill the tables with millions of rows of data and then see how each reacts with a standard range-based query:
mysql> explain partitions -> select count(*) from part_date1 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: part_date1 partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 8100000 Extra: Using where 1 row in set (0.00 sec) mysql> explain partitions -> select count(*) from part_date2 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: part_date1 partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 8100000 Extra: Using where 1 row in set (0.00 sec)
Yikes! What happened? MySQL is doing a full table scan rather than just scanning the partition(s) that contain the data we need. The issue is that you’ve forgotten the second part of the equation when it comes to leveraging date-based partitioning (or any partitioning style really), which is the MySQL optimizer. Yes, you can divide your data using any number of date-to-integer conversion techniques, but that doesn’t mean you’ll get the big bang when it comes to using partitioning, which is partition pruning – the act where MySQL only references the partitions needed to fulfill a particular query. The optimizer’s partition pruning mechanism is the actual hero that accomplishes the performance speed-up’s in partitioned vs. non-partitioned tables, and without it, you might as well go back to using non-partitioned tables if it’s an increase in performance you’re looking for.
But take heart – there is a way to partition by date and get partition pruning.
The MySQL optimizer will recognize two date-based functions for partition pruning purposes:
If you partition date columns using either of these functions, you can both define a partitioned table by date and get the benefit of partition pruning from the optimizer (provided, of course, that your query is written in such a way so as to take advantage of the optimizer’s ability to prune partitions). For example, we can redefine the tables used in our previous example using both the TO_DAYS() and YEAR() date-based functions:
mysql> CREATE TABLE part_date3 -> ( c1 int default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL) engine=myisam -> partition by range (to_days(c3)) -> (PARTITION p0 VALUES LESS THAN (to_days('1995-01-01')), -> PARTITION p1 VALUES LESS THAN (to_days('1996-01-01')) , -> PARTITION p2 VALUES LESS THAN (to_days('1997-01-01')) , -> PARTITION p3 VALUES LESS THAN (to_days('1998-01-01')) , -> PARTITION p4 VALUES LESS THAN (to_days('1999-01-01')) , -> PARTITION p5 VALUES LESS THAN (to_days('2000-01-01')) , -> PARTITION p6 VALUES LESS THAN (to_days('2001-01-01')) , -> PARTITION p7 VALUES LESS THAN (to_days('2002-01-01')) , -> PARTITION p8 VALUES LESS THAN (to_days('2003-01-01')) , -> PARTITION p9 VALUES LESS THAN (to_days('2004-01-01')) , -> PARTITION p10 VALUES LESS THAN (to_days('2010-01-01')), -> PARTITION p11 VALUES LESS THAN MAXVALUE ); Query OK, 0 rows affected (0.00 sec) mysql> CREATE TABLE part_date4 -> ( c1 int default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL) engine=myisam -> PARTITION BY RANGE (year(c3)) -> (PARTITION p0 VALUES LESS THAN (1995), -> PARTITION p1 VALUES LESS THAN (1996) , -> PARTITION p2 VALUES LESS THAN (1997) , -> PARTITION p3 VALUES LESS THAN (1998) , -> PARTITION p4 VALUES LESS THAN (1999) , -> PARTITION p5 VALUES LESS THAN (2000) , -> PARTITION p6 VALUES LESS THAN (2001) , -> PARTITION p7 VALUES LESS THAN (2002) , -> PARTITION p8 VALUES LESS THAN (2003) , -> PARTITION p9 VALUES LESS THAN (2004) , -> PARTITION p10 VALUES LESS THAN (2010), -> PARTITION p11 VALUES LESS THAN MAXVALUE ); Query OK, 0 rows affected (0.02 sec)
If we fill both tables with the same amount of data as the tables we created with non-prunable partitioning keys and issue range-based queries against the newly loaded well-partitioned tables, we’ll find a big difference when it comes to what the MySQL optimizer does:
mysql> explain partitions -> select count(*) from part_date3 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: part_date3 partitions: p1 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 808431 Extra: Using where 1 row in set (0.00 sec) mysql> explain partitions -> select count(*) from part_date4 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: part_date4 partitions: p1 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 200000 Extra: Using where 1 row in set (0.00 sec)
Notice only the p1 partition is being scanned in these two tables, whereas all partitions were being checked in the tables that had partition keys the MySQL optimizer couldn’t use for partition pruning. A quick test run shows partition pruning in action along with the resulting performance difference:
mysql> select count(*) from part_date3 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'; +----------+ | count(*) | +----------+ | 805114 | +----------+ 1 row in set (4.11 sec) mysql> select count(*) from part_date1 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'; +----------+ | count(*) | +----------+ | 805114 | +----------+ 1 row in set (40.33 sec)
A 90% decrease in response time for the query referencing table part_date3 over the query requesting data from table part_date1 spells out pretty clearly why it’s important to use date functions that the MySQL optimizer can utilize when defining date-based partitioned tables.
While a restriction does exist in MySQL 5.1 that says partitioned tables must be partitioned via an integer-based value, it doesn’t mean you can’t partition tables by date or get the benefits of partition pruning. You must, however, ensure that you’re using date conversion functions in the defined partitioning key for the table that the MySQL optimizer can make use of.
For more information on partitioning in general, please check out our online reference manual at http://dev.mysql.com/doc/refman/5.1/en/partitioning.html. Also, don’t forget about our MySQL forums, which have a special section devoted to partitioning at http://forums.mysql.com/list.php?106.
Of course I’d be remiss if I didn’t encourage you to download the latest copy of MySQL 5.1 (which is now in beta) today and give partitioning a try. I’d certainly love to hear how you’re using partitioning and other new features in MySQL 5.1.
Thanks, as always, for supporting MySQL!