Robin Schumacher is MySQL's Director of Product Management and has over 13 years of database experience in DB2, MySQL, Oracle, SQL Server and other database engines. Before joining MySQL, Robin was Vice President of Product Management at Embarcadero Technologies.
In my first article on partitioning, I demonstrated that you can get some pretty impressive performance benefits when you properly design partitioned tables to match the SQL queries that are routinely shot across the bow of your database server. I’d now like to continue in that vein and show you how easy and efficient partitioning is in terms of managing how partitions are created and removed within tables.
One thing DBAs love about partitioning is that it gives them much finer control over the underlying storage and data contained within tables and indexes. This control can make a DBA’s life a lot easier when it comes to things like managing logical separations of data as well as removing obsolete information from a database.
Suppose we have the following MyISAM table below that’s populated with 8 million rows:
mysql> desc t1; +-------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+-------------+------+-----+---------+-------+ | c1 | int(11) | YES | | NULL | | | c2 | varchar(30) | YES | | NULL | | | c3 | date | YES | | NULL | | +-------+-------------+------+-----+---------+-------+
The following query samples part of the data in the table (which is not indexed) and produces the following results:
mysql> select count(*) from t1 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'; +----------+ | count(*) | +----------+ | 805114 | +----------+ 1 row in set (41.11 sec)
Not bad for scanning 8 million rows, but you wonder if partitioning could help the situation. The good news is you can very easily alter your MyISAM table in place to become partitioned. Some RDBMS systems require the creation of a new partitioned table, which is then populated with the data from the non-partitioned clone, with the final step being the dropping of the original table. Not so in MySQL – you can execute one simply DDL statement to begin your partitioning test:
mysql> alter table t1 -> partition by range (year(c3)) -> (PARTITION p5 VALUES LESS THAN (2000), -> PARTITION p10 VALUES LESS THAN (2010)); Query OK, 8100000 rows affected (50.28 sec) Records: 8100000 Duplicates: 0 Warnings: 0 mysql> select count(*) from t1 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'; +----------+ | count(*) | +----------+ | 805114 | +----------+ 1 row in set (20.42 sec)
We choose to partition by RANGE (one of five possible partitioning methods in MySQL 5.1) and opt to create two initial partitions to segment data by the date column. A re-issue of the original table scan query shows a reduction of about 50% in total response time.
If you find that you’ve partitioned your tables in a way that isn’t perhaps optimal, there’s no need to worry as MySQL includes a number of easy-to-use partition management functions to change things. For example, if you find that you have many queries issuing requests only for data within the year 1995, you can massage your partitioned table to enable even better response time. The REORGANIZE command within the ALTER TABLE statement allows you to do a number of things to existing partitions:
To create a partition that contains only data for 1995, you simply issue the following command:
mysql> alter table t1 -> reorganize partition p5 into -> (PARTITION p1 VALUES LESS THAN (1996), -> PARTITION p5 VALUES LESS THAN (2000)); Query OK, 0 rows affected (47.55 sec) mysql> select count(*) from t1 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'; +----------+ | count(*) | +----------+ | 805114 | +----------+ 1 row in set (4.56 sec)
Now you have a partitioned table that’s more optimally tuned for the everyday queries that are sent up against it.
Another great feature of partitioning is the ease at which it allows you to quickly and efficiently remove obsolete data from your system. To demonstrate how partitioning is an improvement over standard data cleansing practices, imagine you have two identically structured tables – t1 and t2 – with the only difference being t1 is partitioned in the manner described in the previous section and t2 is just a standard MyISAM table. Let’s imagine that data from 1995 is no longer needed by the supported application. To remove the 1995 data from the MyISAM table, you issue a basic DELETE command as so:
mysql> delete from t2 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'; Query OK, 805114 rows affected (47.41 sec)
Now, to remove the same data from our partitioned table, all we have to do is drop the partition that holds the data for 1995:
mysql> alter table t1 drop partition p1; Query OK, 0 rows affected (0.03 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> select count(*) from t1 where -> c3 > date '1995-01-01' and c3 < date '1995-12-31'; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (4.11 sec)
Because the DROP PARTITION command is DDL in nature, it executes in the blink of an eye and greatly outpaces the standard DELETE command. And in case you’re wondering, no – the creation of an index on the date column in the non-partitioned MyISAM table didn’t help the DELETE run any faster (at least on my test Linux box).
There are many more management options available to you on partitioning tables and indexes, so rather than go into all of them here, I’ll just point you to our online reference manual at http://dev.mysql.com/doc/refman/5.1/en/partitioning.html. Also, don’t forget about our MySQL forums, which have a special section devoted to partitioning at http://forums.mysql.com/list.php?106.
Most importantly, make sure you download a copy of MySQL 5.1 (which is now in beta) today and give partitioning a try. And do let me know how you’re using partitioning today as well as partitioning enhancements you’d like to see in upcoming versions of MySQL.
Thanks, as always, for supporting MySQL!