Robin Schumacher is MySQL's Director of Product Management and has over 13 years of database experience in DB2, MySQL, Oracle, SQL Server and other database engines. Before joining MySQL, Robin was Vice President of Product Management at Embarcadero Technologies.
Every now and then I get a big reminder of how great it is to be a part of the open source way of doing things. Having a strong community of MySQL lovers who use the free and open source MySQL database server and pay MySQL AB with code contributions and bug reports is truly a great way to develop rock solid and innovative software. Some contributions that come back to MySQL are small, but others are very large indeed. What would be large, you ask? How about a new, powerful transactional storage engine?
Well, Paul McCullagh from PrimeBase (www.primebase.com) has done just that in creating the PBXT transactional storage engine for MySQL. PBXT is an ACID compliant transaction engine that offers all the right features for MySQL users who are on the hunt for an engine to keep their transactional data safe. PBXT sports crash recovery, unlimited row-level locking, MVCC, foreign keys/server-enforced referential integrity, automatic storage management, minimal configuration setup, and online backup support. And if that doesn’t get your attention, did I mention that PBXT (at least from some simple testing I’ve done) provides very good performance too?
I’ve been watching PBXT come along for some time now, but it’s only recently that I was able to actually get my hands dirty with the engine. And I have to say that I very much like what I see. Let’s first go through a quick overview of how PBXT is architected, the benefits it provides, and then dive in and take it for a test ride inside the MySQL server.
PBXT is a full ACID compliant transaction engine that offers most everything you’ll need for managing critical transactional data. From an architecture perspective, PBXT looks like the following:
PBXT has some traditional aspects to its architecture, but also makes some interesting departures. If you want a more detailed understanding of PBXT internals, then I recommend checking out the engine’s white paper on the Primebase web site (http://www.primebase.com/xt/). I’ll provide just a brief description of each PBXT component here:
The PBXT engine does everything in its power to efficiently process physical I/O operations. It never does “in place” I/O but instead writes everything sequentially and writes everything only once, which bypasses the traditional write-to-the-log-then-the-database operation. Since log file data is written immediately, records in the cache are never dirty (i.e. must be flushed before being freed), and this fact greatly speeds up/simplifies memory cache management. This also means that rollbacks are extremely fast as no real changes ever have to be undone.
PBXT’s design allows the system to perform instant commits and rollbacks. PBXT tracks transactions in the transaction log files. There are three types of log entries: BEGIN, MODIFY, and END. The ID of the transaction is an 8-byte number that is generated by a counter in memory. A BEGIN record is written when the first update occurs and contains the transaction ID. The MODIFY records reference the new record versions in the data files. Finally, the END record indicates if the transaction was committed or aborted. By scanning the log, the sweeper can quickly determine which record versions can be removed after a transaction ends.
To complete a transaction, PBXT just writes an end record. The rest of the system reacts instantly to this information. In particular, if a transaction is committed, record versions that are already present on the variation chain of a record become instantly visible to other transactions. This is due to the fact that the reader checks the transaction status of each version as it moves down the version chain.
With an overview of PBXT out of the way, let’s now take the engine for a test drive inside MySQL.
One of the great new things in MySQL 5.1 is the enhanced ability to add and remove new engines to and from the MySQL database server. This enhancement makes it a snap to try out the latest beta of the PBXT storage engine.
To get started, you’ll need to download the latest binary plug-in from http://www.primebase.com/xt/download/pbxt-0.9.8-plugins.tar.gz. Paul makes the source available as well as binaries for a number of platforms. In my case, I took the binary download and extracted the files onto my Fedora Linux box. I created a new directory under my MySQL 5.1.15 installation/lib directory called “mysql”. I then copied the 32 bit ‘libpbxt.so’ file into that directory so MySQL would be aware of the new PBXT plugin.
After that, I logged into MySQL and checked what engines I currently had available (and yes, I could also do a SHOW ENGINES and get the same data, but the output is too wide for me to use here) :
mysql> select engine,support,transactions -> from information_schema.engines -> order by 1; +------------+----------+--------------+ | engine | support | transactions | +------------+----------+--------------+ | ARCHIVE | ENABLED | NO | | BLACKHOLE | ENABLED | NO | | CSV | ENABLED | NO | | FEDERATED | ENABLED | YES | | InnoDB | ENABLED | YES | | MEMORY | ENABLED | NO | | MRG_MYISAM | ENABLED | NO | | MyISAM | ENABLED | NO | | ndbcluster | DISABLED | YES | +------------+----------+--------------+ 9 rows in set (0.00 sec)
To enable the PBXT engine, all I need to do is issue one simple command:
mysql> install plugin pbxt soname 'libpbxt.so'; Query OK, 0 rows affected (0.05 sec) mysql> select engine,support,transactions -> from information_schema.engines -> order by 1; +------------+----------+--------------+ | engine | support | transactions | +------------+----------+--------------+ | ARCHIVE | ENABLED | NO | | BLACKHOLE | ENABLED | NO | | CSV | ENABLED | NO | | FEDERATED | ENABLED | YES | | InnoDB | ENABLED | YES | | MEMORY | ENABLED | NO | | MRG_MYISAM | ENABLED | NO | | MyISAM | ENABLED | NO | | ndbcluster | DISABLED | YES | | PBXT | ENABLED | YES | +------------+----------+--------------+ 10 rows in set (0.00 sec)
That’s all there is to it – PBXT is now ready to go. The unique pluggable storage engine architecture of MySQL really makes things nice in this sense.
Regarding basic transaction management, there are no surprises with respect to how PBXT performs and functions. Standard COMMIT
and ROLLBACK
commands work as you’d expect, as does the standard MySQL way of setting auto or non-auto commit (via the SET AUTOCOMMIT=0 | 1
command at the mysql client prompt).
When it comes to concurrency controls, PBXT offers unlimited row-level locking as well as MVCC. In my book, this is great as I’m a big fan of MVCC-based databases. I fell in love with MVCC long ago when I transferred from mainframe DB2 to Oracle, which was one of the first to offer “snapshot read” (MVCC) which means readers never blocked writers and vice-versa. In addition to PBXT, the InnoDB, solidDB, and Falcon engines all use MVCC inside MySQL.
Let’s watch PBXT’s MVCC design in action. We’ll first create a new PBXT table, insert some data, turn off autocommit, and then update a row of data:
mysql> create table t (c1 int) engine=pbxt; Query OK, 0 rows affected (0.00 sec) mysql> insert into t values (1),(2),(3); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> select * from t; +------+ | c1 | +------+ | 1 | | 2 | | 3 | +------+ 3 rows in set (0.00 sec) mysql> set autocommit=0; Query OK, 0 rows affected (0.00 sec) mysql> update t set c1=4 where c1=3; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0
Now let’s log in with a second session and see what we can do:
Enter password: ******* Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 8 to server version: 5.1.15-beta Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> use hr Database changed mysql> select * from t; +------+ | c1 | +------+ | 1 | | 2 | | 3 | +------+ 3 rows in set (0.00 sec) mysql> update t set c1=5 where c1=3;
The SELECT
query completed with no problem as we would expect with MVCC. Also as expected, our UPDATE
operation hangs because our first session has the desired row locked. Now, here’s where PBXT differs from InnoDB. With PBXT, if I issue a COMMIT
in my first session, my second session will receive an error back:
mysql> update t set c1=5 where c1=3; ERROR 1020 (HY000): Record has changed since last read in table 't'
This is also what the Falcon engine does. With InnoDB, no error would have been issued, but you would have had zero rows updated. Personally, I like PBXT’s (and Falcon’s) method better. If I had rolled back my transaction in the first session, my UPDATE
in the second session would have gone through. One last thing: currently, there is no timeout in PBXT for cases of lock contention, so a session will wait indefinitely.
With respect to deadlock detection, PBXT does a nice job of handling these types of potential bottlenecks:
*** SESSION 1 *** mysql> select * from t; +------+ | c1 | +------+ | 1 | | 4 | | 5 | +------+ 3 rows in set (0.00 sec) mysql> set autocommit=0; Query OK, 0 rows affected (0.00 sec) mysql> update t set c1=6 where c1=1; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 *** SESSION 2 *** mysql> select * from t; +------+ | c1 | +------+ | 1 | | 4 | | 5 | +------+ 3 rows in set (0.00 sec) mysql> set autocommit=0; Query OK, 0 rows affected (0.00 sec) mysql> update t set c1=7 where c1=4; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> update t set c1=8 where c1=1; Query OK, 1 row affected (18.84 sec) Rows matched: 1 Changed: 1 Warnings: 0 *** SESSION 1 *** mysql> update t set c1=9 where c1=4; ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Many transactional applications require the presence of server-enforced referential integrity to guarantee that the data relationships between child and parent tables remain clean. Both InnoDB and the solidDB engines support foreign key constraints that keep referential data intact, and now we can add PBXT to the list of MySQL storage engines that support server-enforced RI.
Let’s use the infamous EMP and DEPT tables to illustrate how PBXT manages foreign key constraints:
mysql> CREATE TABLE dept -> (deptid int NOT NULL DEFAULT '0', -> deptname varchar(20), -> PRIMARY KEY pk_deptid (deptid)) -> ENGINE=PBXT; Query OK, 0 rows affected (0.02 sec) mysql> CREATE TABLE emp -> (empid int NOT NULL DEFAULT '0', -> deptid int NOT NULL DEFAULT '0', -> PRIMARY KEY pk_empid (empid), -> KEY emp_deptid (deptid), -> CONSTRAINT FK_emp_1 FOREIGN KEY (deptid) REFERENCES dept (deptid)) -> ENGINE=PBXT; Query OK, 0 rows affected (0.00 sec) mysql> insert into dept values -> (1,'HR'),(2,'IT'),(3,'FINANCE'); Query OK, 3 rows affected (0.02 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> insert into emp values -> (1,1),(2,2),(3,3); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0
Now let’s try and add some a bad child row and see what PBXT does:
mysql> insert into emp values -> (4,4); ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (Constraint: `FK_emp_1`)
Great – PBXT kept out the bad data. What about trying to delete parent rows that reference child data?
mysql> delete from dept -> where deptid = 2; ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (Constraint: `FK_emp_1`)
PBXT denies the deletion of parent data, which is what we hoped to see (since the default mode of RESTRICT for foreign key operations is being enforced). We can change the foreign key operation to cascade parent DELETEs (and UPDATEs) if we want by just changing the constraint definition:
mysql> ALTER TABLE emp -> DROP FOREIGN KEY FK_emp_1; Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE emp ADD CONSTRAINT FK_emp_1 FOREIGN KEY FK_emp_1 (deptid) -> REFERENCES dept (deptid) -> ON DELETE CASCADE -> ON UPDATE RESTRICT, -> ENGINE = PBXT; Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> delete from dept -> where deptid = 2; Query OK, 1 row affected (0.00 sec) mysql> select * from dept; +--------+----------+ | deptid | deptname | +--------+----------+ | 1 | HR | | 3 | FINANCE | +--------+----------+ 2 rows in set (0.00 sec) mysql> select * from emp; +-------+--------+ | empid | deptid | +-------+--------+ | 1 | 1 | | 3 | 3 | +-------+--------+ 2 rows in set (0.00 sec)
One nice thing about PBXT is that, unlike InnoDB, if you want to drop a variety of referentially-grouped tables, you don’t have to drop them in child-parent order. PBXT lets you drop the tables in any order you like and will kick back an error if you try and update an existing child table whose parent has been dropped.
The only other thing worth mentioning on the topic of PBXT and foreign keys is that if you want to convert any existing InnoDB tables with foreign key definitions over to PBXT, you’ll need to drop the foreign key constraints before issuing a ALTER TABLE…ENGINE=
command. Once the tables are transferred to the PBXT engine, you can recreate the FK’s manually via standard DDL or graphically by using the MySQL Administrator tool.
A lot of folks don't know that you can do an online/non-blocking backup with a number of the MySQL transactional engines. The mysqldump utility provides the --single-transaction option, which allows an online backup to be performed that does not block any DML activity (Note: DDL actions are not supported in this fashion and should not be issued during a backup). As of right now, you can backup InnoDB databases using this option and you can do the same for objects created with the PBXT storage engine (Note that the solidDB engine has a special online backup command that it uses).
To test this, I used the MySQL Administrator GUI tool, which makes backup and restore jobs very easy. I selected a large PBXT table to backup and then initiated the backup job:
After the backup job was underway, I went back to a mysql command prompt and issued a number of DML commands:
mysql> update pbxt_idx set description = 'testing online backup update' -> where pkid = 3000; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> delete from pbxt_idx where pkid = 50000; Query OK, 1 row affected (0.00 sec)
The above DML statements issued against the PBXT table that was being backed up completed without any problems whatsoever.
Restoring PBXT databases and tables is no different than restoring data created via any other MySQL storage engine, so there is nothing new you need to learn on that front.
Features are nice and can occasionally dictate what database is used for a particular system, but something that always matters for online database applications is performance. One of MySQL’s hallmarks is fast performance, and happily, PBXT does not disappoint on this front.
To test PBXT performance, I first did some single user DML/query tests using a single non-indexed table cloned between both InnoDB and PBXT. As you can see below, PBXT demonstrated very efficient write and read performance:
mysql> show create table big_table\G *************************** 1. row *************************** Table: big_table Create Table: CREATE TABLE `big_table` ( `client_transaction_id` int(11) NOT NULL DEFAULT '0', `client_id` int(11) NOT NULL DEFAULT '0', `investment_id` int(11) NOT NULL DEFAULT '0', `action` varchar(10) NOT NULL DEFAULT '', `price` decimal(12,2) NOT NULL DEFAULT '0.00', `number_of_units` int(11) NOT NULL DEFAULT '0', `transaction_status` varchar(10) NOT NULL DEFAULT '', `transaction_sub_timestamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00', `transaction_comp_timestamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00', `description` varchar(200) DEFAULT NULL, `broker_id` bigint(10) DEFAULT NULL, `broker_commission` decimal(10,2) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> create table t_innodb like big_table; Query OK, 0 rows affected (0.03 sec) mysql> create table t_pbxt like big_table; Query OK, 0 rows affected (0.00 sec) mysql> alter table t_pbxt engine=pbxt; Query OK, 0 rows affected (0.00 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> insert into t_innodb select * from big_table; Query OK, 1120500 rows affected (24.30 sec) Records: 1120500 Duplicates: 0 Warnings: 0 mysql> insert into t_pbxt select * from big_table; Query OK, 1120500 rows affected (9.38 sec) Records: 1120500 Duplicates: 0 Warnings: 0 mysql> drop table t_innodb; Query OK, 0 rows affected (0.89 sec) mysql> create table t_innodb like big_table; Query OK, 0 rows affected (0.05 sec) mysql> insert into t_innodb select * from big_table; Query OK, 1120500 rows affected (21.53 sec) Records: 1120500 Duplicates: 0 Warnings: 0 mysql> select count(*) from t_innodb; +----------+ | count(*) | +----------+ | 1120500 | +----------+ 1 row in set (1.06 sec) mysql> select count(*) from t_pbxt; +----------+ | count(*) | +----------+ | 1120500 | +----------+ 1 row in set (7.17 sec) mysql> select count(*) from t_innodb; +----------+ | count(*) | +----------+ | 1120500 | +----------+ 1 row in set (1.08 sec) mysql> select count(*) from t_pbxt; +----------+ | count(*) | +----------+ | 1120500 | +----------+ 1 row in set (3.69 sec) mysql> update t_innodb -> set description = 'testing update of varchar column'; Query OK, 1120500 rows affected (55.94 sec) Rows matched: 1120500 Changed: 1120500 Warnings: 0 mysql> update t_pbxt -> set description = 'testing update of varchar column'; Query OK, 1120500 rows affected (17.08 sec) Rows matched: 1120500 Changed: 1120500 Warnings: 0 mysql> set autocommit=0; Query OK, 0 rows affected (0.00 sec) mysql> delete from t_innodb; Query OK, 1120500 rows affected (21.42 sec) mysql> rollback; Query OK, 0 rows affected (15.27 sec) mysql> delete from t_pbxt; Query OK, 1120500 rows affected (13.44 sec) mysql> rollback; Query OK, 0 rows affected (0.00 sec)
After these simple tests, I next performed a check of PBXT’s performance by using my favorite graphical benchmarking tool – Quest’s Benchmark Factory. Whenever I need to personally test a new or enhanced storage engine, I use this tool because it easily lets me put the MySQL database through its paces with minimal effort on my part.
I did a simple scaleable benchmark test, which tests CPU, read, insert, and update-intensive loads, along with a mixed workload at the end. My test machine was a small single-CPU Linux box running Fedora Core along with the MySQL 5.1.15 beta. For a comparison, I ran the same tests against InnoDB. As you can see below, PBXT held up pretty nicely:
Test | Userload | InnoDB (TPS) | PBXT (TPS) |
CPU Intensive | 1 | 205 | 364 |
8 | 280 | 555 | |
32 | 520 | 1,232 | |
64 | 1,194 | 2,570 | |
Read Intensive | 1 | 854 | 976 |
8 | 2,050 | 2,360 | |
32 | 2,367 | 2,586 | |
64 | 2,842 | 3,259 | |
Update Intensive | 1 | 2,101 | 2,378 |
8 | 7,452 | 1,266 | |
32 | 7,706 | 5,260 | |
64 | 9,231 | 6,157 | |
Insert Intensive | 1 | 1,949 | 1,961 |
8 | 7,207 | 6,793 | |
32 | 6,774 | 5,969 | |
64 | 6,420 | 5,324 | |
Mixed Workload | 1 | 506 | 743 |
8 | 801 | 928 | |
32 | 1,134 | 2,185 | |
64 | 2,016 | 3,155 |
TPS=Transactions Per Second.
For PBXT, there weren’t many parameters to tweak – only the index and data cache to which I set at 100MB each. Note that these parameters are currently set via environment parameters in the O/S you’re using. The parameters for InnoDB were the following:
mysql> show global variables like '%inno%'; +---------------------------------+----------------------------+ | Variable_name | Value | +---------------------------------+----------------------------+ | have_innodb | YES | | innodb_additional_mem_pool_size | 20971520 | | innodb_autoextend_increment | 8 | | innodb_buffer_pool_size | 209715200 | | innodb_checksums | ON | | innodb_commit_concurrency | 0 | | innodb_concurrency_tickets | 500 | | innodb_data_file_path | ibdata1:10M:autoextend | | innodb_data_home_dir | | | innodb_doublewrite | ON | | innodb_fast_shutdown | 1 | | innodb_file_io_threads | 4 | | innodb_file_per_table | OFF | | innodb_flush_method | | | innodb_force_recovery | 0 | | innodb_lock_wait_timeout | 300 | | innodb_locks_unsafe_for_binlog | OFF | | innodb_log_arch_dir | | | innodb_log_archive | OFF | | innodb_log_buffer_size | 8388608 | | innodb_log_file_size | 104857600 | | innodb_log_files_in_group | 2 | | innodb_log_group_home_dir | /usr/local/mysql5114/data/ | | innodb_max_dirty_pages_pct | 90 | | innodb_max_purge_lag | 0 | | innodb_mirrored_log_groups | 1 | | innodb_open_files | 300 | | innodb_support_xa | OFF | | innodb_sync_spin_loops | 20 | | innodb_table_locks | ON | | innodb_thread_concurrency | 8 | | innodb_thread_sleep_delay | 10000 | | innodb_flush_log_at_trx_commit | 2 | +---------------------------------+----------------------------+
Of course, your mileage will vary depending on the hardware, test, etc., that you use for benchmarking, but the above tests do demonstrate that PBXT can certainly hold its own when it comes to performing well under various workloads.
Simply put, I like PBXT. Its transactional feature set is strong, the engine is easy to use and work with, and performance is nothing to sneeze at. PBXT is a nice new addition to the MySQL family of storage engines, and is another great example of open source and the MySQL community in action.
I encourage you to download and try PBXT today. Afterwards, send its creator – Paul McCullagh – a big thank you along with your opinions on the engine and how you’d make it better.