Arjen Lentz is a former Community Relations Manager at MySQL. He is currently the owner of Open Query and lives in Brisbane, Australia.
By Arjen Lentz, MySQL AB
5 May 2004
In Part Three of this series we will compare key aspects of the most popular storage engines, and provide you with guidelines and practical examples to help you decide which engine may be best suited for a particular purpose.
| Feature | MyISAM | InnoDB | MEMORY | NDB |
|---|---|---|---|---|
| Multi-statement transactions, ROLLBACK | - | X | - | X |
| Foreign key constraints | - | X | - | - |
| Locking level | table | row | table | row |
| BTREE indexes | X | X | - | X |
| FULLTEXT indexes | X | - | - | - |
| HASH lookups | - | X | X | X |
| Other in-memory tree-based index | - | - | 4.1.0 | - |
| GIS, RTREE indexes | 4.1.0 | - | - | - |
| Unicode | 4.1.0 | 4.1.2 | - | - |
| Merge (union views) | X | - | - | - |
| Compress read-only storage | X | - | - | - |
| Relative disk use | low | high | - | low |
| Relative memory use | low | high | low | high |
As you can see from the comparison table, if you need certain features, you may have only a single option; in that case, you would choose that storage engine for that particular table. But this certainly doesn't mean you should use the same table type for the entire application! On the contrary, if you make a blanket choice, either for all your work or for an entire application, chances are high that the result will not be optimal: it may require much more disk space, or operate slower than anticipated.
It is very important to realize that there is no single answer: any "which is best" question inevitably involves assumptions and trade-offs. The proper way to go about this is to ask, for each part of your application, "which storage engine is best suited for ...". It often makes perfect sense to use different engines within a single application: for instance, InnoDB for key account and transactional data, MyISAM for logs, MEMORY tables for quick calculations and intermediate summaries, and NDB for high-availability applications that require ultimate performance, scalability, and fast failover.
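To make this concrete, here is a minimal sketch, with hypothetical table and column names, of how different engines might be mixed within one application:

-- transactional account data: InnoDB
CREATE TABLE accounts (accountid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, balance DECIMAL(10,2) NOT NULL) ENGINE=InnoDB;
-- append-mostly log data: MyISAM
CREATE TABLE weblog (logid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, hit DATETIME NOT NULL, url CHAR(100)) ENGINE=MyISAM;
-- intermediate summaries for quick calculations: MEMORY
CREATE TABLE tmp_summary (url CHAR(100) PRIMARY KEY, hits INT UNSIGNED) ENGINE=MEMORY;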
Transactions are very powerful, but using them when they are not required needlessly complicates your application. I already mentioned the INSERT ... SELECT and CREATE TABLE ... SELECT constructs. Suppose you insert a row and need the generated ID for subsequent inserts into other tables:
CREATE TABLE customers (custid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, name CHAR(40) UNIQUE);
CREATE TABLE sales (salesid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, custid INT UNSIGNED NOT NULL, prodid INT UNSIGNED NOT NULL);
INSERT INTO customers VALUES (NULL,'Some Company');
SET @custid = LAST_INSERT_ID();
INSERT INTO sales VALUES (NULL,@custid,1234);
...
The value returned by LAST_INSERT_ID() is unique for this connection. A construct using MAX() would not be safe without manual locking to prevent other connections from inserting at the same time. In the above example, we use a user variable (which is also connection-specific). We do this because the sales table also has an auto-increment column and we might still need the customer ID for a third insert. Of course, if there is only a single other insert, you could simplify that statement:
INSERT INTO sales VALUES (NULL,LAST_INSERT_ID(),1234);
This is how you can create a counter without using transactions or locking:
CREATE TABLE counters (name CHAR(10) PRIMARY KEY, counter INT UNSIGNED NOT NULL DEFAULT 0);
INSERT INTO counters (name) VALUES ('webhits');
Now we want to increment this counter and retrieve the new value, with a single atomic instruction. This means we won't need a transaction, or manual locking. We do need to take care, because another connection might do the same operation, and each connection should of course get its own unique count without any possibility for mistakes.
UPDATE counters SET counter = LAST_INSERT_ID(counter+1) WHERE name = 'webhits';
SELECT LAST_INSERT_ID();
This is a special construct, and again it's perfectly safe without manual locks, no matter how many concurrent connections the server has.
As our last example in this section, consider a commonly perceived need for a transaction. Say we are selling products, and a customer first looks at the amount of available stock:
SELECT amount FROM stock WHERE productid = 1234;
While the customer thinks about his purchase or fills out the forms, someone else might process a purchase of the same product. So do we need to somehow manually lock this entry in the table? No.
UPDATE stock SET amount = amount - 3 WHERE productid = 1234 AND amount >= 3;
If there is less stock than required when this update is executed, the query will report that 0 rows have been updated, instead of 1 row. So without locks and in a single statement, we do the operation in a perfectly safe manner, and get all the information we need.
In web-based applications an SQL transaction cannot span multiple web requests, i.e., you can't start a transaction, issue some queries, then lead the customer through a web form using the retrieved information, and then process the submitted form and complete the transaction. This does not work because web requests (HTTP) are "state-less".
There simply is no continuous connection from the customer's web browser to a server. The next request may in fact arrive at a different web server, and/or at a different database server. A persistent connection in PHP does not help this aspect, as there is no way to guarantee that this user's next request will a) arrive at the same PHP instance and then b) use that connection. The customer could also open an extra browser window and issue other (or duplicate) requests.
You could of course devise elaborate constructs with flag fields in rows to emulate a kind of locking that can be sustained over multiple connections. I would, however, recommend designing tables and applications in such a way that this is not necessary. The above examples may already give you an idea of how to go about this.
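For illustration only, such a flag-field construct might look something like the following sketch, using a hypothetical claimed_by column and session token on the stock table; note that the claim must also be released or expired, which is exactly the extra complexity the advice above aims to avoid:

ALTER TABLE stock ADD claimed_by CHAR(32) NULL;
-- a web request tries to claim the row; 0 affected rows means another session holds it
UPDATE stock SET claimed_by = 'session-abc123' WHERE productid = 1234 AND claimed_by IS NULL;
-- a later request (or a cleanup job) releases the claim
UPDATE stock SET claimed_by = NULL WHERE productid = 1234 AND claimed_by = 'session-abc123';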
A DELETE statement requires an exclusive lock on a MyISAM table. If there are a significant number of deletes while there is also a lot of SELECT traffic, this may impact performance. One trick that can be used is to turn the delete into an insert!
CREATE TABLE events (eventid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, title CHAR(40));
Instead of deleting from the events table, which would interfere with all the selects, we do the following:
CREATE TABLE event_deletes (eventid INT UNSIGNED PRIMARY KEY);
To delete an event:
INSERT INTO event_deletes VALUES (329);
Now, to retrieve all non-deleted events:
SELECT e.eventid, e.title FROM events e LEFT JOIN event_deletes ed ON e.eventid = ed.eventid WHERE ed.eventid IS NULL;
or with a subquery (MySQL 4.1 and above):
SELECT e.eventid,e.title FROM events e WHERE NOT EXISTS (SELECT * FROM event_deletes ed WHERE ed.eventid = e.eventid);
These SELECT statements merely use the index on the eventid column in the event_deletes table; no row data needs to be retrieved.
During a maintenance timeslot, a script can go through the deleted items and delete the actual rows from the main table. In MySQL 4.0, this can be done very easily using just one multi-table DELETE statement:
DELETE FROM e, ed USING events e, event_deletes ed WHERE e.eventid = ed.eventid;
A general guideline could be as follows: if you require multi-statement transactions, advanced isolation levels and row-level locking, foreign key constraints, or otherwise have a requirement for ACID features, go for InnoDB. Otherwise, simply use MyISAM, the default.
If you have a message board application with lots of selects, inserts, and updates, InnoDB is probably the appropriate general choice for the actual message board tables.
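If such tables already exist as MyISAM, they can be converted in place; a quick sketch, assuming a hypothetical messages table:

ALTER TABLE messages ENGINE=InnoDB;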
The concurrency control mechanism used by MyISAM works very well with tables that are accessed mostly using SELECT statements, but also with concurrent INSERTs. Whether an INSERT can be concurrent depends on whether there is free space in the table (from DELETEs, or in the case of dynamic-row format also certain UPDATEs). MySQL will first fill the free space, for space efficiency. If, however, there is no free space available, new rows are appended to the end of the physical table file, and this is when an INSERT can operate concurrently with SELECTs. UPDATE and DELETE statements always need to acquire an exclusive lock on a MyISAM table. Please note the trick mentioned earlier for dealing with DELETEs.
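If concurrent inserts matter for a particular MyISAM table, the free space left behind by deletes can be reclaimed during a maintenance window so that new rows are again appended at the end; for example, on the events table from the earlier sketch:

OPTIMIZE TABLE events;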
If you need to do some high-speed operations on a fairly limited amount of data, use a MEMORY table. You can create such a table and easily copy data from a disk-based table as follows:
CREATE TABLE memtable ENGINE=MEMORY SELECT * FROM disktable;
Currently only MyISAM supports GIS (geometrical data, mapping) with RTREE indexes.
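As a brief sketch with a hypothetical table, a spatial column with an RTREE-backed SPATIAL index in MySQL 4.1 could look like this:

CREATE TABLE landmarks (name CHAR(40), location POINT NOT NULL, SPATIAL INDEX (location)) ENGINE=MyISAM;
INSERT INTO landmarks VALUES ('Brisbane', GeomFromText('POINT(153.02 -27.47)'));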
FULLTEXT indexing is currently only supported in MyISAM, but depending on your requirements it may be possible to use a simple keyword index in another table type, or even keep the text files on disk and index them using a different tool.
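For reference, a minimal FULLTEXT sketch on a hypothetical MyISAM table looks like this:

CREATE TABLE articles (articleid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, body TEXT, FULLTEXT (body)) ENGINE=MyISAM;
SELECT articleid FROM articles WHERE MATCH (body) AGAINST ('storage engines');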
MySQL Cluster (the NDB engine) is the choice for high-availability applications that require ultimate performance, scalability, and fast failover, as noted earlier.
Not all data necessarily needs to go into a database. For instance, it generally does not make any sense to store image data (photos, etc.) in a database table. Rather, you would simply store a reference to the file in a table. The exceptions to this guideline are setups where keeping the data in the database is useful for replication, or makes backups simpler.
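A hypothetical sketch of storing only a reference to an image file:

CREATE TABLE photos (photoid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, filename CHAR(100) NOT NULL, caption CHAR(80));
INSERT INTO photos (filename, caption) VALUES ('/images/products/1234.jpg', 'Product 1234');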
So, sometimes the question "Which table type should I use?" can be answered with "none at all, put it in the file system". Relational databases and file systems have different strengths, and each should be used appropriately.
As you can see, there are many aspects to consider when looking at table types. There can be multiple options, and sometimes a trade-off decision needs to be made. We hope this article has given you a better overview of the storage facilities available to you in a MySQL server, and that the information helps guide you in your future design choices.