Arjen Lentz is a former Community Relations Manager at MySQL. He is currently the owner of Open Query and lives in Brisbane, Australia.
By Arjen Lentz, MySQL AB
5 May 2004
In Part Three of this series we will compare key aspects of the most popular storage engines, and provide you with guidelines and practical examples to help you decide which engine may be best suited for a particular purpose.
| Feature | MyISAM | InnoDB | MEMORY | NDB |
|---|---|---|---|---|
| Multi-statement transactions, ROLLBACK | - | X | - | X |
| Foreign key constraints | - | X | - | - |
| Locking level | table | row | table | row |
| BTREE indexes | X | X | - | X |
| FULLTEXT indexes | X | - | - | - |
| HASH lookups | - | X | X | X |
| Other in-memory tree-based index | - | - | 4.1.0 | - |
| GIS, RTREE indexes | 4.1.0 | - | - | - |
| Unicode | 4.1.0 | 4.1.2 | - | - |
| Merge (union views) | X | - | - | - |
| Compress read-only storage | X | - | - | - |
| Relative disk use | low | high | - | low |
| Relative memory use | low | high | low | high |
As you can see from the comparison table, if you need certain features, you may have only a single option; in that case, you would choose that storage engine for that particular table. But this certainly doesn't mean you should use the same table type for the entire application! On the contrary, if you make a blanket choice, either for all your work or for an entire application, chances are high that the result will not be optimal: it may require much more disk space, or operate slower than anticipated.
It is very important to realize that there is no single answer: any "which is best" question inevitably involves assumptions and trade-offs. The proper way to go about this is to ask, for each part of your application, "which storage engine is best suited for ...". It often makes perfect sense to use different engines within a single application: for instance, InnoDB for key account and transactional data, MyISAM for logs, MEMORY tables for quick calculations and intermediate summaries, and NDB for high-availability applications that require ultimate performance, scalability, and fast failover.
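To make this concrete, here is a minimal sketch, with hypothetical table and column names, of how different engines might be mixed within one application:

-- transactional account data: InnoDB
CREATE TABLE accounts (accountid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, balance DECIMAL(10,2) NOT NULL) ENGINE=InnoDB;
-- append-mostly log data: MyISAM
CREATE TABLE weblog (logid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, hit DATETIME NOT NULL, url CHAR(100)) ENGINE=MyISAM;
-- intermediate summaries for quick calculations: MEMORY
CREATE TABLE tmp_summary (url CHAR(100) PRIMARY KEY, hits INT UNSIGNED) ENGINE=MEMORY;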
Transactions are very powerful, but using them when they are not required needlessly complicates your application. I already mentioned the INSERT ... SELECT and CREATE TABLE ... SELECT constructs. Suppose you insert a row and need the generated ID for subsequent inserts into other tables:
CREATE TABLE customers (custid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, name CHAR(40) UNIQUE);
CREATE TABLE sales (salesid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, custid INT UNSIGNED NOT NULL, prodid INT UNSIGNED NOT NULL);
INSERT INTO customers VALUES (NULL,'Some Company');
SET @custid = LAST_INSERT_ID();
INSERT INTO sales VALUES (NULL,@custid,1234);
...
The value returned by LAST_INSERT_ID() is unique for this connection. A construct using MAX() would not be safe without manual locking to prevent other connections from inserting at the same time. In the above example, we use a user variable (which is also connection-specific). We do this because the sales table also has an auto-increment column and we might still need the customer ID for a third insert. Of course, if there is only a single other insert, you could simplify that statement:
INSERT INTO sales VALUES (NULL,LAST_INSERT_ID(),1234);
This is how you can create a counter without using transactions or locking:
CREATE TABLE counters (name CHAR(10) PRIMARY KEY, counter INT UNSIGNED NOT NULL DEFAULT 0);
INSERT INTO counters (name) VALUES ('webhits');
Now we want to increment this counter and retrieve the new value, with a single atomic instruction. This means we won't need a transaction, or manual locking. We do need to take care, because another connection might do the same operation, and each connection should of course get its own unique count without any possibility for mistakes.
UPDATE counters SET counter = LAST_INSERT_ID(counter+1) WHERE name = 'webhits';
SELECT LAST_INSERT_ID();
This is a special construct, and again it's perfectly safe without manual locks, no matter how many concurrent connections the server has.
As our last example in this section, consider a commonly perceived need for a transaction. Say we are selling products, and a customer first looks at the amount of available stock:
SELECT amount FROM stock WHERE productid = 1234;
While the customer thinks about his purchase or fills out the forms, someone else might process a purchase of the same product. So do we need to somehow manually lock this entry in the table? No.
UPDATE stock SET amount = amount - 3 WHERE productid = 1234 AND amount >= 3;
If there is less stock than required when this update is executed, the query will report that 0 rows have been updated, instead of 1 row. So without locks and in a single statement, we do the operation in a perfectly safe manner, and get all the information we need.
In web-based applications an SQL transaction cannot span multiple web requests, i.e., you can't start a transaction, issue some queries, then lead the customer through a web form using the retrieved information, and then process the submitted form and complete the transaction. This does not work because web requests (HTTP) are "state-less".
There simply is no continuous connection from the customer's web browser to a server. The next request may in fact arrive at a different web server, and/or at a different database server. A persistent connection in PHP does not help this aspect, as there is no way to guarantee that this user's next request will a) arrive at the same PHP instance and then b) use that connection. The customer could also open an extra browser window and issue other (or duplicate) requests.
You could of course devise elaborate constructs with flag fields in rows to emulate a kind of locking that can be sustained over multiple connections. I would, however, recommend designing tables and applications in such a way that this is not necessary. The above examples may already give you an idea of how to go about this.
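For illustration only, such a flag-field construct might look something like the following sketch, using a hypothetical claimed_by column and session token on the stock table; note that the claim must also be released or expired, which is exactly the extra complexity the advice above aims to avoid:

ALTER TABLE stock ADD claimed_by CHAR(32) NULL;
-- a web request tries to claim the row; 0 affected rows means another session holds it
UPDATE stock SET claimed_by = 'session-abc123' WHERE productid = 1234 AND claimed_by IS NULL;
-- a later request (or a cleanup job) releases the claim
UPDATE stock SET claimed_by = NULL WHERE productid = 1234 AND claimed_by = 'session-abc123';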
A DELETE statement requires an exclusive lock on a MyISAM table. If there are a significant number of deletes while there is also a lot of SELECT traffic, this may impact performance. One trick that can be used is to turn the delete into an insert!
CREATE TABLE events (eventid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, title CHAR(40));
Instead of deleting from the events table, which would interfere with all the selects, we do the following:
CREATE TABLE event_deletes (eventid INT UNSIGNED PRIMARY KEY);
To delete an event:
INSERT INTO event_deletes VALUES (329);
Now, to retrieve all non-deleted events:
SELECT e.eventid, e.title FROM events e LEFT JOIN event_deletes ed ON e.eventid = ed.eventid WHERE ed.eventid IS NULL;
or with a subquery (MySQL 4.1 and above):
SELECT e.eventid,e.title FROM events e WHERE NOT EXISTS (SELECT * FROM event_deletes ed WHERE ed.eventid = e.eventid);
These SELECT statements merely use the index on the eventid column in the event_deletes table; no row data needs to be retrieved.
During a maintenance timeslot, a script can go through the deleted items and delete the actual rows from the main table. In MySQL 4.0, this can be done very easily using just one multi-table DELETE statement:
DELETE FROM e, ed USING events e, event_deletes ed WHERE e.eventid = ed.eventid;
A general guideline could be as follows: if you require multi-statement transactions, advanced isolation levels and row-level locking, foreign key constraints, or otherwise have a requirement for ACID features, go for InnoDB. Otherwise, simply use MyISAM, the default.
If you have a message board application with lots of selects, inserts, and updates, InnoDB is probably the appropriate general choice for the actual message board tables.
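If such tables already exist as MyISAM, they can be converted in place; a quick sketch, assuming a hypothetical messages table:

ALTER TABLE messages ENGINE=InnoDB;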
The concurrency control mechanism used by MyISAM works very well with tables that are accessed mostly using SELECT statements, but also with concurrent INSERTs. Whether an INSERT can be concurrent depends on whether there is free space in the table (from DELETEs, or in the case of dynamic-row format also certain UPDATEs). MySQL will first fill the free space, for space efficiency. If, however, there is no free space available, new rows are appended to the end of the physical table file, and this is when an INSERT can operate concurrently with SELECTs. UPDATE and DELETE statements always need to acquire an exclusive lock on a MyISAM table. Please note the trick mentioned earlier for dealing with DELETEs.
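If concurrent inserts matter for a particular MyISAM table, the free space left behind by deletes can be reclaimed during a maintenance window so that new rows are again appended at the end; for example, on the events table from the earlier sketch:

OPTIMIZE TABLE events;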
If you need to do some high-speed operations on a fairly limited amount of data, use a MEMORY table. You can create such a table and easily copy data from a disk-based table as follows:
CREATE TABLE memtable ENGINE=MEMORY SELECT * FROM disktable;
Currently only MyISAM supports GIS (geometrical data, mapping) with RTREE indexes.
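As a brief sketch with a hypothetical table, a spatial column with an RTREE-backed SPATIAL index in MySQL 4.1 could look like this:

CREATE TABLE landmarks (name CHAR(40), location POINT NOT NULL, SPATIAL INDEX (location)) ENGINE=MyISAM;
INSERT INTO landmarks VALUES ('Brisbane', GeomFromText('POINT(153.02 -27.47)'));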
FULLTEXT indexing is currently only supported in MyISAM, but depending on your requirements it may be possible to use a simple keyword index in another table type, or even keep the text files on disk and index them using a different tool.
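For reference, a minimal FULLTEXT sketch on a hypothetical MyISAM table looks like this:

CREATE TABLE articles (articleid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, body TEXT, FULLTEXT (body)) ENGINE=MyISAM;
SELECT articleid FROM articles WHERE MATCH (body) AGAINST ('storage engines');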
MySQL Cluster (the NDB engine) is the choice for high-availability applications that require ultimate performance, scalability, and fast failover, as noted earlier.
Not all data necessarily needs to go into a database. For instance, it generally does not make any sense to store image data (photos, etc.) in a database table. Rather, you would simply store a reference to the file in a table. The exceptions to this guideline are setups where keeping the data in the database is useful for replication, or makes backups simpler.
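A hypothetical sketch of storing only a reference to an image file:

CREATE TABLE photos (photoid INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, filename CHAR(100) NOT NULL, caption CHAR(80));
INSERT INTO photos (filename, caption) VALUES ('/images/products/1234.jpg', 'Product 1234');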
So, sometimes the question "Which table type should I use?" can be answered with "none at all, put it in the file system". Relational databases and file systems have different strengths, and each should be used appropriately.
As you can see, there are many aspects to consider when looking at table types. There can be multiple options, and sometimes a trade-off decision needs to be made. We hope this article has given you a better overview of the storage facilities available to you in a MySQL server, and that the information helps guide you in your future design choices.