The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:
Connecting: (3)
Sending query to server: (2)
Parsing query: (2)
Inserting row: (1 × size of row)
Inserting indexes: (1 × number of indexes)
Closing: (1)
This does not take into consideration the initial overhead to open tables, which is done once for each concurrently running query.
The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.
You can use the following methods to speed up inserts:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
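For example, a single multiple-row INSERT such as the following (table and column names are illustrative) replaces three single-row statements, paying the connection, send, and parse overhead only once:

```sql
-- Hypothetical table t1; one statement inserts three rows.
INSERT INTO t1 (id, val) VALUES
  (1, 'a'),
  (2, 'b'),
  (3, 'c');
```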
If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster. See Section 5.1.3, “Server System Variables”.
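As a sketch, the variable can be raised for the current session before a large batch of inserts; the value shown is an arbitrary example, not a recommendation:

```sql
-- Example value only; tune to your workload and available memory.
SET SESSION bulk_insert_buffer_size = 1024 * 1024 * 32;  -- 32MB
```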
If multiple clients are inserting a lot of rows, you can get higher speed by using the INSERT DELAYED statement. See Section 12.2.5.2, “INSERT DELAYED Syntax”.
For a MyISAM table, you can use concurrent inserts to add rows at the same time that SELECT statements are running, if there are no deleted rows in the middle of the data file. See Section 7.3.3, “Concurrent Inserts”.
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements. See Section 12.2.6, “LOAD DATA INFILE Syntax”.
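A minimal invocation might look like the following; the file path, table name, and field terminator are assumptions for illustration:

```sql
LOAD DATA INFILE '/tmp/t1.txt'
  INTO TABLE t1
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n';
```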
With some extra work, it is possible to make LOAD DATA INFILE run even faster for a MyISAM table when the table has many indexes. Use the following procedure:
1. Optionally create the table with CREATE TABLE.
2. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.
3. Use myisamchk --keys-used=0 -rq /path/to/db/tbl_name. This removes all use of indexes for the table.
4. Insert data into the table with LOAD DATA INFILE. This does not update any indexes and therefore is very fast.
5. If you intend only to read from the table in the future, use myisampack to compress it. See Section 13.1.3.3, “Compressed Table Characteristics”.
6. Re-create the indexes with myisamchk -rq /path/to/db/tbl_name. This creates the index tree in memory before writing it to disk, which is much faster than updating the index during LOAD DATA INFILE because it avoids lots of disk seeks. The resulting index tree is also perfectly balanced.
7. Execute a FLUSH TABLES statement or a mysqladmin flush-tables command.
LOAD DATA INFILE performs the preceding optimization automatically if the MyISAM table into which you insert data is empty. The main difference between automatic optimization and using the procedure explicitly is that you can let myisamchk allocate much more temporary memory for the index creation than you might want the server to allocate for index re-creation when it executes the LOAD DATA INFILE statement.
You can also disable or enable the nonunique indexes for a MyISAM table by using the following statements rather than myisamchk. If you use these statements, you can skip the FLUSH TABLES operations:
ALTER TABLE tbl_name DISABLE KEYS;
ALTER TABLE tbl_name ENABLE KEYS;
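For example, a bulk load for a hypothetical MyISAM table t1 could be bracketed like this; nonunique index updates are deferred until ENABLE KEYS rebuilds the indexes in one pass:

```sql
ALTER TABLE t1 DISABLE KEYS;
LOAD DATA INFILE '/tmp/t1.txt' INTO TABLE t1;
ALTER TABLE t1 ENABLE KEYS;
```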
To speed up INSERT operations that are performed with multiple statements for nontransactional tables, lock your tables:
LOCK TABLES a WRITE;
INSERT INTO a VALUES (1,23),(2,34),(4,33);
INSERT INTO a VALUES (8,26),(6,29);
...
UNLOCK TABLES;
This benefits performance because the index buffer is flushed to disk only once, after all INSERT statements have completed. Normally, there would be as many index buffer flushes as there are INSERT statements. Explicit locking statements are not needed if you can insert all rows with a single INSERT.
To obtain faster insertions for transactional tables, you should use START TRANSACTION and COMMIT instead of LOCK TABLES.
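For a transactional table (InnoDB, for example), the equivalent of the locking pattern shown earlier is to group the inserts into one transaction, so the changes are made durable once at COMMIT rather than per statement:

```sql
START TRANSACTION;
INSERT INTO a VALUES (1,23),(2,34),(4,33);
INSERT INTO a VALUES (8,26),(6,29);
COMMIT;
```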
Locking also lowers the total time for multiple-connection tests, although the maximum wait time for individual connections might go up because they wait for locks. Suppose that five clients attempt to perform inserts simultaneously as follows:
Connection 1 does 1000 inserts
Connections 2, 3, and 4 do 1 insert
Connection 5 does 1000 inserts
If you do not use locking, connections 2, 3, and 4 finish before 1 and 5. If you use locking, connections 2, 3, and 4 probably do not finish before 1 or 5, but the total time should be about 40% faster.
INSERT, UPDATE, and DELETE operations are very fast in MySQL, but you can obtain better overall performance by adding locks around everything that does more than about five successive inserts or updates. If you do very many successive inserts, you could do a LOCK TABLES followed by an UNLOCK TABLES once in a while (every 1,000 rows or so) to allow other threads access to the table. This would still result in a nice performance gain.
INSERT is still much slower for loading data than LOAD DATA INFILE, even when using the strategies just outlined.
To increase performance for MyISAM tables, for both LOAD DATA INFILE and INSERT, enlarge the key cache by increasing the key_buffer_size system variable. See Section 7.5.3, “Tuning Server Parameters”.
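As a sketch, the key cache can be enlarged at runtime without a server restart; the value is an arbitrary example, so choose one that fits your memory budget:

```sql
-- Example value only; sized per available RAM and index volume.
SET GLOBAL key_buffer_size = 1024 * 1024 * 256;  -- 256MB
```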
MySQL Enterprise. For more advice on optimizing the performance of your server, subscribe to the MySQL Enterprise Monitor. Numerous advisors are dedicated to monitoring performance. For more information, see http://www.mysql.com/products/enterprise/advisors.html.
User Comments
"If you are inserting a lot of rows from different clients, you can get higher speed by using the INSERT DELAYED statement."
I am running an application which inserts 20 to 120 rows per second into a single table (Format: Fixed / Row size: 46 / No free data) using 5 to 10 client threads. I was using INSERT DELAYED, but the delayed insert thread was consuming a lot of CPU, and every 5 seconds all the insert threads would go into "waiting for handler lock" status for about 600 ms. The application also runs selects and updates in the following proportion:
INSERT: 28%
SELECT: 28%
UPDATE: 44%
During that 600 ms every 5 seconds, MySQL didn't send anything to any client (I checked by monitoring network traffic).
Today I changed the INSERT DELAYED to a simple INSERT: CPU usage went down by 70% and the intermittent locks disappeared.
Platform: Dual Intel Pentium III - Linux 2.4.20-13.7smp - MySQL 4.0.12-standard. Queries per second avg: 482.485
The suggestion of wrapping in transactions if you make more than about 5 modifications is in my experience a bit high.
I've seen massive performance increases in just wrapping 3 inserts. I imagine the performance gain is whenever you make more than a single insertion.
Summary
--tmpdir (pointing it at a large partition with plenty of free space) may be useful if you have a small /tmp partition and are using myisamchk on a large table.
Details
Got a table with 36 million rows, 3 columns.
mysql> select count(*) from pagelinks;
mysql> desc pagelinks;
which I populated using
LOAD DATA INFILE '/home/murray/tagging/wikipedia/tabbed.en_pagelinks.dat' INTO TABLE wikipedia.pagelinks;
# ls -ltrh tabbed.en_pagelinks.dat
-rw-r--r-- 1 murray murray 875M Nov 1 14:03 tabbed.en_pagelinks.dat
when running
myisamchk -r -q /var/lib/mysql/wikipedia/pagelinks
eventually caused repeated errors of the form:
myisamchk: Disk is full writing '/tmp/ST71pY9X' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs
df -m /tmp shows
/dev/hda7 1786 1786 0 100% /tmp
but there is nothing actually in /tmp
# cd /
# du -sm tmp
1 tmp/
Fix was to use --tmpdir and point it at a partition with more than 2Gb of space.
myisamchk -r -q --tmpdir=/home/data/tmp /var/lib/mysql/wikipedia/pagelinks
# mysql -V
mysql Ver 14.7 Distrib 4.1.11, for pc-linux-gnu (i386)
debian stable/sarge
using LOCK TABLE .. WRITE may speed up the insertion, but when processing as well as loading rows, you will lock out any reader. A select on primary key (which should take close to 0 seconds) takes up to 0.2 seconds when my loader is busy loading batches of 1000 rows.
A very good workaround is to put the new rows into a temp table, then flush that table every now and then in a single transaction. This makes the insert very quick, because all the checking and conversion was already done in the temp table.
CREATE TEMPORARY TABLE ..._temp (....) ENGINE=MEMORY;
-- Loop from here --
DELETE FROM .._temp;
INSERT INTO .._temp ... VALUES (...);
INSERT INTO .._temp ... VALUES (...);
LOCK TABLE ... WRITE, ..._temp READ;
INSERT INTO ... SELECT * FROM ..._temp;
UNLOCK TABLES;
-- Repeat until done --
DROP TABLE ..._temp;
What I've seen in my system is that the INSERTs go at about the same speed as when locking the table, but the clients do not notice any performance loss at all while the loader is running.
LOCK TABLES does not appear to speed up INSERT ... SELECT queries (at least under MySQL 4.0).
For example...
LOCK TABLES Folders READ, FolderPerms WRITE;
INSERT INTO FolderPerms
SELECT 'asc' as SID,
FolderID1.ID,Folders.InstID,Folders.Name,
Folders.FolderPath FROM FolderID1
LEFT JOIN Folders ON FolderID1.ID=Folders.ID GROUP BY FolderID1.ID;
UNLOCK TABLES;
...runs just as fast with or without the LOCK/UNLOCK statements.
If you have a situation where there are a lot of INSERTs taking place for an extended period of time (such as with a large import), edit your /etc/fstab to turn off access times with the noatime option for the partition that the data is located in. With access times enabled for the mounted partition (the default behavior), the OS has to write to the disk partition every time you access the database (even if it's just a read). On large imports, this can clog up your I/O buffers. This was a problem I had on a FreeBSD 5.4 machine. When I turned off access time, INSERT performance increased severalfold.
An example /etc/fstab entry without, and then with, the noatime option; consult your manpages for OS-specific settings:
/dev/da0s1a /partition ufs rw 2 2
/dev/da0s1a /partition ufs rw,noatime 2 2
If you have a big amount of data to insert with an INSERT ... SELECT query, it might be helpful to divide it into smaller portions. I had a table with approximately 30,000,000 records with 6 fields of type DOUBLE. If I tried to use a single INSERT ... SELECT to copy all records at once into another empty table, it wouldn't finish at all. But when copying 2,000,000 records at a time, the whole table took about 10 minutes.
It seems that MySQL executes the SELECT completely and then starts inserting.
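A chunked copy along the lines this commenter describes might look like the following, assuming an integer primary key named id on the source table (table names and chunk size are illustrative):

```sql
-- Repeat with successive id ranges until the source is exhausted.
INSERT INTO target_table
  SELECT * FROM source_table
  WHERE id >= 0 AND id < 2000000;

INSERT INTO target_table
  SELECT * FROM source_table
  WHERE id >= 2000000 AND id < 4000000;
```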
Be aware that if you use replication and set sync_binlog = 1, this can have a large impact on the speed of INSERT/UPDATE/DELETE operations, especially if you are using mysqlimport or LOAD DATA LOCAL INFILE.
Using sync_binlog=1 on Solaris 10, I noted a speed decrease to 25% of the speed as compared to the default sync_binlog = 0.