A key determinant of the efficiency of compression in
reducing the size of data files is the nature of the data
itself. Recall that compression works by identifying repeated
strings of bytes in a block of data. Completely randomized
data is the worst case. Typical data often has repeated
values, and so compresses effectively. Character strings
often compress well, whether defined in
CHAR
, VARCHAR
,
TEXT
or BLOB
columns.
On the other hand, tables containing mostly binary data
(integers or floating point numbers) or data that is
previously compressed (for example JPEG or
PNG images) may not generally compress
well, significantly or at all.
Compression is chosen on a table by table basis with the
InnoDB Plugin, and a table and all of its indexes use the same
(compressed) page size. It might be that the primary key
(clustered) index, which contains the data for all columns of
a table, compresses more effectively than the secondary
indexes. For those cases where there are long rows, the use
of compression may result in long column values being stored
“off-page”, as discussed in
Section 5.3, “DYNAMIC
Row Format”. Those overflow pages
may compress well. Given these considerations, for many
applications, some tables compress more effectively than
others, and you may find that your workload performs best only
with a subset of tables compressed.
Experimenting is the only way to determine whether or
not to compress a particular table. InnoDB compresses data
in 16K chunks corresponding to the uncompressed page size, and
in addition to user data, the page format includes some
internal system data that is not compressed. Compression
utilities compress an entire stream of data, and so may find
more repeated strings across the entire input stream than
InnoDB would find in a table compressed in 16K chunks. But
you can get a sense of how compression efficiency by
using a utility that implements LZ77 compression (such as
gzip
or WinZip) on your data file.
Another way to test compression on a specific table is
to copy some data from your uncompressed table to a similar,
compressed table (having all the same indexes) and look at the
size of the resulting file. When you do so (if nothing else
using compression is running), you can
examine the ratio of successful
compression operations to overall compression operations. (In
the INNODB_CMP
table, compare
COMPRESS_OPS
to
COMPRESS_OPS_OK
. See
INNODB_CMP
for more
information.) If a high percentage of compression operations
complete successfully, the table might be a good candidate for
compression.
This is the User’s Guide for InnoDB Plugin 1.0.6 for MySQL 5.1, generated on March 4, 2010 (rev 673:680M).