The MySQL query optimizer uses estimated statistics about key distributions to select or avoid using an index in an execution plan, based on the relative selectivity of the index. Previously, InnoDB sampled 8 random pages from an index to get an estimate of the cardinality of (i.e., the number of distinct values in) the index. (This page sampling technique is frequently described as “index dives”.) This small number of page samples frequently was insufficient, and could give inaccurate estimates of an index’s selectivity and thus lead to poor choices by the query optimizer.
To give users control over the quality of the statistics
estimate (and thus better information for the query optimizer),
the number of sampled pages now can be changed via the parameter
innodb_stats_sample_pages
.
This feature addresses user requests such as that as expressed in MySQL Bug#25640: InnoDB Analyze Table Should Allow User Selection of Index Dives.
You can change the number of sampled pages via the global
parameter innodb_stats_sample_pages
, which can be set at
runtime (i.e., it is a dynamic parameter). The default value for
this parameter is 8, preserving the same behavior as in past
releases.
Note that the value of innodb_stats_sample_pages
affects the
index sampling for all tables and indexes.
You should also be aware that there are the following potentially
significant impacts when you change the index sample size:
small values like 1 or 2 can result in very inaccurate estimates of cardinality
values much larger than 8 (say, 100), can cause a big
slowdown in the time it takes to open a table or execute
SHOW TABLE STATUS
.
the optimizer may choose very different query plans based on different estimates of index selectivity
Note that the cardinality estimation can be disabled for
metadata commands such as SHOW TABLE STATUS
by executing the command SET GLOBAL
innodb_stats_on_metadata=OFF
(or 0
).
Before InnoDB Plugin
1.0.2, this variable could only be set in the MySQL option
file (my.cnf
or my.ini
),
and changing it required shutting down and restarting the
server.
The cardinality (the number of different key values) in every
index of a table is calculated when a table is opened,
at SHOW TABLE STATUS
and ANALYZE TABLE
and on other
circumstances (like when the table has changed too much).
Note that all tables are opened, and the statistics are
re-estimated, when the mysql
client starts if the
auto-rehash setting is set on (the default).
The auto-rehash feature enables automatic name completion
of database, table, and column names for interactive
users. You may prefer setting auto-rehash off to improve
the start up time of the mysql
client.
You should note that it does not make sense to increase
the index sample size, then run ANALYZE TABLE
and decrease
sample size to attempt to obtain better statistics.
This is because the statistics are not persistent. They are
automatically recalculated at various times other than on
execution of ANALYZE TABLE
. Sooner or later the
“better” statistics calculated by
ANALYZE
running with a high value of
innodb_stats_sample_pages
will be wiped away.
The estimated cardinality for an index will be more accurate
with a larger number of samples, but each sample might require
a disk read, so you do not want to make the sample size too
large. You should choose a value for innodb_stats_sample_pages
that results in reasonably accurate estimates for all tables in
your database without requiring excessive I/O.
Although it is not possible to specify the sample size on a
per-table basis, smaller tables generally would require fewer
index samples than larger tables require. If your database
has many large tables, you may want to consider using a higher
value for innodb_stats_sample_pages
than if you have mostly
smaller tables.
This is the User’s Guide for InnoDB Plugin 1.0.6 for MySQL 5.1, generated on March 4, 2010 (rev 673:680M).