Robin Schumacher is MySQL's Director of Product Management and has over 13 years of database experience in DB2, MySQL, Oracle, SQL Server and other database engines. Before joining MySQL, Robin was Vice President of Product Management at Embarcadero Technologies.
All robust relational database management systems utilize memory caching to help improve the overall responsiveness of their underlying engine. While certainly not free, accessing application data, query plans, or database metadata in RAM is normally much quicker than repetitively retrieving that same information from disk or building it from scratch. This paper compares the memory caching strategies used by a number of database vendors with MySQL, and then focuses on the MySQL query cache with a discussion of why its unique approach to memory caching can pay handsome performance dividends when correctly used by database administrators and performance analysts.
A detailed discussion of how the popular database vendors implement their caching strategies could easily result in a lengthy work, so we will just quickly look at the subject in a broad way. Generally speaking, there are normally three types of global caches used by RDBMS systems:
There are more memory structures used by databases, but the three mentioned above represent the bulk of a RDBMS system's RAM demands. Working from the bottom up, metadata caches are standard across nearly all relational systems and hold system reference data, with examples being Oracle's dictionary cache (contained within its shared pool), the MySQL table cache, and the MySQL InnoDB storage engine's dictionary cache.
Code caches are used to primarily lessen the impact of "hard parse" activity where, after SQL and stored code statements are initially executed and checked both syntactically and for security, the sometimes costly process of generating access plans is performed. The goal of code caches is to reuse repetitive SQL and procedural code statements so only soft parsing (parsing without access plan generation, other miscellaneous steps, and loading into the cache) is done. The amount of CPU savings from such caches can indeed be very large. Examples include Oracle's shared pool/library cache, and SQL Server's and Sybase's SQL/Procedure caches.
Data caches are the most easily understood memory cache of databases, with the contents being (surprise!), application data. Data blocks are read in from disk and stored in memory using LRU (least recently used) chains or other advanced mechanisms (like Oracle's Touch Count or MySQL's Midpoint Insertion Strategy that is used for MyISAM index blocks) to keep hot blocks always in play. Some database vendors allow a DBA to carve out certain areas of a data cache for specific uses. For example, in addition to a global data cache, Oracle provides a KEEP buffer pool that maintains a table's data in RAM and a RECYCLE buffer pool that prevents the default cache from being flooded with little-referenced data blocks from full table scans that potentially can force out frequently-referenced and needed blocks. Sybase offers a DBA the ability to define multiple data caches and then specifically assign certain objects to them. SQL Server offers a PIN function (DBCC PINTABLE) that pins a table in the global data cache once it is first accessed.
Because MySQL is more than capable of playing on the same field as any other major database vendor, it offers various caches for data that is managed by its storage engines. The MyISAM key cache retains index blocks in RAM for fast access, with both default and custom-created caches being available. Because it so closely mirrors Oracle, the MySQL InnoDB storage engine offers a global buffer cache that holds both data and index blocks, with AWE (Advanced Windowing Extensions) support being available for Windows installations. MySQL also offers the Memory/Heap storage engine, which as its name implies, keeps the contents of a table (and indexes) always in RAM.
Some DBAs and performance analysts feel that a large data cache is the silver bullet for nearly all performance headaches. After all, since memory access is much faster than disk lookups, the key to excellent performance must be to store as much data as possible into memory. After that has been accomplished, then it likely won't matter what SQL is thrown at the system.
To analyze whether a data cache has been correctly configured, a number of ratios have been created that measure logical (memory) to physical (disk) I/O rates. MySQL has the key cache hit ratio and InnoDB buffer pool ratio, Oracle has the buffer cache hit ratio, and Sybase and SQL Server use the data cache hit rate. DBAs have relied on these ratios to help determine if their data caches are large enough, with standard rules of thumb being a 90+% reading being necessary for the cache to be termed 'effective'. If the data cache hit rate hovers near this reading, then many DBAs bless their system as one that is high-performance in nature.
It may shock you to hear this, but nothing could be further from the truth. While it is certainly true that properly configured data caches play a part in a high-performing database system, relying on high data cache hit rates as the sole or primary indicator of an efficiently performing database is bad practice. Why? Consider the following illustration taken from Oracle expert Cary Millsap's excellent paper entitled "Why a 99%+ Database Buffer Cache Hit Ratio is Not OK1":
Two distinct SQL statements A and B return identical row sets, but the two statements have different execution plans. Which statement would you rather have on your system?
SQL Statement | Cache Hit Ratio |
A | 99.99% |
B | 90.00% |
The conventional answer is that one should rather have statement A on one's system because it has the higher (and presumably better) database buffer cache hit ratio … Let's look one level deeper into the details of what workload the statements generate. Given the following bits of additional information, now which statement would rather have on your system?
SQL Statement | Cache Hit Ratio | Logical I/O | Physical I/O | Execution Time |
A | 99.99% | 10,000 | 1 | .405 sec |
B | 90.00% | 10 | 1 | .005 sec |
As Cary points out, logical I/O counts and oftentimes counts a lot. You can't just throw memory at a database that's filled with inefficient SQL and expect things to be OK. The consistent rebuilding of complex results sets from data caches coupled with high latch (memory lock) activity can cause lots of CPU activity that can cripple a system in no time.
Every DBA should have a goal to create and configure their databases so that data is stored and database features exploited in ways where the most common end user requests are returned as fast as possible. As we've seen, you simply can't just blindly assign memory to a database in hopes of making things right, because even if the raw data is in RAM, poorly-written SQL can still peg a server's CPU to recreate complex result sets through logical I/O.
While it is certainly not a cure for poorly written SQL, the MySQL query cache, introduced in 4.0.1, can have an astonishingly positive impact on the response time of both basic and resource-intensive SQL statements. As you might expect, the query cache stores the SELECT
queries issued by clients to the MySQL database server. In this respect, it acts a lot like traditional database SQL caches, in that identically issued queries will be located and re-issued by the database engine without hard parse activity. However, unlike any other database system, MySQL stores not only the SQL query issued by a client, but also stores the query's result set. In this respect, the MySQL query cache provides substantial benefits over other database engine caching strategies because not only is the overhead of hard parsing for identical queries avoided, but the sometimes very high overhead of recreating complex result sets from either disk or memory caches is averted as well, greatly lessening both physical and logical I/O. The end results are extremely fast response times for business applications (data warehouses, BI applications, Web applications, OLTP systems, etc.) where end user clients are repetitively executing the same MySQL queries.
So what situations lend themselves to best exploiting the benefits of the MySQL query cache? While the query cache can be effective in most any situation, the following are more ideal business conditions:
Data warehouse/business intelligence situations, web-based applications, and traditional OLTP systems all qualify on the surface. The query cache looks for identical queries (spacing, upper/lower case, all come into play), ergo the first point above.
Regarding the data being semi-static in nature, because the result set is stored alongside the issuing query, any change in the underlying data of one or any of the data objects involved in the query causes MySQL to invalidate the query in the cache and reload it and the recomputed result set the next time it is issued. Finally, because some queries have the potential to return large volumes of data/rows, MySQL provides a customizable memory storage limit that governs the size of each query's result set. Because of this, the query cache works best for queries that don't return large amounts of data (although you can certainly allow this if you'd like).
Let's next see the MySQL query cache in action to get a feel for how it is practically configured and used.
Before we get started, a quick note: all the tests below were performed on a Dell single processor (with hyperthreading turned on) Pentium 4 3.00 GHz machine with 1GB of RAM, running Red Hat Fedora Core 4 Linux. The version of MySQL used is 5.0.7 beta.
In some installations of MySQL, the query cache is disabled by default so you will have to do a little configuration to set things up. The way to check is to log into MySQL and issue the following command:
mysql> show variables like 'query%'; +------------------------------+---------+ | Variable_name | Value | +------------------------------+---------+ | query_alloc_block_size | 8192 | | query_cache_limit | 1048576 | | query_cache_min_res_unit | 4096 | | query_cache_size | 0 | | query_cache_type | ON | | query_cache_wlock_invalidate | OFF | | query_prealloc_size | 8192 | +------------------------------+---------+
While the query_cache_type
variable is set to ON, the query_cache_size
variable is set to zero, and this effectively disables the query cache. Setting the query_cache_type
variable to ON (caches all SELECT queries except for those with the SQL_NO_CACHE hint) or DEMAND (only caches SELECT queries with the SQL_CACHE hint) along with a query_cache_size
variable something greater than zero enables the query cache.
At the moment, however, the query cache is disabled so let's issue a particular query that computes the top five commissioned brokers at an investment management company a couple of times and see what happens:
mysql> SELECT A.BROKER_ID, -> A.BROKER_FIRST_NAME, -> A.BROKER_LAST_NAME, -> SUM(BROKER_COMMISSION) TOTAL_COMMISSIONS ->FROM BROKER A, -> CLIENT_TRANSACTION B ->WHERE A.BROKER_ID = B.BROKER_ID ->GROUP BY A.BROKER_ID, -> A.BROKER_FIRST_NAME, -> A.BROKER_LAST_NAME ->ORDER BY 4 DESC ->LIMIT 5; +-----------+-------------------+------------------+-------------------+ | BROKER_ID | BROKER_FIRST_NAME | BROKER_LAST_NAME | TOTAL_COMMISSIONS | +-----------+-------------------+------------------+-------------------+ | 20 | STEVE | BOYCE | 3864173.64 | | 1 | JONATHAN | MORTON | 1584621.39 | | 13 | JIM | SANDERS | 1369157.73 | | 4 | DAVE | TUCKER | 1214111.75 | | 14 | DENISE | SCHWARTZ | 1041040.98 | +-----------+-------------------+------------------+-------------------+ 5 rows in set (0.11 sec) < execute query again > +-----------+-------------------+------------------+-------------------+ | BROKER_ID | BROKER_FIRST_NAME | BROKER_LAST_NAME | TOTAL_COMMISSIONS | +-----------+-------------------+------------------+-------------------+ | 20 | STEVE | BOYCE | 3864173.64 | | 1 | JONATHAN | MORTON | 1584621.39 | | 13 | JIM | SANDERS | 1369157.73 | | 4 | DAVE | TUCKER | 1214111.75 | | 14 | DENISE | SCHWARTZ | 1041040.98 | +-----------+-------------------+------------------+-------------------+ 5 rows in set (0.11 sec)
Each result set is returned fairly quickly at .11 seconds, which isn't too bad. However, now let's enable the query cache by setting it to 50M (MySQL allows the dynamic setting of many variables without starting/stopping MySQL) and then see what happens when we execute the same query twice again:
mysql> set global query_cache_size=50000000; Query OK, 0 rows affected (0.00 sec) mysql> show variables like 'query%'; +------------------------------+----------+ | Variable_name | Value | +------------------------------+----------+ | query_alloc_block_size | 8192 | | query_cache_limit | 1048576 | | query_cache_min_res_unit | 4096 | | query_cache_size | 49999872 | | query_cache_type | ON | | query_cache_wlock_invalidate | OFF | | query_prealloc_size | 8192 | +------------------------------+----------+ mysql> SELECT A.BROKER_ID, -> A.BROKER_FIRST_NAME, -> A.BROKER_LAST_NAME, -> SUM(BROKER_COMMISSION) TOTAL_COMMISSIONS ->FROM BROKER A, -> CLIENT_TRANSACTION B ->WHERE A.BROKER_ID = B.BROKER_ID ->GROUP BY A.BROKER_ID, -> A.BROKER_FIRST_NAME, -> A.BROKER_LAST_NAME ->ORDER BY 4 DESC ->LIMIT 5; +-----------+-------------------+------------------+-------------------+ | BROKER_ID | BROKER_FIRST_NAME | BROKER_LAST_NAME | TOTAL_COMMISSIONS | +-----------+-------------------+------------------+-------------------+ | 20 | STEVE | BOYCE | 3864173.64 | | 1 | JONATHAN | MORTON | 1584621.39 | | 13 | JIM | SANDERS | 1369157.73 | | 4 | DAVE | TUCKER | 1214111.75 | | 14 | DENISE | SCHWARTZ | 1041040.98 | +-----------+-------------------+------------------+-------------------+ 5 rows in set (0.11 sec) < execute query again > +-----------+-------------------+------------------+-------------------+ | BROKER_ID | BROKER_FIRST_NAME | BROKER_LAST_NAME | TOTAL_COMMISSIONS | +-----------+-------------------+------------------+-------------------+ | 20 | STEVE | BOYCE | 3864173.64 | | 1 | JONATHAN | MORTON | 1584621.39 | | 13 | JIM | SANDERS | 1369157.73 | | 4 | DAVE | TUCKER | 1214111.75 | | 14 | DENISE | SCHWARTZ | 1041040.98 | +-----------+-------------------+------------------+-------------------+ 5 rows in set (0.00 sec) mysql> show status like 'qc%'; +-------------------------+----------+ | Variable_name | Value | +-------------------------+----------+ | Qcache_free_blocks | 1 | | Qcache_free_memory | 49988976 | | Qcache_hits | 1 | | Qcache_inserts | 1 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 1 | | Qcache_queries_in_cache | 1 | | Qcache_total_blocks | 5 | +-------------------------+----------+
The first query once again takes .11 seconds to execute, however the second identical query takes literally no computed time at all because it is serviced from the query cache. Checking MySQL's status counters that relate to the query cache confirm this, with the Qcache_inserts
counter showing a value of 1 (indicating that the first query was loaded into the cache) and the Qcache_hits
counter showing a value of 1 indicating one hit/match on an issued query/cached query combination.
The query cache works with raw SQL queries as demonstrated above and also works for queries issued within MySQL stored procedures:
mysql> delimiter // mysql> create procedure test_query_cache() -> begin -> SELECT A.BROKER_ID, -> A.BROKER_FIRST_NAME, -> A.BROKER_LAST_NAME, -> SUM(BROKER_COMMISSION) TOTAL_COMMISSIONS -> FROM BROKER A, -> CLIENT_TRANSACTION B -> WHERE A.BROKER_ID = B.BROKER_ID -> GROUP BY A.BROKER_ID, -> A.BROKER_FIRST_NAME, -> A.BROKER_LAST_NAME -> ORDER BY 4 DESC -> LIMIT 5; -> end -> // Query OK, 0 rows affected (0.00 sec) mysql> delimiter ; mysql> flush status; mysql> reset query cache; Query OK, 0 rows affected (0.00 sec) mysql> call test_query_cache(); +-----------+-------------------+------------------+-------------------+ | BROKER_ID | BROKER_FIRST_NAME | BROKER_LAST_NAME | TOTAL_COMMISSIONS | +-----------+-------------------+------------------+-------------------+ | 20 | STEVE | BOYCE | 3864173.64 | | 1 | JONATHAN | MORTON | 1584621.39 | | 13 | JIM | SANDERS | 1369157.73 | | 4 | DAVE | TUCKER | 1214111.75 | | 14 | DENISE | SCHWARTZ | 1041040.98 | +-----------+-------------------+------------------+-------------------+ 5 rows in set (0.11 sec) mysql> call test_query_cache(); +-----------+-------------------+------------------+-------------------+ | BROKER_ID | BROKER_FIRST_NAME | BROKER_LAST_NAME | TOTAL_COMMISSIONS | +-----------+-------------------+------------------+-------------------+ | 20 | STEVE | BOYCE | 3864173.64 | | 1 | JONATHAN | MORTON | 1584621.39 | | 13 | JIM | SANDERS | 1369157.73 | | 4 | DAVE | TUCKER | 1214111.75 | | 14 | DENISE | SCHWARTZ | 1041040.98 | +-----------+-------------------+------------------+-------------------+ 5 rows in set (0.00 sec) mysql> show status like 'qc%'; +-------------------------+----------+ | Variable_name | Value | +-------------------------+----------+ | Qcache_free_blocks | 1 | | Qcache_free_memory | 49988976 | | Qcache_hits | 1 | | Qcache_inserts | 1 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 1 | | Qcache_queries_in_cache | 1 | | Qcache_total_blocks | 5 | +-------------------------+----------+
We first create a stored procedure that issues the same broker commission query we originally used to test the query cache, flush the MySQL status counters to reset the query cache statistics to zero, reset the query cache to remove all queries from the cache, then call the procedure twice to show that MySQL caches the query issued from within the procedure. Note that queries called from views, new in 5.0, work as well. However, SQL statements that utilize input parms from procedures will not make use of the query cache.
There are times when you may have to tweak the MySQL query cache to ensure optimal performance, so let's review some of the more important query cache-related status counters and configuration variables. To begin, the status counter Qcache_free_blocks
indicates the contiguous nature of the memory assigned to the cache. High numbers can indicate fragmentation issues, which may be solved by issuing a FLUSH QUERY CACHE statement. Note that this command does not remove queries from the cache, but coalesces memory free space chunks.
The Qcache_free_memory
counter provides insight into the cache's free memory. Low amounts observed vs. total allocated for the cache may indicate an undersized cache, which can be remedied by altering the global variable query_cache_size
.
Qcache_hits
and Qcache_inserts
shows the number of times a query was serviced from the cache and how many queries have been inserted into the cache. Low ratios of hits to inserts indicate little query reuse or a too-low setting of the query_cache_limit
, which serves to govern the RAM devoted to each individual query cache entry. Large query result sets will require larger settings of this variable.
Another indicator of poor query reuse is an increasing Qcache_lowmem_prunes
value. This indicates how often MySQL had to remove queries from the cache to make use for incoming statements. Other reasons for an increasing number of Qcache_lowmem_prunes
are an undersized cache, which can't hold the needed amount of SQL statements and result sets, and memory fragmentation in the cache which may be alleviated by issuing a FLUSH QUERY CACHE statement. You can remove all queries from the cache with the RESET QUERY CACHE command.
The Qcache_not_cached
counter provides insight into the number of statements executed against MySQL that were not cacheable, due to either being a non-SELECT statement or being explicitly barred from entry with a SQL_NO_CACHE hint.
Other server variables that you can use to tweak the query cache are:
query_alloc_block_size
- the allocation size of the RAM blocks that are allocated for objects in cache.query_cache_min_res_unit
- the minimum size for blocks allocated by the cache.query_cache_wlock_invalidate
- causes the query cache to invalidate any query in the cache if an object it uses has a write lock executed against it.query_prealloc_size
- the size of the persistent buffer used by the cache for parsing and execution. Complex queries necessitate larger settings.Finally, in addition to setting query cache variables that globally govern its use, note that the query cache can be individually managed at the client level. For example, a client can turn off the query cache for their own queries by issuing the statement:
mysql> set session query_cache_type=0;
Of course, there are prerequisites and limitations regarding MySQL query cache usage, with the most important being:
CURRENT_DATE, NOW, RAND
and others, negate the use of the cache.For a complete up to date list of query cache limitations, see the MySQL manual.
The MySQL query cache is a unique caching strategy that is currently not utilized by other database engines, and one that can greatly enhance the performance of most any system that experiences high degrees of read activity. Because both physical and logical I/O activity is all but eliminated by the query cache, even systems that must bear the brunt of inefficient SQL statements can many times perform faster than those on other database platforms.
1 Why a 99%+ Database Buffer Cache Hit Ratio is Not OK, Cary Millsap/Hotsos Enterprises, Ltd., 2001