Amit Saha is a Quality Engineer in the MySQL Engineering Team at Sun Microsystems and works from Bangalore, India. He blogs at http://blogs.sun.com/amitsaha and takes deep interest in exploring new technologies and writing about them.
DTrace is a dynamic tracing facility built into the Solaris and Open Solaris operating systems and can be used by system administrators and developers alike to observe the runtime behavior of user level programs and of the operating system itself. On one hand, DTrace can be used to identify potential bottlenecks in the running processes on a production system and on the other DTrace can help you understand the runtime behavior of an external program such as MySQL better.
Originally available on Solaris, DTrace has now been ported to Mac OSX, FreeBSD and a experimental Linux port is also available. In this article, I shall use OpenSolaris 2008.11 release to demonstrate the working.
The DTrace architecture takes a good look at the various components of the DTrace framework.
The following graphic (reproduced from the DTrace how-to) illustrates the DTrace framework and its various components:
Probes, about which we learn very shortly (and not shown in the figure) can be best visualised as sensors available to be probed by the DTrace consumers in user space.
We shall now learn the basic DTrace concepts which will help us in our quest of some serious playing around with DTrace and MySQL.
Probes, Providers and Consumers
DTrace dynamically modifies the operating system kernel and user processes to record data at locations of interest (or instrumentation points), called probes. The probe is user specified and its specificity and description determine the benefit derived from DTrace. A probe is a location or activity to which the DTrace framework can bind a request to perform actions, such as logging the system calls, the function calls in user level processes, recording a stack trace and so on. A probe is said to fire when the activity, as specified by the probe description takes place. When a probe is fired, the requested action will take place. A probe has the following attributes which identifies it uniquely:
The current probes available on your system can be displayed by pfexec dtrace -l
. By using various switches, it is also possible to display only the probes belonging to, say a particular module. For eg. pfexec dtrace -l m:mysql*
will list all the probes available via the mysql*
provider. (Note the * in mysql*
denotes all modules with names starting with mysql
).
DTrace probes are implemented by kernel modules called providers, each of which performs a particular kind of instrumentation to create probes. Providers can thus described as publishers of probes that can be consumed by DTrace consumers (see below). Providers can be used for instrumenting kernel and user-level code. For user-level code, there are two ways in which probes can be defined- User-Level Statically Defined Tracing (USDT) or PID provider.
In USDT, custom probe points are inserted into application code according to well defined guidelines and practices. Once the custom probe points are integrated and the application code is compiled and the binary is run, the probes become available for consumption by DTrace user level consumers. However, unless they are used, the probes have a zero impact on the performance of the application or the system as a whole.
Does this mean that DTrace cannot be used with applications with no USDT probes defined? No.
The PID provider can be used to probe any user level process, whether USDT probes was defined for it or not. Using the PID provider is a very generic and easy way to play around with DTrace. Code a simple application in your favorite programming language and have fun with DTrace by observing the function call flow, stack trace and lots more. In the later parts of this article, we shall use both of the above for DTracing a running MySQL server.
provider:module:function:name
If one or more of the fields are missing, the specified fields are interpreted in a right to left fashion. That is, if a probe description is given as foo:bar
, the probe description matches all probes with the function foo and the name bar, regardless of the provider or the module. To obtain desired results, it is suggested to specify all the required fields. You may also want to match all the probes published by any given provider, for which you would use the probe description like: fbt:::
, which matches all the probes of the fbt provider.
A DTrace consumer is any process that interacts with the DTrace framework. The consumer specifies the instrumentation points by specifying probe descriptions. dtrace is the primary consumer of the DTrace framework. (Now, you see the difference between DTrace and dtrace?)
DTrace Scripts or D-scripts
Programs or scripts to interact with the DTrace framework are written in the D programming language. A D program source file consists of one or more probe clauses that describe the points of instrumentation to be enabled by DTrace. Each probe clause has the following form:
probe descriptions / predicate / { action statements }
A D program can consist of one or more of such probe clauses. The predicate and the list of action statements are optional and may not be required in some scenarios. D programs are described in detail here. A D program can be executed by specifying it via the '-s' switch to dtrace
or making it an executable (like a shell script) and setting 'dtrace' as the interpreter, by #!/usr/sbin/dtrace
in the script.
As mentioned earlier, there are two ways in which DTrace can be used with any user level process- USDT and PID provider. We shall see demonstrations of both these mechanisms as we start using DTrace with the MySQL server, or specifically 'mysqld'. One thing to note here is that, DTrace one-liners can be used to demonstrate a lot of what we will be doing. But, to make the learning curve easier, we will use D-scripts, however small they may be. Familiarity with the MySQL source code is required to derive the maximum advantage of the rest of the article.
Before I start off with writing D scripts, here are some common points worth noting:
chmod +x
and specifying the location of the script interpreter using a #!
, like: #!/usr/sbin/dtrace
.dtrace
. For eg. you are specifying a D script to dtrace
, you will use the -s
switcharg0, arg1, arg2..
. Other built in variables like timestamp, walltimestamp
are described here.timestamp
gives the current value of a nanosecond counter, which increments from an arbitrary time in the past and is useful for relative time calculation. walltimestamp
, the current number of nanoseconds since UTC, is more suited when Date/Time value is required.copyinstr
is used to copy a the value of a char*
type parameter to a variable in your D script. By default, strings upto a maximum size of 256 can be stored. You can change it using #pragma D option strsize=1024
mysqld
, it is essential that each thread (and its variables) is treated as such. Thread-local variables, denoted using a "self->":
makes it possible to prevent corruption of the variables of one thread by anotherclause-local
variables. To declare a variable as clause-local
, specify it as this->
, such as this->bar
. As suggested by its name, the scope of a clause-local variable is limited to the probe clause or predicate, in which it is used. For more information on thread-local and clause-local variables, please refer here"$target":
which is used in scripts using the PID provider
.To use the PID
provider, you need to have a mysqld
instance running on a (Open)Solaris system. (You won't need any special build of MySQL for this). Please note that, the function names are garbled in a binary. Hence any command, for eg. mysql_parse
will not be exactly the same, but will have other text at the starting and the ending. We can use "nm
" to see the garbled names:
nm mysqld | grep mysql_parse "2":1134 | 136472368| 640|FUNC |GLOB |0 |13 |__1cLmysql_parse6FpnDTHD_pkcIp3_v_
Hence, we shall simply use the regex '*' at the beginning and end of the function name in our D scripts.
#!/usr/sbin/dtrace -q pid$target::*mysql_parse*:entry /* This probe is fired when the execution enters mysql_parse */ { printf("Query: %s\n", copyinstr(arg1)); }
Save this script to a file, say watch.d
. A D script is specified to dtrace
with the -s
switch. The Process ID (PID) specified via the switch -p
is automatically made available to the $target
macro in the D script.
Now, run the D script, watch.d
:
$ pfexec dtrace -s watch.d -p `pgrep -x mysqld`
Fire up a MySQL client and run some queries. The D script should display the queries that you executed from the client:
Query: show databases Query: show variables Query: show engines Query: SELECT DATABASE() Query: show databases Query: show tables Query: show tables Query: select * from foo
A SQL query before execution passes through various other stages, the first of which is the query parsing. The query parsing is taken care of by the mysql_parse
function in sql/sql_parse.cc
. Since all the other stages, such as query optimisation and finally execution follows from there, using the following script, we can track all the functions that calls and returns from after mysql_parse
:
#!/usr/sbin/dtrace #pragma D option flowindent pid$target::*mysql_parse*:entry { self->start = vtimestamp; } pid$target:::entry /self->start/ { trace(timestamp); } pid$target:::return /self->start/ { trace(timestamp); } pid$target::*mysql_parse*:return /self->start/ { self->start = 0; }
You will observe an output similar to:
CPU FUNCTION 0 -> __1cLmysql_parse6FpnDTHD_pkcIp3_v_ 1791629230 0 -> __1cJlex_start6FpnDTHD__v_ 1791654796 0 -> __1cSst_select_lex_unitKinit_query6M_v_ 1791682536 0 -> __1cSst_select_lex_nodeKinit_query6M_v_ 1791710909 0 <- __1cSst_select_lex_nodeKinit_query6M_v_ 1791731630 0 <- __1cSst_select_lex_unitKinit_query6M_v_ 1791751932 0 -> __1cSst_select_lex_nodeLinit_select6M_v_ 1791776112 0 <- __1cSst_select_lex_nodeLinit_select6M_v_ 1791796307 0 -> __1cNst_select_lexKinit_query6M_v_ 1791821839 0 -> __1cSst_select_lex_nodeKinit_query6M_v_ 1791850872 0 <- __1cSst_select_lex_nodeKinit_query6M_v_ 1791871148 0 -> __1cJsql_alloc6FI_pv_ 1791900080 0 -> pthread_getspecific 1791921125 . . . 0 <- __1cEItemHcleanup6M_v_ 27789538249088 0 <- __1cKItem_identHcleanup6M_v_ 27789538281270 0 <- __1cKItem_fieldHcleanup6M_v_ 27789538313437 0 -> __SLIP.DELETER__Q 27789538347654 0 <- __SLIP.DELETER__Q 27789538382588 0 <- __1cLQdDuery_arenaKfree_items6M_v_ 27789538415179 0 <- __1cDTHDTcleanup_after_query6M_v_ 27789538450969 0 -> __1cKYacc_state2T5B6M_v_ 27789538486642 0 <- __1cKYacc_state2T5B6M_v_ 27789538521089 0 -> __1cQLex_input_stream2T5B6M_v_ 27789538556597 0 <- __1cQLex_input_stream2T5B6M_v_ 27789538600602 0 <- __1cLmysql_parse6FpnDTHD_pkcIp3_v_ 27789538637701
For digging into MySQL internals, as above, please refer this guide. You are also advised to refer to Understanding MySQL Internals
So, we have watched queries go by, how about capturing them into a file so as to use it for our own file logging purposes? We shall use a DTrace destructive function freopen
which redirects all that is written to a standard output into the specified file. (Discussion on the why it is called a destructive function). We are going to snoop on the "dispatch_command
": (in sql_parse.cc) function:
#!/usr/sbin/dtrace -qw #pragma D option strsize=1024 dtrace:::BEGIN { freopen("/tmp/sqls"); } pid$target::*dispatch_command*:entry
{ printf("%Y-> %s \n",walltimestamp,copyinstr(arg2)); }
which gives logs like:
2009 Feb 5 08:13:43-> create table fo_bawr (i INTEGER) 2009 Feb 5 08:13:56-> create table foo_bar (is INTEGER)
The PID provider
helps us get upto speed really fast when we are learning to DTrace any user level application. It also doesn't need a specially built application binary. However, we need to know the source code of the application really well. A basic level of that knowledge will enable us to write D scripts which are only as good. DTrace static probes in an application partially reduces the need to know the code end-to-end to write useful probes. The reason being, the embedded probes can be easily the highest level of abstraction to the important functionalities that are useful and likely to be monitored for performance considerations. As noted earlier, as long as the static probes are not used, no performance hit is experienced.
Static probes are being gradually integrated into MySQL. As of MySQL 6.0.9, there are around 55 static probes. The probes are defined and documented in the file: sql/probes.d
which is a good place to have a look at the currently available probes and understand how to use the probes in your D scripts. The currently available probes are also well described in the MySQL reference manual here.
To enable the static probes, you will have to supply an extra option to the configure script, --enable-dtrace
. After the build is over, start mysqld
. Now open a terminal, and type $pfexec dtrace -l | grep mysql
. You should see something like this:
135 mysql23509 mysqld __1cQdispatch_command6FnTenum_server_command_pnDTHD_pcI_b_ command-done 136 mysql23509 mysqld __1cQdispatch_command6FnTenum_server_command_pnDTHD_pcI_b_ command-start 137 mysql23509 mysqld __1cQclose_connection6FpnDTHD_Ib_v_ connection-done 138 mysql23509 mysqld handle_one_connection connection-start 139 mysql23509 mysqld __1cMmysql_delete6FpnDTHD_pnKTABLE_LIST_pnEItem_pnLst_sql_list_LXb_b_ delete-done 140 mysql23509 mysqld __1cHhandlerNha_delete_row6MpkC_i_ delete-row-done . .
As expected, the function names are mangled. You can view the demangled names with 'c++filt':
pfexec dtrace -l -m:mysql* | c++filt
which you will show the demangled functions:
ID PROVIDER MODULE FUNCTION NAME 56209 mysql706 mysqld bool dispatch_command(enum_server_command,THD*,char*,unsigned) command-done 56210 mysql706 mysqld bool dispatch_command(enum_server_command,THD*,char*,unsigned) command-start 56211 mysql706 mysqld void close_connection(THD*,unsigned,bool) connection-done 56212 mysql706 mysqld handle_one_connection connection-start 56213 mysql706 mysqld bool mysql_delete(THD*,TABLE_LIST*,Item*,st_sql_list*,unsigned long,unsigned long long,bool) delete-done 56214 mysql706 mysqld int handler::ha_delete_row(const unsigned char*) delete-row-done 56215 mysql706 mysqld int handler::ha_delete_row(const unsigned char*) delete-row-start 56216 mysql706 mysqld int mysql_execute_command(THD*) delete-start 56217 mysql706 mysqld unsigned long filesort(THD*,TABLE*,st_sort_field*,unsigned,SQL_SELECT*,unsigned long,bool,unsigned long*) filesort-done 56218 mysql706 mysqld unsigned long filesort(THD*,TABLE*,st_sort_field*,unsigned,SQL_SELECT*,unsigned long,bool,unsigned long*) filesort-start 56219 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-rdlock-done 56220 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-rdlock-start 56221 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-unlock-done 56222 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-unlock-start 56223 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-wrlock-done 56224 mysql706 mysqld int handler::ha_external_lock(THD*,int) handler-wrlock-start 56225 mysql706 mysqld bool mysql_insert(THD*,TABLE_LIST*,List- &,List
>&,List
- &,List
- &,enum_duplicates,bool) insert-done 56226 mysql706 mysqld int handler::ha_write_row(unsigned char*) insert-row-done 56227 mysql706 mysqld int handler::ha_write_row(unsigned char*) insert-row-start 56228 mysql706 mysqld void select_insert::abort() insert-select-done 56229 mysql706 mysqld bool select_insert::send_eof() insert-select-done 56230 mysql706 mysqld int mysql_execute_command(THD*) insert-select-start 56231 mysql706 mysqld int mysql_execute_command(THD*) insert-start 56232 mysql706 mysqld bool multi_delete::send_data(List
- &) multi-delete-done 56233 mysql706 mysqld int mysql_execute_command(THD*) multi-delete-start 56234 mysql706 mysqld bool multi_update::send_data(List
- &) multi-update-done . . .
A better Query Logger
Using one of the static probes, we shall now write a better query logger, containing information such as username and connection id.
#!/usr/sbin/dtrace -qws #pragma D option strsize=1024 dtrace:::BEGIN { printf("Logging to file..Hit Ctrl-C to end.\n"); freopen("/tmp/mysql.log"); } mysql*:::query-start /* using the mysql provider */ { self->query = copyinstr(arg0); /* Get the query */ self->connid = arg1; /* Get the connection ID */ self->db = copyinstr(arg2); /* Get the DB name */ self->who = strjoin(copyinstr(arg3),strjoin("@",copyinstr(arg4))); /* Get the username */ printf("%Y\t %20s\t Connection ID: %d \t Database: %s \t Query: %s\n", walltimestamp, self->who ,self->connid, self->db, self->query); }
which stores logs into a file, /tmp/mysql.log
:
2009 Feb 23 15:42:38 root@localhost Connection ID: 3 Database: Query: select user() 2009 Feb 23 15:42:41 root@localhost Connection ID: 3 Database: Query: show engines 2009 Feb 23 15:42:51 root@localhost Connection ID: 3 Database: Query: create table fo_bawr(is INTEGER) 2009 Feb 23 15:42:55 root@localhost Connection ID: 3 Database: Query: SELECT DATABASE() 2009 Feb 23 15:42:55 root@localhost Connection ID: 3 Database: test Query: show databases 2009 Feb 23 15:42:55 root@localhost Connection ID: 3 Database: test Query: show tables 2009 Feb 23 15:42:56 root@localhost Connection ID: 3 Database: test Query: create table fo_bawr(is INTEGER) 2009 Feb 23 15:43:04 root@localhost Connection ID: 3 Database: test Query: create table fo_bawr(i INTEGER) 2009 Feb 23 15:43:08 root@localhost Connection ID: 3 Database: test Query: create table fo_bawr2(i INTEGER)
Counting the bytes per client connection
In the last example, we are going to use the following static probes to write a D script which will give use the number of bytes transferred on a per client connection basis:
connection__start(unsigned long conn_id, char *user, char *host);
: This probe is fired when a new client connects to the servercommand__done(int status)
: This probe is fired when the client disconnectsprobe net__read__done(int status, unsigned long bytes); probe net__write__start(unsigned long bytes);
The script is as follows:
#!/usr/sbin/dtrace #pragma D option quiet dtrace:::BEGIN { printf("Tracking the bytes.. Hit Ctrl-C to end.\n"); } mysql*:::connection-start { self->bytes_read = 0; self->bytes_write = 0; self->conn_id = arg0; self->who = strjoin(copyinstr(arg1),strjoin("@",copyinstr(arg2))); /* Get the username */ printf("Got a client connection at %Y from %20s with ID %u\n", walltimestamp, self->who, self->conn_id); self->client_connect_start = timestamp; } mysql*:::net-read-done/* using the mysql provider */ { self->bytes_read = self->bytes_read + arg1; } mysql*:::net-write-start/* using the mysql provider */ { self->start_w = timestamp; self->bytes_write= self->bytes_write + arg1; } mysql*:::connection-done { printf ("Connection with ID: %u closed.\nTotal Bytes transferred: %u \nTotal connection time (ms): %-9d\n\n",self>conn_id, self->bytes_read + self->bytes_write,(timestamp-self->client_connect_start)/1000000 ); }
The script above, reports the data transfer activity as follows:
Tracking the bytes.. Hit Ctrl-C to end. Got a client connection at 2009 Feb 23 20:07:24 from root@localhost with ID 50 Got a client connection at 2009 Feb 23 20:07:27 from amit@localhost with ID 51 Connection with ID: 50 closed. Total Bytes transferred: 1081204027 Total connection time (ms): 17650 Connection with ID: 51 closed. Total Bytes transferred: 3039908614 Total connection time (ms): 23787
The niche area of the MySQL community which DTrace can breathe life into are the Database Administrators (DBAs) who strive to keep the database in good health at all points. Its easy to identify the performance bottlenecks that might have crept into the server over a period of time. With intelligent probe descriptions, its easy to get relevant statistics of a running MySQL server.
Besides the DBAs, DTrace is a great tool to understand how the control flow occurs in the MySQL server, when it receives a client request, till it serves the request. This makes it very easy to understand all the different sub-components of the MySQL server architecture.
mysqld
using DTrace