The following list shows the most common states you see in the State column for a slave server I/O thread. This state also appears in the Slave_IO_State column displayed by SHOW SLAVE STATUS, so you can get a good view of what is happening by using that statement.
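For example, you can run the statement from the mysql client and look at the Slave_IO_State, Slave_IO_Running, and Slave_SQL_Running fields; the output below is trimmed and illustrative, not from a real server:

    mysql> SHOW SLAVE STATUS\G
    *************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
      Seconds_Behind_Master: 0
    ...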
Waiting for master update
The initial state before Connecting to master.
Connecting to master
The thread is attempting to connect to the master.
Checking master version
A state that occurs very briefly, after the connection to the master is established.
Registering slave on master
A state that occurs very briefly after the connection to the master is established.
Requesting binlog dump
A state that occurs very briefly, after the connection to the master is established. The thread sends the master a request for the contents of its binary logs, starting from the requested binary log file name and position.
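The requested coordinates are the ones given to (or previously recorded from) CHANGE MASTER TO. As a sketch, with purely illustrative file name and position:

    mysql> STOP SLAVE;
    mysql> CHANGE MASTER TO
        ->     MASTER_LOG_FILE = 'mysql-bin.000123',  -- illustrative file name
        ->     MASTER_LOG_POS  = 107;                 -- illustrative position
    mysql> START SLAVE;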
Waiting to reconnect after a failed binlog dump request
If the binary log dump request failed (due to disconnection), the thread goes into this state while it sleeps, then tries to reconnect periodically. The interval between retries can be specified using the CHANGE MASTER TO statement.
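The retry interval is set with the MASTER_CONNECT_RETRY option of CHANGE MASTER TO; the 30-second value below is only an example, and the statement must be run while the slave is stopped:

    mysql> STOP SLAVE;
    mysql> CHANGE MASTER TO MASTER_CONNECT_RETRY = 30;  -- example: retry every 30 seconds
    mysql> START SLAVE;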
Reconnecting after a failed binlog dump request
The thread is trying to reconnect to the master.
Waiting for master to send event
The thread has connected to the master and is waiting for binary log events to arrive. This can last for a long time if the master is idle. If the wait lasts for slave_net_timeout seconds, a timeout occurs. At that point, the thread considers the connection to be broken and makes an attempt to reconnect.
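A minimal way to check or adjust the timeout on the slave is sketched below; 3600 is only an example value, and restarting the slave threads is assumed to be needed so the new value is picked up:

    mysql> SHOW GLOBAL VARIABLES LIKE 'slave_net_timeout';
    mysql> SET GLOBAL slave_net_timeout = 3600;  -- example value, in seconds
    mysql> STOP SLAVE;
    mysql> START SLAVE;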
Queueing master event to the relay log
The thread has read an event and is copying it to the relay log so that the SQL thread can process it.
Waiting to reconnect after a failed master event read
An error occurred while reading (due to disconnection). The thread is sleeping for the number of seconds set by the CHANGE MASTER TO statement (default 60) before attempting to reconnect.
Reconnecting after a failed master event read
The thread is trying to reconnect to the master. When the connection is established again, the state becomes Waiting for master to send event.
Waiting for the slave SQL thread to free enough relay log space
You are using a nonzero relay_log_space_limit value, and the relay logs have grown large enough that their combined size exceeds this value. The I/O thread is waiting until the SQL thread frees enough space by processing relay log contents so that it can delete some relay log files.
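relay_log_space_limit is read at server startup rather than set dynamically, so you can check whether a limit is in effect from the client and change it in the slave's option file before restarting; the 4G value is only an example:

    mysql> SHOW GLOBAL VARIABLES LIKE 'relay_log_space_limit';

    # my.cnf on the slave -- example value only
    [mysqld]
    relay_log_space_limit = 4G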
Waiting for slave mutex on exit
A state that occurs briefly as the thread is stopping.
User Comments
When you observe the "Reconnecting after a failed master event read" state, double-check every slave's server-id in its config file. You will get these messages at random if the IDs are not unique.
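A quick way to compare the IDs is to query each server directly; the config snippet shows where the value comes from, and the value 2 is just an example:

    mysql> SELECT @@server_id;

    # my.cnf -- the master and every slave need distinct values
    [mysqld]
    server-id = 2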
A MySQL slave stopped receiving binlog events from the master server.
Slave_IO_State was "Waiting for master to send event", Slave_IO_Running and Slave_SQL_Running were both Yes, and Seconds_Behind_Master was 0.
These four values indicated that all was well; however, Read_Master_Log_Pos was not changing. Every few seconds the Slave_IO_State would momentarily change to "Reconnecting after a failed master event read" and then change back to "Waiting for master to send event".
I checked the query at the frozen Read_Master_Log_Pos in the master's binlog file. The query was very large (over 2 MB).
I increased the global max_allowed_packet variable on the slave to an appropriately large value, because it was too small to allow the query event through.
Replication immediately stopped with Last_Error as, "Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave."
I executed the large query manually.
Then I set the global sql_slave_skip_counter to 1.
After I then ran START SLAVE, everything was well once again.
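For reference, the sequence described above looks roughly like this on the slave; the packet size is only an example value, and sql_slave_skip_counter is assumed to be settable only while the slave is stopped:

    mysql> STOP SLAVE;
    mysql> SET GLOBAL max_allowed_packet = 67108864;  -- example: 64 MB
    -- (run the oversized statement manually at this point if it never reached the slave)
    mysql> SET GLOBAL sql_slave_skip_counter = 1;     -- skip the event that cannot be read
    mysql> START SLAVE;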
Related to Geoff's comment:
I encountered this same problem and was able to resolve it simply by increasing the max packet size; make sure you try this before moving the log offset.
An additional symptom in our case was that resource utilization on the master server went up considerably, and the master's thread state was always network-related. Stopping the slave server fixed the master's performance issues.
After increasing the packet size, the slave caught up and the master thread state settled at "Has sent all binlog to slave..."
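One way to confirm both sides look healthy again is to check the master's Binlog Dump thread and compare the packet limit on master and slave; these are only suggested checks:

    mysql> SHOW PROCESSLIST;   -- on the master: look at the Binlog Dump thread's State
    mysql> SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';   -- compare on master and slave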