by Giuseppe Maxia
04/20/2006
You may know about the MySQL Cluster, which is a complex architecture to achieve high availability and performance. One of the advantages of MySQL Cluster is that each node is a peer to the others, whereas in a normal replicating system you have a master and many slaves, and applications must be careful to write only to the master.
The main disadvantages of MySQL Cluster are (as of MySQL 5.0):
READ COMMITTED
.There are some cases where the MySQL Cluster is the perfect solution, but for the vast majority, replication is still the best choice.
Replication, too, has its problems, though:
Fixing these two misfeatures is exactly the purpose of this article. Using features introduced in MySQL 5.0 and 5.1, it is possible to build a replication system where all nodes act as master and slave at the same time, with a built-in fail-over mechanism.
For those of you not well acquainted with the replication basics, I can refer to an earlier article explaining MySQL replication, and the demanding reader can integrate with the dry but extensive official MySQL replication manual.
Back to business. Consider the situation where you set up a replication system with more than one master. This has been a common scenario over the past several years. Chapters 7 and 8 of Jeremy Zawodny's High Performance MySQL describe such a solution. At the time of the book's publication, though, the necessary technology was not yet available.
One hard-to-solve problem in a multimaster replication is the conflict that can happen with self-generated keys. The AUTO_INCREMENT
feature is quite convenient, but in a replication environment it will be disruptive. If node A and node B both insert an auto-incrementing key on the same table, conflicts arise immediately.
Rescue comes with recent versions. MySQL 5 introduces a couple of server variables for replicated auto-increment that address this specific problem and allow for the creation of an array of peer-to-peer nodes with MySQL replication.
Quoting from the manual:
auto_increment_increment
controls the increment between successive AUTO_INCREMENT
values.auto_increment_offset
determines the starting point for AUTO_INCREMENT
column values.By choosing non-conflicting values for these variables on different masters, servers in a multiple-master configuration will not use conflicting AUTO_INCREMENT
values when inserting new rows into the same table. To set up N master servers, set the variables like this:
auto_increment_increment
to N on each master.auto_increment_offset
, using the values 1, 2, ... , N.Using those two variables as described in the manual, you can ensure that all nodes in your replication array will use different sequences of auto-incrementing numbers. For example, using auto_increment_increment = 10
and auto_increment_offset=3
, the numbers generated when inserting three records will be 3, 13, 23. Using 10, 7, you'll get 7, 17, 27, and so on.
For my four-node array, I set auto_increment_increment
to 10 for each node, and auto_increment_offset
to 1 in the first node, 2 in the second, and so on.
This is theoretically clear, but it still isn't clear how I managed to transform these servers into peer-to-peer nodes.
The answer is a circular replication, where each node is master of the following node and slave of the preceding one.
In its simplest form, circular replication has two nodes, where each one is at the same time master and slave of the other (Figure 1).
Figure 1. Circular replication between two nodes
For this test, I used two servers in my company (water and air; there will soon be two more, named fire and earth). Their basic configuration is:
# node A - (water) setup
[mysqld]
server-id = 10
# auto_increment_increment = 10
# auto_increment_offset = 1
master-host = air.stardata.it
master-user = nodeAuser
master-password = nodeApass
# node B - (air) setup
[mysqld]
server-id = 20
# auto_increment_increment = 10
# auto_increment_offset = 2
master-host = water.stardata.it
master-user = nodeBuser
master-password = nodeBpass
Notice the two magic variables in the configuration files. If you omit such variables (or comment them, as in this example), then something nasty may happen, and the unfortunate circumstances are easy to demonstrate. Remember that MySQL replication is asynchronous. It means that the replication process in the slave can happen at a different time than the one taking place in the master. This feature makes replication more resilient and ensures that even if you suffer a connection breakdown between master and slave, replication will continue when the slave connection resumes. However, this feature has a nasty side effect when you deal with auto-incremented values. Assume that you have a table like this:
CREATE TABLE x (
id int(11) NOT NULL AUTO_INCREMENT,
c char(10) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Assume also that the connection between node A and node B breaks for a moment. Suppose you issue an INSERT
statement in both servers, while the replication is not working and the auto_increment
variables are not set:
[node A] insert into x values (null, 'aaa'), (null, 'bbb'), (null, 'ccc');
[node B] insert into x values (null, 'xxx'), (null, 'yyy'), (null, 'zzz');
When replication resumes, you get a blocking error in both nodes:
Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database:
'test'. Query: 'insert into x values (null, 'aaa')'
The reason is easy to discover:
[node A] select * from x;
+----+------+
| id | c |
+----+------+
| 1 | aaa |
| 2 | bbb |
| 3 | ccc |
+----+------+
[node B] select * from x;
+----+------+
| id | c |
+----+------+
| 1 | xxx |
| 2 | yyy |
| 3 | zzz |
+----+------+
Both nodes have produced the same primary keys. Thus, when replication resumed, the DBMS justly complained that there was a mistake. Now activate those two variables to see what happens.
[node A] set auto_increment_increment = 10;
[node A] set auto_increment_offset = 1;
[node B] set auto_increment_increment = 10;
[node B] set auto_increment_offset = 2;
Clean the errors, delete all the records in the test table, and redo the insertion (after stopping the replication, to simulate a communication breakdown):
[node A] SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; start slave;
[node B] SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; start slave;
[node A] truncate x;
[node A] stop slave ;
[node B] stop slave ;
[node A] insert into x values (null, 'aaa'), (null, 'bbb'), (null, 'ccc');
[node B] insert into x values (null, 'xxx'), (null, 'yyy'), (null, 'zzz');
Now the situation is different.
[node A] select * from x;
+-----+------+
| id | c |
+-----+------+
| 1 | aaa |
| 11 | bbb |
| 21 | ccc |
+-----+------+
[node B] select * from x;
+-----+------+
| id | c |
+-----+------+
| 2 | xxx |
| 12 | yyy |
| 22 | zzz |
+-----+------+
Thus, when replication resumes, there is no conflicting error. This proves it. Choosing appropriate values for the auto_increment_increment
and auto_increment_offset
server variables prevents conflicts between auto-generated keys in this circular replication setup. QED.
To read the rest of this article, please visit ONLamp.com »