About the Author

Robin Schumacher is MySQL's Director of Product Management and has over 13 years of database experience in DB2, MySQL, Oracle, SQL Server and other database engines. Before joining MySQL, Robin was Vice President of Product Management at Embarcadero Technologies.

Accessing Distributed Data with the Federated Storage Engine

By Robin Schumacher

Accessing Distributed Data with the Federated Storage Engine

It's no secret that corporations are swimming in more data than ever before. IDC has recently pegged data growth at 48% a year, which means that corporate data nearly doubles every two years at many companies. In addition, purchased applications and custom built systems continue to bring in new databases that require attention, and business intelligence remains a red-hot area for modern enterprises with many companies building analytic data warehouses or data marts that continually feed information to key decision makers.

A serious challenge for today's database professional is linking together the many "data islands" that exist in a data center as many databases tend to be siloed on separate servers, which makes it hard to correlate data that is related but marooned on many different physical server islands. This can especially be true of MySQL servers as its tremendous ease-of-use can actually cause data island problems before a DBA knows it. This problem can surface if they have lots of MySQL servers running that have separate but related data that occasionally must be used together as one logical database.

Federated to the Rescue

Coming to the rescue of such a situation is the Federated storage engine, which was introduced in MySQL 5.0. The Federated engine allows a DBA to create logical pointers to tables that exist on other MySQL servers and thereby link together separate data islands to form one or more logical databases. Those coming from Oracle may be familiar with database links on that platform, and those moving to MySQL from SQL Server will recognize the concept of linked servers on Microsoft-based systems.

The Federated storage engine of MySQL is extremely easy to use and set up, and can quickly turn into a DBA's best friend if they have to answer customer demands to correlate data that exists on several different physical servers. Let's take a quick walk through of how and why to use the Federated storage engine to see the benefits it provides.

Getting Started with Federated

The Federated engine is bundled into the MySQL server from version 5.0 on up. You can validate that your installation has Federated by issuing a simple SHOW ENGINES command from the mysql client program.

As has previously been mentioned, a Federated table acts as a pointer to an actual table object that exists on the same or another server. Once this link/pointer has been established, you can perform whatever operations you would like on the remote object (inserts, updates, deletes, reads, etc.), as long as you have been given the privileges to do so. Understand that your capabilities on the remote object are restricted to the underlying engine that is serving as the source of the Federated table. For example, if you create a Federated table that is pointing to a MyISAM table on another server, you do not have transaction (commit/rollback) capability. Likewise, a Federated pointer to an Archive engine table would not allow you to update or selectively delete data as Archive tables allow reads and inserts only.

While there are many good use cases available for Federated, let's take a look at one that many companies are facing today, which is maintaining online access to seldom-referenced historical data. Government compliance regulations or internal standards may dictate that a company has to keep "n" years worth of data online in case it's needed for auditing or other purposes. The problem is that retaining large amounts of historical data online means that response times can be negatively impacted if historical and current information is blended together in one database. Because of this, DBAs look to use either partitioning or data archiving to accomplish a split between active and inactive data. Data archiving, where data is migrated to different databases that may be located on a separate physical server (that has cheaper disks and is less expensive), is particularly appealing as keeping all historical data out of a currently-used database minimizes maintenance function times (e.g. backups, storage maintenance, etc.) as well as query response times.

Let's say we have data that has been archived onto a separate Linux server that contains historical records concerning client transactions for an investment company. Located on a Windows-based MySQL server is a table that contains current transactions for client trades that have occurred in the current year. Users are now asking that the two tables be referenced as one for reporting purposes.

The first step is to create a Federated object on the Windows server that points to the history table located on the Linux machine. This is easily done by logging onto the Windows MySQL server and issuing a command like the following:

mysql> use gim2
Database changed
mysql> CREATE TABLE client_transaction_history (
    ->   client_transaction_id int(11) NOT NULL default '0',
    ->   client_id int(11) NOT NULL default '0',
    ->   investment_id int(11) NOT NULL default '0',
    ->   action varchar(10) NOT NULL default '',
    ->   price decimal(12,2) NOT NULL default '0.00',
    ->   number_of_units int(11) NOT NULL default '0',
    ->   transaction_status varchar(10) NOT NULL default '',
    ->   transaction_sub_timestamp datetime NOT NULL default '0000-00-00 00:00:00',
    ->   transaction_comp_timestamp datetime NOT NULL default '0000-00-00 00:00:00',
    ->   description varchar(200) default NULL,
    ->   broker_id bigint(10) default NULL,
    ->   broker_commission decimal(10,2) default NULL
    -> )
    -> ENGINE=FEDERATED
    -> DEFAULT CHARSET=latin1
    -> CONNECTION='mysql://robin:mypassword@192.168.32.69:3306/gim2/client_transaction_history';
Query OK, 0 rows affected (0.08 sec)

The first thing to note is that a Federated table must be defined in a way so that it exactly mirrors the source table. This is actually not hard to do as all that is needed is to cut the output definition of a SHOW CREATE TABLE statement from the source database into an editor and make a few DDL changes that are needed for a Federated table.

Next, the ENGINE= clause is set to Federated. Finally, the CONNECTION clause is where all the magic happens. It contains the technical directional information needed by the host MySQL server to talk to the remote system. The format of the CONNECTION clause is as follows:

scheme://user_name[:password]@host_name[:port_num]/db_name/tbl_name

In the example above, a user named "robin" is connecting with a password of "mypassword" to a server identified by an IP address and port number 3306. The remote database is named "gim2" and the remote table name is "client_transaction_history". For you security-conscious DBAs, note that the password used in any Federated table definition is indeed visible to anyone that can view the definition of the table via a SHOW CREATE TABLE command.

Once created, the Federated table may be accessed as any other table:

mysql> select count(*) from client_transaction_history where broker_id > 4;
+----------+
| count(*) |
+----------+
|   129880 |
+----------+
1 row in set (2.14 sec)

To continue on with our data archiving example, referencing the current client transaction data and historical information from the Windows MySQL server is now easily done by creating a union view between the local table and the Federated object:

mysql> CREATE VIEW
    ->   client_transaction_all
    -> AS
    -> SELECT
    ->   client_transaction_id, 
    ->   client_id,
    ->   investment_id,
    ->   action,
    ->   price,
    ->   number_of_units,
    ->   transaction_status,
    ->   transaction_sub_timestamp,
    ->   transaction_comp_timestamp,
    ->   description,
    ->   broker_id,
    ->   broker_commission
    -> FROM
    ->   client_transaction
    -> UNION ALL
    -> SELECT
    ->   client_transaction_id,
    ->   client_id,
    ->   investment_id,
    ->   action,
    ->   price,
    ->   number_of_units,
    ->   transaction_status,
    ->   transaction_sub_timestamp,
    ->   transaction_comp_timestamp,
    ->   description,
    ->   broker_id,
    ->   broker_commission
    -> FROM
    ->   client_transaction_history;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*) from client_transaction_all;
+----------+
| count(*) |
+----------+
|   205425 |
+----------+
1 row in set (2.42 sec)

Figure 1 - Using Federated to create a logical database

Now users have an easy way of referencing local/current and remote/historical data in one nice wrapper.

Other Federated Table Possibilities

Beyond using the new 5.0 Federated engine to handle the standard use cases, some MySQL users have gotten creative and are using the new 5.0 engine to meet other needs. One of these needs is synchronous replication. MySQL currently performs asynchronous replication where data, when either inputted or changed in the database, is not replicated immediately to slave servers, but instead is sent a short time afterwards. While such a scheme is just fine for most situations, some applications require near-instant replication of data from a source to target servers.

Some users have used the Federated engine in conjunction with triggers to set up such a configuration. The basic idea is to:

Create a federated table on the Master/Source server that points to a remote table that is on the target/replicated system.
Create triggers on the Master table that funnel all DML changes to the remote table once they have been applied on the Master table.

For example, let's say we have a very simple table that contains incoming customer calls that looks like this:

+---------------+----------+------+-----+---------+-------+
| Field         | Type     | Null | Key | Default | Extra |
+---------------+----------+------+-----+---------+-------+
| customer_id   | int(11)  | YES  |     | NULL    |       |
| broker_id     | int(11)  | YES  |     | NULL    |       |
| call_datetime | datetime | YES  |     | NULL    |       |
+---------------+----------+------+-----+---------+-------+

We would like to ensure that any call that comes in is immediately reflected on a remote system used for query purposes. The first thing to do is create a mirror table on the remote/target system and then create a federated object that points to it:

mysql> create table customer_calls_rep (customer_id int, broker_id int, 
    -> call_datetime datetime)
    -> engine=federated
    -> CONNECTION='mysql://robin:mypassword@192.168.32.69:3306/gim2/customer_calls';

Now we create triggers on the source table to immediately replicate any DML changes to the remote table (only an INSERT trigger is shown here; UPDATE and DELETE would also be needed):

mysql> delimiter //
mysql> create trigger rep_insert
    -> after insert on customer_calls
    -> for each row
    -> begin
    ->  insert into customer_calls_rep values
    ->  (NEW.customer_id,NEW.broker_id, NEW.call_datetime);
    -> end;
    -> //

Now when any data is moved through the Master table, the change is immediately reflected on the remote system:

mysql> insert into customer_calls values (1,1,now());

Query OK, 1 row affected (0.03 sec)

mysql> select * from customer_calls;

+-------------+-----------+---------------------+
| customer_id | broker_id | call_datetime       |
+-------------+-----------+---------------------+
|           1 |         1 | 2006-01-24 11:41:06 |
+-------------+-----------+---------------------+
1 row in set (0.00 sec)

mysql> select * from customer_calls_rep;
+-------------+-----------+---------------------+
| customer_id | broker_id | call_datetime       |
+-------------+-----------+---------------------+
|           1 |         1 | 2006-01-24 11:41:06 |
+-------------+-----------+---------------------+
1 row in set (0.00 sec)

Of course, the above is not meant to be a replacement for cases where replication to other systems needs to be guaranteed as there is no transaction support for the federated engine in MySQL 5.0, and there isn't the safety net (as with the MySQL binary log used for replication) of having a log of changes be available should something fail in the data transmission to the remote server. Using federated in this situation definitely qualifies under the "quick and dirty" category only.

Figure 2 - Quick synchronous replication setup using Federated

Wrap Up

There are some current restrictions to the Federated engine, including:

Only other MySQL servers may be sources of the Federated engine at this time.
Transactions are not supported (should be in version 5.1).
The structure definitions of both the source and federated object must stay identical.
DDL operations are not supported through the Federated engine (ALTER TABLE, etc.).
Query cache support is not enabled.

Of course, MySQL engineers are busy at work to remove these restrictions (in particular, the first and second points). For more information on the Federated engine, you can check out the online reference manual at http://dev.mysql.com/doc/refman/5.0/en/federated-storage-engine.html and visit the MySQL forums as there is a forum section devoted to federated, which can be referenced at http://forums.mysql.com/list.php?105.

Without a doubt, DBAs have a challenge on their hands in linking together the many separate data islands that exist in their data center. Fortunately, the MySQL 5.0 Federated storage engine helps to build those necessary bridges between database servers to create a logical database that is easily referenced by all.

About the Author

Related Pages:

Accessing Distributed Data with the Federated Storage Engine

Accessing Distributed Data with the Federated Storage Engine

Federated to the Rescue

Getting Started with Federated

Other Federated Table Possibilities

Wrap Up