Robin Schumacher is MySQL's Director of Product Management and has over 13 years of database experience in DB2, MySQL, Oracle, SQL Server and other database engines. Before joining MySQL, Robin was Vice President of Product Management at Embarcadero Technologies.
The IT crown jewels of any organization lies in their database infrastructure. Obviously, well-designed front end applications play a leading role when it comes to ensuring that critical information is captured or reviewed, but the buck stops at the database when it comes to data retention and protection.
Because such things are the database's responsibility, it is important that the underlying engine act in ways that preserve the integrity and validity of the data. This means the RDBMS should stand guard and deny entry of any and all invalid data through the welcome gates of the actual database.
For users of MySQL, this issue of server-enforced data integrity has been somewhat of a blessing and a curse. Traditionally, MySQL has taken the position that the front-end application should bear the responsibility of validating incoming data. There are countless users of MySQL who are completely fine with this approach, and many who actually prefer this form of data validation.
However, the meteoric rise in MySQL's popularity has brought many database "switch out" projects where MySQL is replacing proprietary or other open source database back ends where data integrity enforcement is expected at the database engine layer and not at the application layer. Most hard-core database professionals expect a database to validate incoming data and reject any data that is of the wrong datatype or data that violates certain data integrity rules.
The good news is that MySQL 5.0 now offers server-enforced data integrity. While the database server can still be used in its former mode of operation, database professionals wanting the RDBMS to be the ultimate authority with respect to accepting or rejecting data into/from the database can now easily make this happen in 5.0.
Prior to version 5.0, MySQL handled incoming invalid data in a manner that most non-MySQL users wouldn't expect. The best way to demonstrate this is through the following example. Let's create a table with several different datatypes and then insert some bad data into the table to see how MySQL handles it.
First, let's create our table and insert some clean data:
mysql> create table mytest (c1 int, c2 varchar(10), c3 date) engine=myisam; Query OK, 0 rows affected (0.00 sec) mysql> insert into mytest values (1,'testing',now()); Query OK, 1 row affected (0.00 sec) mysql> select * from mytest; +------+---------+------------+ | c1 | c2 | c3 | +------+---------+------------+ | 1 | testing | 2005-11-29 | +------+---------+------------+ 1 row in set (0.00 sec)
So far so good. But now let's use MySQL's multi-insert capability (a great feature by the way…!) to run some invalid data through each column in our table and see what happens:
mysql> insert into mytest values -> ('bad number','testing',now()), -> (1,1,now()), -> (1,'testing','bad date'); Query OK, 3 rows affected, 2 warnings (0.00 sec) Records: 3 Duplicates: 0 Warnings: 2 mysql> show warnings; +---------+------+------------------------------------------------------+ | Level | Code | Message | +---------+------+------------------------------------------------------+ | Warning | 1264 | Out of range value adjusted for column 'c1' at row 1 | | Warning | 1265 | Data truncated for column 'c3' at row 3 | +---------+------+------------------------------------------------------+ 2 rows in set (0.00 sec) mysql> select * from mytest; +------+---------+------------+ | c1 | c2 | c3 | +------+---------+------------+ | 1 | testing | 2005-11-29 | | 0 | testing | 2005-11-29 | | 1 | 1 | 2005-11-29 | | 1 | testing | 0000-00-00 | +------+---------+------------+ 4 rows in set (0.00 sec)
The first set of values contains a string value for the first column, which is a numeric datatype. The second set of values contains a numeric value for the second column, which is a character datatype (variable). And the third set of values contains a string value for the third column, which is a date datatype.
What does MySQL do? It 'silently' converts the incoming bad data into the default values for each of the table's column datatypes. So in the above example, the string data targeting column one in our table gets converted to a zero (the default value for numeric columns), the numeric data targeting the varchar column goes in as is, and the bad date gets converted to be a zero-filled date value.
This behavior has been criticized by some and termed a MySQL 'gotcha'. Indeed, for those coming from traditional database management systems, this style of data handling is out of character. However, there are some who actually prefer this type of behavior. For example, some MySQL users don't want large data loads interrupted with errors and would rather have the load finish and then perform data validation. This type of approach is somewhat mirrored in Oracle10g release 2, where, during a load, bad data can be redirected into a mirror table that contains all rows that have violated data integrity rules.
But, as has already been stated, the vast majority of the database community prefers to have traditional server-enforced data integrity. And that's exactly what they can have with MySQL 5.0.
In version 5.0, MySQL users have the option to retain the old behavior of data integrity validation or switch to the new mode of server-enforced data integrity. At the heart of this change is the sql_mode configuration parameter. A DBA can modify this parameter in a variety of ways to produce exactly the type of data integrity enforcement they want. Let's see how easy this change can occur. We'll use the most restrictive setting for the sql_mode parameter and repeat our prior tests to see how differently MySQL behaves:
mysql> show variables like 'sql_mode'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | sql_mode | | +---------------+-------+ mysql> set sql_mode=strict_all_tables; Query OK, 0 rows affected (0.00 sec) mysql> show variables like 'sql_mode'; +---------------+-------------------+ | Variable_name | Value | +---------------+-------------------+ | sql_mode | STRICT_ALL_TABLES | +---------------+-------------------+ mysql> insert into mytest values ('testing','testing',now()); ERROR 1264 (22003): Out of range value adjusted for column 'c1' at row 1 mysql> insert into mytest values (1,'testing','bad date'); ERROR 1292 (22007): Incorrect date value: 'testing' for column 'c3' at row 1
First, notice that the server-enforced data integrity capability is dynamic - it can be changed without stopping/starting the MySQL server. Next, take note that the above example changed server-enforced data integrity at the client level only:
mysql> show global variables like 'sql_mode'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | sql_mode | | +---------------+-------+ 1 row in set (0.00 sec) mysql> show variables like 'sql_mode'; +---------------+-------------------+ | Variable_name | Value | +---------------+-------------------+ | sql_mode | STRICT_ALL_TABLES | +---------------+-------------------+ 1 row in set (0.00 sec)
A DBA can globally set such data integrity enforcement by changing the sql_mode configuration parameter in the my.cnf file or they can start the MySQL daemon/service and specifically specify it via command line startup if they wish. A client (given the privileges) can also override the global setting if they desire.
However, the main thing to observe in the above test is that MySQL is now rejecting invalid data and is not converting it into a column's default value. Note that this data integrity enforcement includes all aspects of MySQL including things like ENUM constraints:
mysql> create table emp(name varchar(30), gender enum('M','F')) engine=myisam; Query OK, 0 rows affected (0.00 sec) mysql> set sql_mode=''; Query OK, 0 rows affected (0.00 sec) mysql> show variables like 'sql_mode'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | sql_mode | | +---------------+-------+ 1 row in set (0.00 sec) mysql> insert into emp values ('fred','A'); Query OK, 1 row affected, 1 warning (0.00 sec) mysql> show warnings; +---------+------+---------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------+ | Warning | 1265 | Data truncated for column 'gender' at row 1 | +---------+------+---------------------------------------------+ 1 row in set (0.00 sec) mysql> select * from emp; +------+--------+ | name | gender | +------+--------+ | fred | | +------+--------+ 1 row in set (0.00 sec) mysql> set sql_mode='strict_all_tables'; Query OK, 0 rows affected (0.00 sec) mysql> insert into emp values ('john','A'); ERROR 1265 (01000): Data truncated for column 'gender' at row 1 mysql> select * from emp; +------+--------+ | name | gender | +------+--------+ | fred | | +------+--------+ 1 row in set (0.00 sec)
The setting used for sql_mode above, strict_all_tables, enables server-enforced data integrity across all of MySQL's pluggable storage engines. However, you have much more flexibility in setting this parameter. There are wide arrays of options to choose from and you can choose to set more than one value for the parameter if you wish. For example, you can choose to only have server-enforced data integrity for all your transaction tables, but not allow zero dates in any table regardless of underlying storage engine. For a complete listing of all the available parameters, please see http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html in the MySQL Reference Manual.
The last thing to note is that the global sql_mode configuration parameter is intentionally left blank for version 5.0 of MySQL. Countless MySQL applications are written with MySQL's former data-enforcement behavior in mind, so no global change has been introduced for MySQL's defaults at this time. This may change in a future release of the server.
Because databases should contain only accurate and valid data, DBAs must work hard to ensure that no misleading information ever enters the gates of their operational data stores and data warehouses. Beginning in MySQL 5.0, DBAs can now enforce data integrity through the MySQL server and not have to depend on data validation via application front ends.
Making the change from no integrity enforcement to global integrity enforcement is easy to do, so MySQL DBAs upgrading from previous MySQL versions should seriously consider its use. And anyone new to MySQL who is deploying version 5.0 should definitely make the new server-enforced data integrity the default.