MySQL :: MySQL 3.23, 4.0, 4.1 Reference Manual :: 9.1.7.6 Collations Must Be for the Right Character Set


9.1.7.6. Collations Must Be for the Right Character Set

Each character set has one or more collations, but each collation is associated with one and only one character set. Therefore, the following statement produces an error because the latin2_bin collation belongs to the latin2 character set and is not valid for latin1:

mysql> SELECT _latin1 'x' COLLATE latin2_bin;
ERROR 1253 (42000): COLLATION 'latin2_bin' is not valid
for CHARACTER SET 'latin1'
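To see which collations can be used with a given character set before writing a COLLATE clause, you can list them with SHOW COLLATION. The following is a minimal sketch, assuming a MySQL 4.1 or later server built with the latin1 character set; the rows returned depend on how the server was compiled, so no output is shown:

```sql
-- List the collations that belong to the latin1 character set.
SHOW COLLATION LIKE 'latin1%';

-- Pairing the _latin1 introducer with one of latin1's own
-- collations is accepted, unlike the latin2_bin example above.
SELECT _latin1 'x' COLLATE latin1_bin;
```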

In some cases, expressions that worked before MySQL 4.1 fail in early versions of MySQL 4.1 if you do not take character set and collation into account. For example, before 4.1, this statement works as is:

mysql> SELECT SUBSTRING_INDEX(USER(),'@',1);
+-------------------------------+
| SUBSTRING_INDEX(USER(),'@',1) |
+-------------------------------+
| root                          |
+-------------------------------+

The statement also works as is in MySQL 4.1 as of 4.1.8. In MySQL 4.1, user names are stored using the utf8 character set (see Section 9.1.10, “UTF-8 for Metadata”), whereas the literal string '@' has the connection character set (latin1 by default). Although the character sets are different, MySQL can convert the latin1 string to the character set (and collation) of USER() without data loss. It does so, performs the substring operation, and returns a result that has a character set of utf8.

However, in versions of MySQL 4.1 before 4.1.8, the statement fails:

mysql> SELECT SUBSTRING_INDEX(USER(),'@',1);
ERROR 1267 (HY000): Illegal mix of collations
(utf8_general_ci,IMPLICIT) and (latin1_swedish_ci,COERCIBLE)
for operation 'substr_index'

This happens because the automatic character set conversion of '@' does not occur and the string operands have different character sets (and thus different collations):

mysql> SELECT COLLATION(USER()), COLLATION('@');
+-------------------+-------------------+
| COLLATION(USER()) | COLLATION('@')    |
+-------------------+-------------------+
| utf8_general_ci   | latin1_swedish_ci |
+-------------------+-------------------+
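The error message labels each operand with a coercibility (IMPLICIT for USER(), COERCIBLE for the literal) in addition to its collation. As a hedged sketch, the CHARSET() and COERCIBILITY() functions can be used to inspect the same properties directly. This assumes a server recent enough to provide both functions; COERCIBILITY() returns a number rather than a label, and the numbering differs between MySQL versions, so no output is shown here:

```sql
-- Character set of each operand.
SELECT CHARSET(USER()), CHARSET('@');

-- Numeric coercibility of each operand; how the numbers map to
-- labels such as IMPLICIT and COERCIBLE depends on the server version.
SELECT COERCIBILITY(USER()), COERCIBILITY('@');
```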

One way to deal with this is to upgrade to MySQL 4.1.8 or later. If that is not possible, you can tell MySQL to interpret the literal string as utf8:

mysql> SELECT SUBSTRING_INDEX(USER(),_utf8'@',1);
+------------------------------------+
| SUBSTRING_INDEX(USER(),_utf8'@',1) |
+------------------------------------+
| root                               |
+------------------------------------+

Another way is to change the connection character set and collation to utf8. You can do that with SET NAMES 'utf8' or by setting the character_set_connection and collation_connection system variables directly.
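The two approaches just described might look like the following sketch. It assumes the server was built with utf8 support and uses utf8_general_ci (the collation reported for USER() earlier in this section) as the connection collation:

```sql
-- Option 1: SET NAMES changes the client, connection, and result
-- character sets together.
SET NAMES 'utf8';

-- Option 2: change only the connection character set and collation.
SET character_set_connection = utf8;
SET collation_connection = utf8_general_ci;

-- After either option, a plain literal should report a utf8 collation,
-- so both operands of SUBSTRING_INDEX() share a character set.
SELECT COLLATION('@');
SELECT SUBSTRING_INDEX(USER(),'@',1);
```

Note that SET NAMES also changes the character set in which the server expects statements from the client and returns results, so Option 2 is the narrower change if only string literals need to be affected.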
