MySQL :: MySQL 3.23, 4.0, 4.1 Reference Manual :: 9.1.7.5 Special Cases Where Collation Determination Is Tricky

MySQL 3.23, 4.0, 4.1 Reference Manual :: 9 Internationalization and Localization :: 9.1 Character Set Support :: 9.1.7 Collation Issues :: 9.1.7.5 Special Cases Where Collation Determination Is Tricky

« 9.1.7.4 The _bin and binary Collations

9.1.7.6 Collations Must Be for the Right Character Set »

Section Navigation [Toggle]

9.1.7 Collation Issues
9.1.7.1 Using COLLATE in SQL Statements
9.1.7.2 COLLATE Clause Precedence
9.1.7.3 The BINARY Operator
9.1.7.4 The _bin and binary Collations
9.1.7.5 Special Cases Where Collation Determination Is Tricky
9.1.7.6 Collations Must Be for the Right Character Set
9.1.7.7 Examples of the Effect of Collation

9.1.7.5. Special Cases Where Collation Determination Is Tricky

In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison operation. For example, in the following cases, it should be clear that the collation is the collation of column x:

SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;

However, when multiple operands are involved, there can be ambiguity. For example:

SELECT x FROM T WHERE x = 'Y';

Should this query use the collation of the column x, or of the string literal 'Y'?

Standard SQL resolves such questions using what used to be called “coercibility” rules. Basically, this means: Both x and 'Y' have collations, so which collation takes precedence? This can be difficult to resolve, but the following rules cover most situations:

An explicit COLLATE clause has a coercibility of 0. (Not coercible at all.)
The concatenation of two strings with different collations has a coercibility of 1.
A column's collation has a coercibility of 2.
A “system constant” (the string returned by functions such as USER() or VERSION()) has a coercibility of 3.
A literal's collation has a coercibility of 4.
NULL or an expression that is derived from NULL has a coercibility of 5.

The preceding coercibility values are current as of MySQL 4.1.11. See the note later in this section for additional version-related information.

Those rules resolve ambiguities in the following manner:

Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then:
- If both sides are Unicode, or both sides are not Unicode, it is an error.
- If one of the sides has a Unicode character set, and another side has a non-Unicode character set, the side with Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement will not return an error:
```
SELECT CONCAT(utf8_column, latin1_column) FROM t1;
```
  It will return a result, and the character set of the result will be utf8. The collation of the result will be the collation of utf8_column. Values of latin1_column will be automatically converted to utf8 before concatenating.
- For an operation with operands from the same character set but that mix a _bin collation and a _ci or _cs collation, the _bin collation is used. This is similar to how operations that mix nonbinary and binary strings evaluate the operands as binary strings, except that it is for collations rather than data types.

Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “subset” of Unicode. Because it is a well-known principle that “what applies to a superset can apply to a subset,” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.

Examples:

`column1 = 'A'`	Use collation of `column1`
`column1 = 'A' COLLATE x`	Use collation of `'A'`
`column1 COLLATE x = 'A' COLLATE y`	Error

The COERCIBILITY() function can be used to determine the coercibility of a string expression:

mysql> SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);
        -> 0
mysql> SELECT COERCIBILITY(VERSION());
        -> 3
mysql> SELECT COERCIBILITY('A');
        -> 4

See Section 11.10.3, “Information Functions”.

Before MySQL 4.1.11, there is no system constant or ignorable coercibility. Functions such as USER() have a coercibility of 2 rather than 3, and literals have a coercibility of 3 rather than 4.

Previous / Next / Up / Table of Contents

User Comments

Add your own comment.

Top / Previous / Next / Up / Table of Contents