In the great majority of statements, it is obvious what
collation MySQL uses to resolve a comparison operation. For
example, in the following cases, it should be clear that the
collation is the collation of column x
:
SELECT x FROM T ORDER BY x; SELECT x FROM T WHERE x = x; SELECT DISTINCT x FROM T;
However, when multiple operands are involved, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should this query use the collation of the column
x
, or of the string literal
'Y'
?
Standard SQL resolves such questions using what used to be
called “coercibility” rules. Basically, this
means: Both x
and 'Y'
have collations, so which collation takes precedence? This can
be difficult to resolve, but the following rules cover most
situations:
An explicit COLLATE
clause has a
coercibility of 0. (Not coercible at all.)
The concatenation of two strings with different collations has a coercibility of 1.
The collation of a column or a stored routine parameter or local variable has a coercibility of 2.
A “system constant” (the string returned by
functions such as USER()
or
VERSION()
) has a
coercibility of 3.
A literal's collation has a coercibility of 4.
NULL
or an expression that is derived
from NULL
has a coercibility of 5.
The preceding coercibility values are current for MySQL 5.4.
Those rules resolve ambiguities in the following manner:
Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then:
If both sides are Unicode, or both sides are not Unicode, it is an error.
If one of the sides has a Unicode character set, and another side has a non-Unicode character set, the side with Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement will not return an error:
SELECT CONCAT(utf8_column, latin1_column) FROM t1;
It will return a result, and the character set of the
result will be utf8
. The collation
of the result will be the collation of
utf8_column
. Values of
latin1_column
will be automatically
converted to utf8
before
concatenating.
For an operation with operands from the same character
set but that mix a _bin
collation
and a _ci
or _cs
collation, the _bin
collation is
used. This is similar to how operations that mix
nonbinary and binary strings evaluate the operands as
binary strings, except that it is for collations
rather than data types.
Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “subset” of Unicode. Because it is a well-known principle that “what applies to a superset can apply to a subset,” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.
Examples:
column1 = 'A' |
Use collation of column1
|
column1 = 'A' COLLATE x |
Use collation of 'A' COLLATE x
|
column1 COLLATE x = 'A' COLLATE y |
Error |
The COERCIBILITY()
function can
be used to determine the coercibility of a string expression:
mysql>SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);
-> 0 mysql>SELECT COERCIBILITY(VERSION());
-> 3 mysql>SELECT COERCIBILITY('A');
-> 4
User Comments
Add your own comment.