The MySQL server can support multiple character sets. To list the available character sets, use the SHOW CHARACTER SET statement. A partial listing follows. For more complete information, see Section 9.1.12, “Character Sets and Collations That MySQL Supports”.

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
...
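In this output, Maxlen is the maximum number of bytes needed to store one character in that character set. The following Python sketch illustrates the idea outside the server by measuring per-character byte lengths with Python's own codecs. The dictionary that maps MySQL charset names to Python codec names is an assumption made for this example. For instance, it maps latin1 to cp1252 because MySQL's latin1 is cp1252 West European. Python's codecs are not MySQL's implementation.

```python
# Illustration only: Python's codecs stand in for MySQL character sets.
# ASSUMPTION: this name mapping was chosen for the example and is not taken from MySQL.
MYSQL_TO_PYTHON_CODEC = {
    "latin1": "cp1252",  # MySQL's latin1 is cp1252 West European
    "ascii": "ascii",
    "koi8r": "koi8_r",
    "big5": "big5",
    "sjis": "shift_jis",
    "gbk": "gbk",
}


def max_bytes_per_char(text: str, charset: str) -> int:
    """Return the largest number of bytes that any single character of text needs."""
    codec = MYSQL_TO_PYTHON_CODEC[charset]
    # Encode each character separately; default=0 handles the empty string.
    return max((len(ch.encode(codec)) for ch in text), default=0)


if __name__ == "__main__":
    for charset, text in [("latin1", "café"), ("koi8r", "Жук"),
                          ("big5", "a中文"), ("sjis", "かな")]:
        print(charset, max_bytes_per_char(text, charset))
```

For these samples, the multibyte sets report 2 and the single-byte sets report 1. A sample can only show a lower bound, because Maxlen is the worst case over the whole character set.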
Any given character set always has at least one collation. It may have several collations. To list the collations for a character set, use the SHOW COLLATION statement. For example, to see the collations for the latin1 (cp1252 West European) character set, use this statement to find those collation names that begin with latin1:

mysql> SHOW COLLATION LIKE 'latin1%';
+---------------------+---------+----+---------+----------+---------+
| Collation           | Charset | Id | Default | Compiled | Sortlen |
+---------------------+---------+----+---------+----------+---------+
| latin1_german1_ci   | latin1  |  5 |         |          |       0 |
| latin1_swedish_ci   | latin1  |  8 | Yes     | Yes      |       1 |
| latin1_danish_ci    | latin1  | 15 |         |          |       0 |
| latin1_german2_ci   | latin1  | 31 |         | Yes      |       2 |
| latin1_bin          | latin1  | 47 |         | Yes      |       1 |
| latin1_general_ci   | latin1  | 48 |         |          |       0 |
| latin1_general_cs   | latin1  | 49 |         |          |       0 |
| latin1_spanish_ci   | latin1  | 94 |         |          |       0 |
+---------------------+---------+----+---------+----------+---------+
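The pattern LIKE 'latin1%' matches collation names that begin with latin1. The Default column marks the character set's default collation with Yes. The Python sketch below mimics that filtering over rows transcribed from the output above, so it shows how to read the result set. It does not show how MySQL evaluates the statement. The CollationRow record and both helper functions are hypothetical names invented for this illustration.

```python
from typing import List, NamedTuple, Optional


class CollationRow(NamedTuple):
    """Hypothetical record holding a few columns of SHOW COLLATION output."""
    collation: str
    charset: str
    coll_id: int       # the Id column
    is_default: bool   # True where the Default column says Yes


# Rows transcribed from the SHOW COLLATION LIKE 'latin1%' output above.
ROWS: List[CollationRow] = [
    CollationRow("latin1_german1_ci", "latin1", 5, False),
    CollationRow("latin1_swedish_ci", "latin1", 8, True),
    CollationRow("latin1_danish_ci", "latin1", 15, False),
    CollationRow("latin1_german2_ci", "latin1", 31, False),
    CollationRow("latin1_bin", "latin1", 47, False),
    CollationRow("latin1_general_ci", "latin1", 48, False),
    CollationRow("latin1_general_cs", "latin1", 49, False),
    CollationRow("latin1_spanish_ci", "latin1", 94, False),
]


def like_prefix(rows: List[CollationRow], prefix: str) -> List[CollationRow]:
    """Keep rows whose name starts with prefix, similar to LIKE 'prefix%'.

    Unlike SQL LIKE, this comparison is case-sensitive and treats '_'
    in prefix literally rather than as a single-character wildcard.
    """
    return [row for row in rows if row.collation.startswith(prefix)]


def default_collation(rows: List[CollationRow], charset: str) -> Optional[str]:
    """Return the collation marked as default for charset, or None."""
    for row in rows:
        if row.charset == charset and row.is_default:
            return row.collation
    return None


if __name__ == "__main__":
    print(default_collation(like_prefix(ROWS, "latin1"), "latin1"))
```

Run as a script, this prints latin1_swedish_ci, which is the row that has Yes in the Default column.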
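The latin1 collations have the following meanings:

+-------------------+-----------------------------------------------------+
| Collation         | Meaning                                             |
+-------------------+-----------------------------------------------------+
| latin1_german1_ci | German DIN-1                                        |
| latin1_swedish_ci | Swedish/Finnish                                     |
| latin1_danish_ci  | Danish/Norwegian                                    |
| latin1_german2_ci | German DIN-2                                        |
| latin1_bin        | Binary according to latin1 encoding                 |
| latin1_general_ci | Multilingual (Western European)                     |
| latin1_general_cs | Multilingual (ISO Western European), case sensitive |
| latin1_spanish_ci | Modern Spanish                                      |
+-------------------+-----------------------------------------------------+
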
Collations have these general characteristics:

- Two different character sets cannot have the same collation.

- Each character set has one collation that is the default collation. For example, the default collation for latin1 is latin1_swedish_ci. The output for SHOW CHARACTER SET indicates which collation is the default for each displayed character set, and SHOW COLLATION marks it with Yes in the Default column.

- There is a convention for collation names: They start with the name of the character set with which they are associated, they usually include a language name, and they end with _ci (case-insensitive), _cs (case-sensitive), or _bin (binary).
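The naming convention is regular enough to apply mechanically. The Python sketch below splits a collation name into three parts: its character set, the middle part (usually a language name, or empty as in latin1_bin), and the meaning of its suffix. The parse_collation_name helper and the CollationName record are hypothetical and not part of MySQL. The caller passes the character set explicitly, so the helper does not have to guess where the charset name ends. Names that do not follow the convention raise ValueError.

```python
from typing import NamedTuple

# Suffix meanings taken from the naming convention for collations.
SUFFIX_MEANINGS = {
    "_ci": "case-insensitive",
    "_cs": "case-sensitive",
    "_bin": "binary",
}


class CollationName(NamedTuple):
    """Hypothetical record for the parts of a collation name."""
    charset: str
    middle: str      # usually a language name; empty for names like latin1_bin
    comparison: str  # meaning of the suffix


def parse_collation_name(name: str, charset: str) -> CollationName:
    """Split a name that follows the charset_language_suffix convention."""
    prefix = charset + "_"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    for suffix, meaning in SUFFIX_MEANINGS.items():
        if name.endswith(suffix):
            # Slice between the charset prefix and the suffix. For latin1_bin
            # the prefix and suffix share one underscore, so this is empty.
            middle = name[len(prefix):len(name) - len(suffix)]
            return CollationName(charset, middle, meaning)
    raise ValueError(f"{name!r} has no _ci, _cs, or _bin suffix")


if __name__ == "__main__":
    for name in ["latin1_swedish_ci", "latin1_general_cs", "latin1_bin"]:
        print(parse_collation_name(name, "latin1"))
```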
          
In cases where a character set has multiple collations, it might not be clear which collation is most suitable for a given application. To avoid choosing the wrong collation, it can be helpful to perform some comparisons with representative data values to make sure that a given collation sorts values the way you expect.
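The kind of comparison described here can be sketched in Python to show why the choice matters. Ordering by code point approximates a _bin collation for ASCII text, because code-point order then matches byte order. Ordering by str.casefold is only a rough stand-in for a _ci collation. It ignores the collation-specific weights a real MySQL collation uses, such as its treatment of accented letters. The authoritative check is to sort your representative values on the server itself, for example with ORDER BY and a COLLATE clause. The function names below are hypothetical.

```python
from typing import List


def binary_order(values: List[str]) -> List[str]:
    """Sort by code point; for ASCII text this matches byte order, as in _bin."""
    return sorted(values)


def case_insensitive_order(values: List[str]) -> List[str]:
    """Sort by casefolded text, a rough stand-in for a _ci collation.

    Python's sort is stable, so values that compare equal keep input order.
    """
    return sorted(values, key=str.casefold)


def equal_case_insensitive(a: str, b: str) -> bool:
    """Return True if a and b tie under the casefold approximation."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    samples = ["banana", "Apple", "apple", "Cherry"]
    print(binary_order(samples))            # ['Apple', 'Cherry', 'apple', 'banana']
    print(case_insensitive_order(samples))  # ['Apple', 'apple', 'banana', 'Cherry']
```

Under the binary-style order, all capitalized values sort before all lowercase ones. This is often the first surprise such a test reveals.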
Collation-Charts.Org is a useful site that shows how one collation compares to another.