This section discusses the procedure for adding a new character set to MySQL. You must have a MySQL source distribution to use these instructions. The proper procedure depends on whether the character set is simple or complex:
If the character set does not need to use special string collating routines for sorting and does not need multi-byte character support, it is simple.
If the character set needs either of those features, it is complex.
For example, greek and swe7 are simple character sets, whereas big5 and czech are complex character sets.
In the following instructions, MYSET represents the name of the character set that you want to add.
Add a <charset> element for MYSET to the sql/share/charsets/Index.xml file. Use the existing contents in the file as a guide to adding new contents.
The <charset> element must list all the collations for the character set. These must include at least a binary collation and a default collation. The default collation is usually named using a suffix of general_ci (general, case insensitive). It is possible for the binary collation to be the default collation, but usually they are different. The default collation should have a primary flag. The binary collation should have a binary flag.
You must assign a unique ID number to each collation. The range of IDs from 1024 to 2047 is reserved for user-defined collations. Before MySQL 5.5, the ID must be chosen from the range 1 to 254. To find the maximum of the currently used collation IDs, use this query:
SELECT MAX(ID) FROM INFORMATION_SCHEMA.COLLATIONS;
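The collation rules in this step can be expressed as a small consistency check. The following is a minimal sketch in C, not MySQL code: the collation_entry structure and the collations_look_valid function are hypothetical names introduced for illustration, and the requirement of exactly one primary collation is an assumption based on a character set having a single default collation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one <collation> entry; not a MySQL type. */
struct collation_entry {
    const char *name;
    int id;
    int is_primary;  /* entry carries the primary flag */
    int is_binary;   /* entry carries the binary flag  */
};

/* Returns 1 if the entries satisfy the rules described above:
   exactly one default (primary) collation (an assumption of this sketch),
   at least one binary collation, and no ID used twice. Returns 0 otherwise. */
int collations_look_valid(const struct collation_entry *c, size_t n)
{
    size_t i, j;
    int primaries = 0, binaries = 0;

    assert(c != NULL || n == 0);
    for (i = 0; i < n; i++) {
        primaries += (c[i].is_primary != 0);
        binaries += (c[i].is_binary != 0);
        /* Every collation ID must be unique. */
        for (j = i + 1; j < n; j++)
            if (c[i].id == c[j].id)
                return 0;
    }
    return primaries == 1 && binaries >= 1;
}
```

A list containing MYSET_general_ci (primary) and MYSET_bin (binary) with two different IDs passes the check. Giving both entries the same ID, or omitting the binary collation, makes it fail.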
This step depends on whether you are adding a simple or complex character set. A simple character set requires only a configuration file, whereas a complex character set requires a C source file that defines collation functions, multi-byte functions, or both.
For a simple character set, create a configuration file, MYSET.xml, that describes the character set properties. Create this file in the sql/share/charsets directory. (You can use a copy of latin1.xml as the basis for this file.) The syntax for the file is very simple:
Comments are written as ordinary XML comments (<!-- text -->).
Words within <map> array elements are separated by arbitrary amounts of whitespace.
Each word within <map> array elements must be a number in hexadecimal format.
The <map> array element for the <ctype> element has 257 words. The other <map> array elements after that have 256 words. See Section 9.3.1, “The Character Definition Arrays”.
For each collation listed in the <charset> element for the character set in Index.xml, MYSET.xml must contain a <collation> element that defines the character ordering.
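The syntax rules for <map> arrays lend themselves to a quick sanity check before you rebuild the server. The following is a minimal sketch in C, assuming you have already extracted the text between <map> and </map> into a string. The function names count_map_words and expected_map_words are hypothetical and are not part of MySQL.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of MySQL: counts the words in the text of
   one <map> element. Words are separated by any amount of whitespace and
   each word must consist only of hexadecimal digits.
   Returns the number of words, or -1 if a word is not hexadecimal. */
int count_map_words(const char *text)
{
    const char *p = text;
    int count = 0;

    assert(text != NULL);
    for (;;) {
        int digits = 0;

        /* Skip the whitespace between words. */
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        /* Consume one word made of hexadecimal digits. */
        while (isxdigit((unsigned char)*p)) {
            p++;
            digits++;
        }
        /* The word must be non-empty and end at whitespace or end of text. */
        if (digits == 0 || (*p != '\0' && !isspace((unsigned char)*p)))
            return -1;
        count++;
    }
    return count;
}

/* Word count required by the rules above: 257 for the <map> inside <ctype>,
   256 for every other <map>. */
int expected_map_words(int is_ctype)
{
    return is_ctype ? 257 : 256;
}
```

Comparing count_map_words(text) with expected_map_words(is_ctype) for each array catches a missing or extra word, which is an easy mistake to make when editing a copy of latin1.xml by hand.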
For a complex character set, create a C source file that describes the character set properties and defines the support routines necessary to properly perform operations on the character set:
Create the file ctype-MYSET.c in the strings directory. Look at one of the existing ctype-*.c files (such as ctype-big5.c) to see what needs to be defined. The arrays in your file must have names like ctype_MYSET, to_lower_MYSET, and so on. These correspond to the arrays for a simple character set. See Section 9.3.1, “The Character Definition Arrays”.
For each collation listed in the <charset> element for the character set in Index.xml, the ctype-MYSET.c file must provide an implementation of the collation.
If you need string collating functions, see Section 9.3.2, “String Collating Support”.
If you need multi-byte character support, see Section 9.3.3, “Multi-Byte Character Support”.
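To make the array naming and sizing concrete, here is a minimal sketch in C of how the arrays in a ctype-MYSET.c file might be shaped. It is an illustration only: the existing ctype-*.c files define these arrays with static initializers that contain literal values, whereas this sketch fills the case-mapping arrays at run time. The to_upper_MYSET name is assumed by analogy with to_lower_MYSET, and init_case_maps_MYSET is a hypothetical helper.

```c
#include <assert.h>

/* Array names follow the ctype_MYSET / to_lower_MYSET pattern.
   Sizes match the word counts of the corresponding <map> arrays. */
unsigned char ctype_MYSET[257];     /* 257 entries, like the <ctype> map */
unsigned char to_lower_MYSET[256];  /* 256 entries, like the other maps  */
unsigned char to_upper_MYSET[256];  /* name assumed by analogy           */

/* Hypothetical helper: start from identity mappings, then fold the ASCII
   letters A-Z and a-z. ctype_MYSET is left zeroed in this sketch; its
   contents are covered in Section 9.3.1. */
void init_case_maps_MYSET(void)
{
    int c;

    assert(sizeof ctype_MYSET == 257);
    for (c = 0; c < 256; c++) {
        to_lower_MYSET[c] = (unsigned char)c;
        to_upper_MYSET[c] = (unsigned char)c;
    }
    for (c = 'A'; c <= 'Z'; c++) {
        to_lower_MYSET[c] = (unsigned char)(c - 'A' + 'a');
        to_upper_MYSET[c - 'A' + 'a'] = (unsigned char)c;
    }
}
```

In a real file you would replace the run-time fill with literal tables, using an existing file such as ctype-big5.c as the model.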
Follow these steps to modify the configuration information. Use the existing configuration information as a guide to adding information for MYSET. The example here assumes that the character set has default and binary collations, but more lines will be needed if MYSET has additional collations.
Edit mysys/charset-def.c, and “register” the collations for the new character set.
Add these lines to the “declaration” section:
#ifdef HAVE_CHARSET_MYSET
extern CHARSET_INFO my_charset_MYSET_general_ci;
extern CHARSET_INFO my_charset_MYSET_bin;
#endif
Add these lines to the “registration” section:
#ifdef HAVE_CHARSET_MYSET
add_compiled_collation(&my_charset_MYSET_general_ci);
add_compiled_collation(&my_charset_MYSET_bin);
#endif
If the character set uses ctype-MYSET.c, edit strings/Makefile.am and add ctype-MYSET.c to each definition of the CSRCS variable, and to the EXTRA_DIST variable.
If the character set uses ctype-MYSET.c, edit libmysql/Makefile.shared and add ctype-MYSET.lo to the mystringsobjects definition.
Edit config/ac-macros/character_sets.m4:
Add MYSET to one of the define(CHARSETS_AVAILABLE...) lines in alphabetic order.
Add MYSET to CHARSETS_COMPLEX. This is needed even for simple character sets, or configure will not recognize --with-charset=MYSET.
Add MYSET to the first case control structure. Omit the USE_MB and USE_MB_IDENT lines for 8-bit character sets.
MYSET)
  AC_DEFINE(HAVE_CHARSET_MYSET, 1, [Define to enable charset MYSET])
  AC_DEFINE([USE_MB], 1, [Use multi-byte character routines])
  AC_DEFINE(USE_MB_IDENT, 1)
  ;;
Add MYSET to the second case control structure:
MYSET)
  default_charset_default_collation="MYSET_general_ci"
  default_charset_collations="MYSET_general_ci MYSET_bin"
  ;;
Reconfigure, recompile, and test.