Other articles

MySQL Newsletter Efficiently using VARCHAR columns

Newsletters older than 6 months may have links that are out of date. Please use the Search to check for updated links.

VARCHAR column type is one of the most popular types in generic database. It looks pretty simple, however there are some tricks which can give you good performance benefit.

Limit the length: One may think as VARCHAR column are always dynamic it does not make any difference if you have VARCHAR(40) column or VARCHAR(255) column. However difference exists. When MySQL will need to do the sort it will use corresponding fixed length field format, which will lead to larger temporary file and so slower sorting.

Reduce fragmentation: If you're using VARCHAR TEXT or BLOB fields you're very likely to run into fragmentation problem with MyISAM tables. Row will not be located in one place but scattered over data file instead. This may reduce performance significantly. The easy way to defragment a table is to run OPTIMIZE TABLE mytable, however is there any way to reduce fragmentation? Yes. You just can avoid changing the row size - this can be done by adding VARCHAR/BLOB column which contains varying length filler to keep the row size static. If you need to increase length of some field you reduce the length of filler at the same size. Please only keep in mind filler should not add too much waste percentage to your rows or the drawbacks from larger data size may overcome reduced fragmentation.

Use index on prefix: Indexes on varchar columns take a lot of space and so generally can be slow. If your data is very different even by the start of the string you may benefit from creating index on column prefix:
ALTER TABLE MYTABLE ADD KEY NAME_KEYT (NAME(20))

Moreover you can have keys packed so reducing their size even more creating table with option PACK_KEYS=1

Note these options are currently available for MyISAM/ISAM tables only.

There are some drawbacks to consider of creation key on the column prefix. MySQL will not be able to use the index to retrieve complete value any more also could be required to scan many rows by index in case of prefix is not very selective for your data.

You can make a raw check if some prefix length is suitable by your data by comparing:
SELECT COUNT(DISTINCT NAME) FROM MYTABLE
and
SELECT COUNT(DISTINCT SUBSTRING(NAME,L,<prefix_length>)) FROM MYTABLE;

The difference should not be very large for key on the prefix to work efficiently.

Related Pages:

MySQL Newsletter Efficiently using VARCHAR columns