ANALYSE([max_elements[,max_memory]])
ANALYSE() is defined in the sql/sql_analyse.cc source file, which serves as an example of how to create a procedure for use with the PROCEDURE clause of SELECT statements. ANALYSE() is built in and is available by default; other procedures can be created using the format demonstrated in the source file.
ANALYSE() examines the result of a query and returns an analysis that suggests an optimal data type for each column, which may help reduce table sizes. To obtain this analysis, append PROCEDURE ANALYSE to the end of a SELECT statement:
SELECT ... FROM ... WHERE ... PROCEDURE ANALYSE([max_elements[,max_memory]])
For example:
SELECT col1, col2 FROM table1 PROCEDURE ANALYSE(10, 2000);
The results show some statistics for the values returned by the query, and propose an optimal data type for the columns. This can be helpful for checking your existing tables, or after importing new data. You may need to try different settings for the arguments so that PROCEDURE ANALYSE() does not suggest the ENUM data type when it is not appropriate.
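Trying several argument settings means issuing the same SELECT with different PROCEDURE ANALYSE(...) suffixes. The following is a minimal Python sketch of a helper that builds such statements as strings. The function name with_analyse is hypothetical and not part of MySQL or any connector; the sketch only formats text and does not execute anything against a server.

```python
def with_analyse(select_sql, max_elements=None, max_memory=None):
    """Append a PROCEDURE ANALYSE clause to a SELECT statement (string only).

    Hypothetical helper for illustration. Follows the documented syntax
    ANALYSE([max_elements[,max_memory]]): max_memory may only be given
    together with max_elements.
    """
    if max_memory is not None and max_elements is None:
        raise ValueError("max_memory requires max_elements")
    # Keep only the arguments that were supplied, in syntax order.
    args = [str(int(a)) for a in (max_elements, max_memory) if a is not None]
    # Drop trailing whitespace and a trailing semicolon before appending.
    base = select_sql.rstrip().rstrip(";")
    return f"{base} PROCEDURE ANALYSE({', '.join(args)});"


# Build the same query with a few candidate settings to compare by hand.
candidates = [with_analyse("SELECT col1, col2 FROM table1", n, 2000)
              for n in (10, 50, 256)]
```

Each resulting string can then be pasted into the mysql client to compare the suggested types.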
The arguments are optional and are used as follows:
max_elements (default 256) is the maximum number of distinct values that ANALYSE() notices per column. This is used by ANALYSE() to check whether the optimal data type should be of type ENUM; if there are more than max_elements distinct values, then ENUM is not a suggested type.
max_memory (default 8192) is the maximum amount of memory that ANALYSE() should allocate per column while trying to find all distinct values.
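The max_elements rule can be modelled in a few lines of Python. This is a simplified sketch of the documented rule only, not the logic in sql/sql_analyse.cc: the function name enum_is_candidate is hypothetical, max_memory is ignored, and the real procedure weighs other factors before proposing a type.

```python
def enum_is_candidate(column_values, max_elements=256):
    """Return True if ENUM could still be suggested under the documented rule.

    Simplified model: ENUM is ruled out once the column holds more than
    max_elements distinct values. Everything else ANALYSE() considers
    (including max_memory) is deliberately left out.
    """
    distinct = set(column_values)
    return len(distinct) <= max_elements


# A column with five distinct single-character values.
charac = ["A", "B", "C", "D", "E"]

within_limit = enum_is_candidate(charac, max_elements=5)   # 5 distinct, limit 5
over_limit = enum_is_candidate(charac, max_elements=4)     # 5 distinct, limit 4
```

Under this model, lowering max_elements below the number of distinct values in a column is what removes ENUM from consideration.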
User Comments
I did some tests using a table with 1000000 rows, and PROCEDURE ANALYSE() suggested the ENUM data type for every column.
mysql> SELECT id, ativada, cumprida FROM t1 PROCEDURE ANALYSE(1000000,256)\G
I think you may be misunderstanding the syntax here. Let's say we've got a table called charac which has five characters in it:
mysql> select * from charac;
5 rows in set (0.00 sec)
If we run select * from charac procedure analyse(), we're passing the default values, so we'll get everything back as enum:
mysql> select * from charac procedure analyse()\G
*************************** 1. row ***************************
Field_name: world.charac.charac
Min_value: A
Max_value: E
Min_length: 1
Max_length: 1
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 1.0000
Std: NULL
Optimal_fieldtype: ENUM('A','B','C','D','E') NOT NULL
1 row in set (0.00 sec)
The first argument refers to the number of elements, and the next argument refers to the total memory assigned. So, if we do this:
mysql> select * from charac procedure analyse(5,24)\G
*************************** 1. row ***************************
Field_name: world.charac.charac
Min_value: A
Max_value: E
Min_length: 1
Max_length: 1
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 1.0000
Std: NULL
Optimal_fieldtype: CHAR(1) NOT NULL
Then CHAR(1) is suggested for the field, which is perhaps more applicable. Hope this helps.
Bug #44060: First option of PROCEDURE ANALYSE() does not work, second needs some work
[15 Apr 2009 5:13] Roel Van de Paar
< PARTIAL WORKAROUND >
Regarding the issue with 'ENUM column recommendation output' for PROCEDURE ANALYSE, you
can still 'partly' use this function based on the second argument only.
For instance, if you would like to have a maximum of 50 characters (excluding 'NOT NULL')
for any ENUM column declaration, use the function as follows:
PROCEDURE ANALYSE(1,50);
The '1' will not do anything (as per the bug), and the '50' will define the maximum
number of characters for any ENUM (excluding the text 'NOT NULL', as per the bug).
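Read literally, this workaround treats the second argument as a cap on the length of the ENUM(...) declaration text. The Python sketch below models that reading; it is an interpretation of the comment, not MySQL's code. The function names are hypothetical, and whether the server's comparison is inclusive at the boundary is an assumption.

```python
def enum_declaration(values):
    """Render values as ENUM('a','b',...) text, without any NOT NULL suffix."""
    quoted = ",".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"ENUM({quoted})"


def fits_enum_limit(values, max_chars):
    """Assumed rule: the declaration text must not exceed max_chars characters."""
    return len(enum_declaration(values)) <= max_chars


# The five-value charac column from the earlier comment.
decl = enum_declaration(["A", "B", "C", "D", "E"])
decl_length = len(decl)

fits_24 = fits_enum_limit(["A", "B", "C", "D", "E"], 24)
fits_default = fits_enum_limit(["A", "B", "C", "D", "E"], 8192)
```

For the charac column the declaration ENUM('A','B','C','D','E') is 25 characters long, which is consistent with PROCEDURE ANALYSE(5,24) proposing CHAR(1) rather than ENUM in the earlier comment.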
If you do not want to use any ENUM columns at all (and for instance use a linked lookup
table with IDs instead), you can use:
PROCEDURE ANALYSE(1,1);
A linked lookup table has the advantage that you can add new values to
the lookup table later on, and then start inserting the new IDs into the main table
immediately (i.e. no ALTER of the ENUM column is required).
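The lookup-table alternative can be sketched with Python's built-in sqlite3 module. This uses SQLite rather than MySQL so that it runs without a server. The table and column names (status_lookup, orders, status_id) are hypothetical, and a MySQL schema would use its own column types and foreign-key syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE status_lookup (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id        INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL REFERENCES status_lookup(id)
);
INSERT INTO status_lookup (id, label) VALUES (1, 'new'), (2, 'shipped');
INSERT INTO orders (status_id) VALUES (1), (2);
""")

# Adding a new value later is a plain INSERT into the lookup table;
# the main table needs no schema change.
conn.execute("INSERT INTO status_lookup (id, label) VALUES (3, 'returned')")
conn.execute("INSERT INTO orders (status_id) VALUES (3)")

# Resolve the IDs back to labels with a join.
labels = [row[0] for row in conn.execute(
    "SELECT s.label FROM orders o "
    "JOIN status_lookup s ON s.id = o.status_id ORDER BY o.id"
)]
conn.close()
```

With an ENUM column, the same change would require altering the column definition before the new value could be stored.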