Each type of plugin has its own type-specific structures and functions. The primary structure is the type-specific plugin descriptor. This is pointed to by the info member of the st_mysql_plugin general plugin descriptor, but has a structure determined by the requirements of the plugin type.
For version-control purposes, the first member of the type-specific descriptor for every plugin type is expected to be the interface version for the type. This enables the server to check the type-specific version for every plugin no matter its type. The type-specific descriptor commonly includes callback functions and other information needed by the server to invoke the plugin properly.
The following discussion describes the full-text parser plugin type-specific descriptor.
For a full-text parser plugin, the type-specific descriptor is an instance of the st_mysql_ftparser structure in the plugin.h file:
struct st_mysql_ftparser
{
  int interface_version;
  int (*parse)(MYSQL_FTPARSER_PARAM *param);
  int (*init)(MYSQL_FTPARSER_PARAM *param);
  int (*deinit)(MYSQL_FTPARSER_PARAM *param);
};
As shown by the structure definition, the descriptor has an interface version number and contains pointers to three functions. The version is specified using a symbol of the form MYSQL_xxx_INTERFACE_VERSION (such as MYSQL_FTPARSER_INTERFACE_VERSION for full-text parser plugins). The init and deinit members should point to a function or be set to 0 if the function is not needed. The parse member must point to the function that performs the parsing.
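As an illustration, a descriptor for a plugin that needs no per-statement setup might be declared as in the following sketch. The stand-in declarations at the top exist only so that the fragment compiles without the MySQL headers; the version value and the example_parse and example_descriptor names are assumptions made for this example, and a real plugin would take the declarations from plugin.h.

```c
#include <stddef.h>

/* Stand-in declarations so the sketch compiles on its own.
   A real plugin includes plugin.h instead; the version value
   below is a placeholder, not necessarily the real constant. */
#define MYSQL_FTPARSER_INTERFACE_VERSION 0x0100
typedef struct st_mysql_ftparser_param MYSQL_FTPARSER_PARAM;

struct st_mysql_ftparser
{
  int interface_version;
  int (*parse)(MYSQL_FTPARSER_PARAM *param);
  int (*init)(MYSQL_FTPARSER_PARAM *param);
  int (*deinit)(MYSQL_FTPARSER_PARAM *param);
};

/* Hypothetical parsing function; a real one would process the text
   described by param. */
static int example_parse(MYSQL_FTPARSER_PARAM *param)
{
  (void) param;
  return 0;                       /* zero signals success */
}

/* Type-specific descriptor: the interface version comes first,
   parse is required, and init and deinit are 0 because this
   plugin needs no per-statement setup or cleanup. */
static struct st_mysql_ftparser example_descriptor =
{
  MYSQL_FTPARSER_INTERFACE_VERSION,
  example_parse,
  0,
  0
};
```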
A full-text parser plugin is used in two different contexts, indexing and searching. In both contexts, the server calls the initialization and deinitialization functions at the beginning and end of processing each SQL statement that causes the plugin to be invoked. However, during statement processing, the server calls the main parsing function in context-specific fashion:
For indexing, the server calls the parser for each column value to be indexed.
For searching, the server calls the parser to parse the search string. The parser might also be called for rows processed by the statement. In natural language mode, there is no need for the server to call the parser. For boolean mode phrase searches or natural language searches with query expansion, the parser is used to parse column values for information that is not in the index. Also, if a boolean mode search is done for a column that has no FULLTEXT index, the built-in parser will be called. (Plugins are associated with specific indexes. If there is no index, no plugin is used.)
The plugin declaration in the general plugin descriptor has init and deinit members that point to initialization and deinitialization functions, and so does the type-specific plugin descriptor to which it points. However, these pairs of functions have different purposes and are invoked for different reasons:
For the plugin declaration in the general plugin descriptor, the initialization and deinitialization functions are invoked when the plugin is loaded and unloaded.
For the type-specific plugin descriptor, the initialization and deinitialization functions are invoked per SQL statement for which the plugin is used.
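The per-statement pair in the type-specific descriptor is a natural place to allocate and release working state. The following sketch assumes the ftparser_state member of the parameter structure described later in this section; the trimmed structure, the example_state type, and the function names are hypothetical stand-ins so that the fragment compiles without plugin.h.

```c
#include <stdlib.h>

/* Trimmed stand-in for MYSQL_FTPARSER_PARAM: only the member used here. */
typedef struct st_mysql_ftparser_param
{
  void *ftparser_state;
} MYSQL_FTPARSER_PARAM;

/* Hypothetical per-statement state kept by the plugin. */
struct example_state
{
  unsigned long calls;            /* how often parse ran in this statement */
};

/* Called at the start of each statement that uses the plugin. */
static int example_init(MYSQL_FTPARSER_PARAM *param)
{
  struct example_state *state = calloc(1, sizeof(*state));
  if (state == NULL)
    return 1;                     /* nonzero signals failure */
  param->ftparser_state = state;
  return 0;
}

/* Called once per column value or search string within the statement. */
static int example_parse(MYSQL_FTPARSER_PARAM *param)
{
  struct example_state *state = param->ftparser_state;
  state->calls++;                 /* placeholder for real parsing work */
  return 0;
}

/* Called at the end of the statement; releases what init allocated. */
static int example_deinit(MYSQL_FTPARSER_PARAM *param)
{
  free(param->ftparser_state);
  param->ftparser_state = NULL;
  return 0;
}
```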
Each interface function named in the plugin descriptor should return zero for success or nonzero for failure, and each of them receives an argument that points to a MYSQL_FTPARSER_PARAM structure containing the parsing context. The structure has this definition:
typedef struct st_mysql_ftparser_param
{
  int (*mysql_parse)(struct st_mysql_ftparser_param *,
                     char *doc, int doc_len);
  int (*mysql_add_word)(struct st_mysql_ftparser_param *,
                        char *word, int word_len,
                        MYSQL_FTPARSER_BOOLEAN_INFO *boolean_info);
  void *ftparser_state;
  void *mysql_ftparam;
  struct charset_info_st *cs;
  char *doc;
  int length;
  int flags;
  enum enum_ftparser_mode mode;
} MYSQL_FTPARSER_PARAM;
The definition shown is current as of MySQL 5.1.12. It is incompatible with versions of MySQL 5.1 older than 5.1.12.
The structure members are used as follows:
mysql_parse
A pointer to a callback function that invokes the server's built-in parser. Use this callback when the plugin acts as a front end to the built-in parser. That is, when the plugin parsing function is called, it should process the input to extract the text and pass the text to the mysql_parse callback.
The first parameter for this callback function should be the param value itself:
param->mysql_parse(param, ...);
A front end plugin can extract text and pass it all at once to the built-in parser, or it can extract and pass text to the built-in parser a piece at a time. In the latter case, however, the built-in parser treats the pieces of text as though there are implicit word breaks between them.
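A front end that passes text a piece at a time might look like the following sketch. It skips anything between '<' and '>' and hands each remaining run of text to mysql_parse. The trimmed parameter structure, the strip_tags_parse function, and the fake_mysql_parse test double (which stands in for the server so the fragment can be exercised outside it) are all assumptions made for this example.

```c
#include <stddef.h>

/* Trimmed stand-in for MYSQL_FTPARSER_PARAM: only the members used here. */
typedef struct st_mysql_ftparser_param
{
  int (*mysql_parse)(struct st_mysql_ftparser_param *,
                     char *doc, int doc_len);
  char *doc;
  int length;
} MYSQL_FTPARSER_PARAM;

/* Hypothetical front end: drop <...> markup, pass the rest through. */
static int strip_tags_parse(MYSQL_FTPARSER_PARAM *param)
{
  char *p = param->doc;
  char *end = param->doc + param->length;

  while (p < end)
  {
    if (*p == '<')
    {
      /* Skip the tag, including the closing '>' if present. */
      while (p < end && *p != '>')
        p++;
      if (p < end)
        p++;
    }
    else
    {
      char *start = p;
      int rc;
      while (p < end && *p != '<')
        p++;
      /* Each piece is treated as if followed by a word break. */
      rc = param->mysql_parse(param, start, (int) (p - start));
      if (rc != 0)
        return rc;                /* propagate failure to the server */
    }
  }
  return 0;
}

/* Test double standing in for the built-in parser: it only records
   how many pieces and bytes it was given. */
static int pieces_seen;
static int bytes_seen;

static int fake_mysql_parse(MYSQL_FTPARSER_PARAM *param,
                            char *doc, int doc_len)
{
  (void) param;
  (void) doc;
  pieces_seen++;
  bytes_seen += doc_len;
  return 0;
}
```

Because of the implicit word breaks, a tag in the middle of a word splits that word in two with this approach; a front end that must avoid this would collect the text into one buffer and call mysql_parse once.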
mysql_add_word
A pointer to a callback function that adds a word to a full-text index or to the list of search terms. Use this callback when the parser plugin replaces the built-in parser. That is, when the plugin parsing function is called, it should parse the input into words and invoke the mysql_add_word callback for each word.
The first parameter for this callback function should be the param value itself:
param->mysql_add_word(param, ...);
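A replacement parser could be as simple as the following sketch, which treats every run of non-whitespace bytes as a word and reports each one through mysql_add_word. The stand-in declarations (with the boolean-info member types simplified to char), the whitespace_parse function, and the fake_add_word test double are assumptions for illustration; each word is described as an ordinary word with no boolean modifiers.

```c
#include <ctype.h>
#include <stddef.h>

/* Stand-ins for declarations that plugin.h would normally provide. */
enum enum_ft_token_type { FT_TOKEN_EOF = 0, FT_TOKEN_WORD = 1 };

typedef struct st_mysql_ftparser_boolean_info
{
  enum enum_ft_token_type type;
  int yesno;
  int weight_adjust;
  char wasign;
  char trunc;
  char prev;
  char *quot;
} MYSQL_FTPARSER_BOOLEAN_INFO;

typedef struct st_mysql_ftparser_param
{
  int (*mysql_add_word)(struct st_mysql_ftparser_param *,
                        char *word, int word_len,
                        MYSQL_FTPARSER_BOOLEAN_INFO *boolean_info);
  char *doc;
  int length;
} MYSQL_FTPARSER_PARAM;

/* Hypothetical replacement parser: split on whitespace. */
static int whitespace_parse(MYSQL_FTPARSER_PARAM *param)
{
  /* Plain word: optional, no weight adjustment, no truncation. */
  MYSQL_FTPARSER_BOOLEAN_INFO info = { FT_TOKEN_WORD, 0, 0, 0, 0, 0, 0 };
  char *p = param->doc;
  char *end = param->doc + param->length;

  while (p < end)
  {
    char *start;
    while (p < end && isspace((unsigned char) *p))
      p++;                        /* skip separators */
    start = p;
    while (p < end && !isspace((unsigned char) *p))
      p++;                        /* scan one word */
    if (p > start)
    {
      int rc = param->mysql_add_word(param, start, (int) (p - start), &info);
      if (rc != 0)
        return rc;
    }
  }
  return 0;
}

/* Test double standing in for the server callback. */
static int words_added;
static int bytes_added;

static int fake_add_word(MYSQL_FTPARSER_PARAM *param, char *word,
                         int word_len, MYSQL_FTPARSER_BOOLEAN_INFO *info)
{
  (void) param;
  (void) word;
  (void) info;
  words_added++;
  bytes_added += word_len;
  return 0;
}
```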
ftparser_state
This is a generic pointer. The plugin can set it to point to information to be used internally for its own purposes.
mysql_ftparam
This is set by the server. It is passed as the first argument to the mysql_parse or mysql_add_word callback.
cs
A pointer to information about the character set of the text, or 0 if no information is available.
doc
A pointer to the text to be parsed.
length
The length of the text to be parsed, in bytes.
flags
Parser flags. This is zero if there are no special flags. Currently, the only nonzero flag is MYSQL_FTFLAGS_NEED_COPY, which means that mysql_add_word() must save a copy of the word (that is, it cannot use a pointer to the word because the word is in a buffer that will be overwritten). This member was added in MySQL 5.1.12.
This flag might be set or reset by MySQL before calling the parser plugin, by the parser plugin itself, or by the mysql_parse() function.
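One situation in which a plugin would set the flag itself is when it builds each word in a scratch buffer that is reused for the next word. The following sketch lowercases words into such a buffer and sets MYSQL_FTFLAGS_NEED_COPY first. The flag value, the trimmed structures, the lowercase_parse function, and the fake_add_word test double are stand-ins assumed for this example; words longer than the buffer are simply truncated here.

```c
#include <ctype.h>
#include <string.h>

/* Stand-ins for declarations that plugin.h would normally provide. */
#define MYSQL_FTFLAGS_NEED_COPY 1   /* assumed value; use the header's */
enum enum_ft_token_type { FT_TOKEN_EOF = 0, FT_TOKEN_WORD = 1 };

typedef struct st_mysql_ftparser_boolean_info
{
  enum enum_ft_token_type type;
  int yesno;
  int weight_adjust;
  char wasign;
  char trunc;
  char prev;
  char *quot;
} MYSQL_FTPARSER_BOOLEAN_INFO;

typedef struct st_mysql_ftparser_param
{
  int (*mysql_add_word)(struct st_mysql_ftparser_param *,
                        char *word, int word_len,
                        MYSQL_FTPARSER_BOOLEAN_INFO *boolean_info);
  char *doc;
  int length;
  int flags;
} MYSQL_FTPARSER_PARAM;

/* Hypothetical parser that lowercases each word before adding it. */
static int lowercase_parse(MYSQL_FTPARSER_PARAM *param)
{
  MYSQL_FTPARSER_BOOLEAN_INFO info = { FT_TOKEN_WORD, 0, 0, 0, 0, 0, 0 };
  char buf[64];                   /* scratch buffer, reused per word */
  char *p = param->doc;
  char *end = param->doc + param->length;

  /* Words live in buf, which is overwritten, so a copy is required. */
  param->flags |= MYSQL_FTFLAGS_NEED_COPY;

  while (p < end)
  {
    int len = 0;
    while (p < end && isspace((unsigned char) *p))
      p++;
    while (p < end && !isspace((unsigned char) *p))
    {
      if (len < (int) sizeof(buf))
        buf[len++] = (char) tolower((unsigned char) *p);
      p++;
    }
    if (len > 0)
    {
      int rc = param->mysql_add_word(param, buf, len, &info);
      if (rc != 0)
        return rc;
    }
  }
  return 0;
}

/* Test double: honors the flag by copying every word it receives. */
static char saved[4][65];
static int saved_count;

static int fake_add_word(MYSQL_FTPARSER_PARAM *param, char *word,
                         int word_len, MYSQL_FTPARSER_BOOLEAN_INFO *info)
{
  (void) info;
  if (!(param->flags & MYSQL_FTFLAGS_NEED_COPY) ||
      saved_count >= 4 || word_len > 64)
    return 1;
  memcpy(saved[saved_count], word, (size_t) word_len);
  saved[saved_count][word_len] = '\0';
  saved_count++;
  return 0;
}
```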
mode
The parsing mode. This value will be one of the following constants:
MYSQL_FTPARSER_SIMPLE_MODE
Parse in fast and simple mode, which is used for indexing and for natural language queries. The parser should pass to the server only those words that should be indexed. If the parser uses length limits or a stopword list to determine which words to ignore, it should not pass such words to the server.
MYSQL_FTPARSER_WITH_STOPWORDS
Parse in stopword mode. This is used in boolean searches for phrase matching. The parser should pass all words to the server, even stopwords or words that are outside any normal length limits.
MYSQL_FTPARSER_FULL_BOOLEAN_INFO
Parse in boolean mode. This is used for parsing boolean query strings. The parser should recognize not only words but also boolean-mode operators and pass them to the server as tokens via the mysql_add_word callback. To tell the server what kind of token is being passed, the plugin needs to fill in a MYSQL_FTPARSER_BOOLEAN_INFO structure and pass a pointer to it.
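How a parser might act on the mode when deciding whether to pass a word can be sketched as follows. The enumeration values, the length limits, the three-word stopword list, and the function names are assumptions made for this example, not values taken from the server. Treating boolean mode as pass-through for words is likewise a simplification; operator handling is not shown.

```c
#include <string.h>

/* Stand-in for the enumeration in plugin.h; values are assumed. */
enum enum_ftparser_mode
{
  MYSQL_FTPARSER_SIMPLE_MODE = 0,
  MYSQL_FTPARSER_WITH_STOPWORDS = 1,
  MYSQL_FTPARSER_FULL_BOOLEAN_INFO = 2
};

/* Hypothetical limits chosen by the plugin. */
#define EXAMPLE_MIN_WORD_LEN 4
#define EXAMPLE_MAX_WORD_LEN 84

/* Hypothetical stopword check against a tiny built-in list. */
static int example_is_stopword(const char *word, int len)
{
  static const char *const stopwords[] = { "about", "these", "which" };
  size_t i;

  for (i = 0; i < sizeof(stopwords) / sizeof(stopwords[0]); i++)
  {
    if ((int) strlen(stopwords[i]) == len &&
        memcmp(stopwords[i], word, (size_t) len) == 0)
      return 1;
  }
  return 0;
}

/* Returns 1 if the word should be passed to the server, 0 if not. */
static int example_should_pass(enum enum_ftparser_mode mode,
                               const char *word, int len)
{
  switch (mode)
  {
  case MYSQL_FTPARSER_SIMPLE_MODE:
    /* Only words that should be indexed: apply limits and stopwords. */
    return len >= EXAMPLE_MIN_WORD_LEN &&
           len <= EXAMPLE_MAX_WORD_LEN &&
           !example_is_stopword(word, len);
  case MYSQL_FTPARSER_WITH_STOPWORDS:
    /* Phrase matching needs every word, whatever its length. */
    return 1;
  case MYSQL_FTPARSER_FULL_BOOLEAN_INFO:
  default:
    /* Assumption for this sketch: pass all words in boolean mode. */
    return 1;
  }
}
```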
If the parser is called in boolean mode, the param->mode value will be MYSQL_FTPARSER_FULL_BOOLEAN_INFO. The MYSQL_FTPARSER_BOOLEAN_INFO structure that the parser uses for passing token information to the server looks like this:
typedef struct st_mysql_ftparser_boolean_info
{
  enum enum_ft_token_type type;
  int yesno;
  int weight_adjust;
  bool wasign;
  bool trunc;
  /* These are parser state and must be removed. */
  byte prev;
  byte *quot;
} MYSQL_FTPARSER_BOOLEAN_INFO;
The parser should fill in the structure members as follows:
type
The token type. This should be one of the values shown in the following table.
Type | Meaning |
FT_TOKEN_EOF | End of data |
FT_TOKEN_WORD | A regular word |
FT_TOKEN_LEFT_PAREN | The beginning of a group or subexpression |
FT_TOKEN_RIGHT_PAREN | The end of a group or subexpression |
FT_TOKEN_STOPWORD | A stopword |
yesno
Whether the word must be present for a match to occur. 0 means that the word is optional but increases the match relevance if it is present. Values larger than 0 mean that the word must be present. Values smaller than 0 mean that the word must not be present.
weight_adjust
A weighting factor that determines how much a match for the word counts. It can be used to increase or decrease the word's importance in relevance calculations. A value of zero indicates no weight adjustment. Values greater than or less than zero mean higher or lower weight, respectively. The examples at Section 11.8.2, “Boolean Full-Text Searches”, that use the < and > operators illustrate how weighting works.
wasign
The sign of the weighting factor. A negative value acts like the ~ boolean-search operator, which causes the word's contribution to the relevance to be negative.
trunc
Whether matching should be done as if the boolean-mode * truncation operator had been given.
Plugins should not use the prev and quot members of the MYSQL_FTPARSER_BOOLEAN_INFO structure.
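To make the member descriptions concrete, the following sketch fills in the structure for a single token of a boolean query. It assumes the token has already been split off by the plugin and maps the built-in operators +, -, >, <, ~, and a trailing * onto the members; a plugin is free to define its own syntax. The stand-in declarations (with member types simplified to char), the enumeration values, and the example_classify function are assumptions for illustration. Setting wasign to a nonzero value to mean "negated" is also an assumption. The prev and quot members are left as the caller initialized them.

```c
#include <stddef.h>

/* Stand-ins for declarations that plugin.h would normally provide. */
enum enum_ft_token_type
{
  FT_TOKEN_EOF = 0,
  FT_TOKEN_WORD = 1,
  FT_TOKEN_LEFT_PAREN = 2,
  FT_TOKEN_RIGHT_PAREN = 3,
  FT_TOKEN_STOPWORD = 4
};

typedef struct st_mysql_ftparser_boolean_info
{
  enum enum_ft_token_type type;
  int yesno;
  int weight_adjust;
  char wasign;
  char trunc;
  char prev;
  char *quot;
} MYSQL_FTPARSER_BOOLEAN_INFO;

/* Hypothetical helper: describe one token in *info and return a
   pointer to the word text inside it, with its length in *word_len
   (0 for a parenthesis). */
static const char *example_classify(const char *tok, int tok_len,
                                    MYSQL_FTPARSER_BOOLEAN_INFO *info,
                                    int *word_len)
{
  info->type = FT_TOKEN_WORD;
  info->yesno = 0;                /* optional by default */
  info->weight_adjust = 0;
  info->wasign = 0;
  info->trunc = 0;

  if (tok_len == 1 && (tok[0] == '(' || tok[0] == ')'))
  {
    info->type = (tok[0] == '(') ? FT_TOKEN_LEFT_PAREN
                                 : FT_TOKEN_RIGHT_PAREN;
    *word_len = 0;
    return tok;
  }

  /* Consume leading operators. */
  while (tok_len > 0)
  {
    if (*tok == '+')
      info->yesno = 1;            /* word must be present */
    else if (*tok == '-')
      info->yesno = -1;           /* word must not be present */
    else if (*tok == '>')
      info->weight_adjust++;      /* raise the word's weight */
    else if (*tok == '<')
      info->weight_adjust--;      /* lower the word's weight */
    else if (*tok == '~')
      info->wasign = !info->wasign;  /* negate the contribution */
    else
      break;
    tok++;
    tok_len--;
  }

  /* A trailing '*' requests truncated (prefix) matching. */
  if (tok_len > 0 && tok[tok_len - 1] == '*')
  {
    info->trunc = 1;
    tok_len--;
  }

  *word_len = tok_len;
  return tok;
}
```

The caller would then pass the returned word, its length, and a pointer to the filled-in structure to the mysql_add_word callback.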