Jon Stephens works on the MySQL Documentation Team, writing about MySQL Cluster, Partitioning, Replication, and XML. He's the co-author and editor of the MySQL 5.1 Cluster Certification Guide (MySQL, 2007), and co-authored the PHP 5 Cookbook (Apress, 2005) and MySQL Database Design and Optimzation (Apress, 2004). He is based in Stockholm, Sweden, and can consume his own weight in pepparkakor in one sitting. He blogs about MySQL (and other things) at http://blog.plasticfish.info/, and invites your questions and comments about using XML with MySQL in the MySQL XML Forum.
In this article, we discuss the XML functionality available in MySQL, with an emphasis on new features coming online in MySQL 5.1 and MySQL 6.0. We assume that you already have a working knowledge of XML, and that you know what the terms “valid” and “well-formed” mean. We also assume that you have some knowledge of XPath.
We cover the following topics:
Methods for outputting MySQL data in XML format, including the use of lib_mysqludf_xql
, a third-party library that can be used for this task
Using the functions (new in MySQL 5.1) ExtractValue()
and UpdateXML()
for working with XML and XPath
Storing data from XML in a MySQL database using the LOAD XML
statement (implemented in MySQL 6.0)
Some security considerations to keep in mind when using these techniques
In this section, we discuss how to retrieve data from MySQL in XML format, and how to store data obtained from an XML source in a MySQL database.
In this section, we start with some data already stored in a MySQL table, and demonstrate several different ways to output it in XML format.
Using the --xml
option. Both the mysql and mysqldump client programs support a startup option that causes them to produce XML output. Here is a brief example using the mysql client:
shell> mysql -uroot -e "SHOW VARIABLES LIKE '%version%'" --xml
<?xml version="1.0"?>
<resultset statement="SHOW VARIABLES LIKE '%version%'"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<field name="Variable_name">protocol_version</field>
<field name="Value">10</field>
</row>
<row>
<field name="Variable_name">version</field>
<field name="Value">5.1.22-beta-debug</field>
</row>
<row>
<field name="Variable_name">version_comment</field>
<field name="Value">Source distribution</field>
</row>
<row>
<field name="Variable_name">version_compile_machine</field>
<field name="Value">x86_64</field>
</row>
<row>
<field name="Variable_name">version_compile_os</field>
<field name="Value">suse-linux-gnu</field>
</row>
</resultset>
The content of the <field name="Value">
elements corresponds to the values found in the Value
column displayed when the same statement is executed in the mysql
client without the --xml
option, as shown here:
shell> mysql -uroot -e "SHOW VARIABLES LIKE '%version%'"
+-------------------------+---------------------+
| Variable_name | Value |
+-------------------------+---------------------+
| protocol_version | 10 |
| version | 5.1.22-beta-debug |
| version_comment | Source distribution |
| version_compile_machine | x86_64 |
| version_compile_os | suse-linux-gnu |
+-------------------------+---------------------+
Naturally, the values themselves depend on the MySQL version that you are using and the machine on which it is running, so if you run the same statement, your results are likely to be different from what is shown here.
Example table for XML export. The remainder of this section uses a table created and populated by the following SQL statements:
CREATE SCHEMA xmltest; CREATE TABLE xmltest.cities ( name CHAR(35) NOT NULL DEFAULT '', country CHAR(52) NOT NULL DEFAULT '', population int(11) NOT NULL DEFAULT '0' ); INSERT INTO cities VALUES ('Mumbai (Bombay)','India',10500000); INSERT INTO cities VALUES ('Seoul','South Korea',9981619); INSERT INTO cities VALUES ('São Paulo','Brazil',9968485); INSERT INTO cities VALUES ('Shanghai','China',9696300); INSERT INTO cities VALUES ('Jakarta','Indonesia',9604900); INSERT INTO cities VALUES ('Karachi','Pakistan',9269265); INSERT INTO cities VALUES ('Istanbul','Turkey',8787958); INSERT INTO cities VALUES ('Ciudad de México','Mexico',8591309); INSERT INTO cities VALUES ('Moscow','Russian Federation',8389200); INSERT INTO cities VALUES ('New York','United States',8008278);
This table was originally created using the following SQL statement on the venerable world
example database:
CREATE TABLE xmltest.cities SELECT i.Name AS name, o.Name AS country, i.Population AS population FROM City i JOIN Country o ON i.CountryCode=o.Code ORDER BY i.Population DESC LIMIT 10;
You can obtain a copy of the world
database from http://dev.mysql.com/doc/.
Beginning with MySQL 5.1.12, the <field>
and <row>
format produced by the mysql client matches that produced by mysqldump. However, the root element in the output of mysql --xml
, is <resultset>
, whose statement
attribute contains the SQL statement passed to mysql, as shown here:
shell> mysql -uroot --xml -e 'SELECT * FROM xmltest.cities ORDER BY name'
<?xml version="1.0"?>
<resultset statement="SELECT * FROM xmltest.cities ORDER BY name"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<field name="name">Ciudad de Méico</field>
<field name="country">Mexico</field>
<field name="population">8591309</field>
</row>
<row>
<field name="name">Istanbul</field>
<field name="country">Turkey</field>
<field name="population">8787958</field>
</row>
<row>
<field name="name">Jakarta</field>
<field name="country">Indonesia</field>
<field name="population">9604900</field>
</row>
<row>
<field name="name">Karachi</field>
<field name="country">Pakistan</field>
<field name="population">9269265</field>
</row>
<row>
<field name="name">Moscow</field>
<field name="country">Russian Federation</field>
<field name="population">8389200</field>
</row>
<row>
<field name="name">Mumbai (Bombay)</field>
<field name="country">India</field>
<field name="population">10500000</field>
</row>
<row>
<field name="name">New York</field>
<field name="country">United States</field>
<field name="population">8008278</field>
</row>
<row>
<field name="name">São Paulo</field>
<field name="country">Brazil</field>
<field name="population">9968485</field>
</row>
<row>
<field name="name">Seoul</field>
<field name="country">South Korea</field>
<field name="population">9981619</field>
</row>
<row>
<field name="name">Shanghai</field>
<field name="country">China</field>
<field name="population">9696300</field>
</row>
</resultset>
The output of mysqldump--xml
is structured somewhat differently, as shown here:
shell> mysqldump --xml xmltest cities
<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<database name="xmltest">
<table_structure name="cities">
<field Field="name" Type="char(35)" Null="NO" Key="" Default=""
Extra=""/>
<field Field="country" Type="char(52)" Null="NO" Key="" Default=""
Extra=""/>
<field Field="population" Type="int(11)" Null="NO" Key="" Default="0"
Extra=""/>
<options Name="cities" Engine="MyISAM" Version="10" Row_format="Fixed"
Rows="10" Avg_row_length="92" Data_length="920"
Max_data_length="25895697857380351" Index_length="1024"
Data_free="0" Create_time="2007-08-24 14:19:42"
Update_time="2007-08-24 14:19:42" Collation="latin1_swedish_ci"
Create_options="" Comment="" />
</table_structure>
<table_data name="cities">
<row>
<field name="name">Mumbai (Bombay)</field>
<field name="country">India</field>
<field name="population">10500000</field>
</row>
<row>
<field name="name">Seoul</field>
<field name="country">South Korea</field>
<field name="population">9981619</field>
</row>
<row>
<field name="name">São Paulo</field>
<field name="country">Brazil</field>
<field name="population">9968485</field>
</row>
<row>
<field name="name">Shanghai</field>
<field name="country">China</field>
<field name="population">9696300</field>
</row>
<row>
<field name="name">Jakarta</field>
<field name="country">Indonesia</field>
<field name="population">9604900</field>
</row>
<row>
<field name="name">Karachi</field>
<field name="country">Pakistan</field>
<field name="population">9269265</field>
</row>
<row>
<field name="name">Istanbul</field>
<field name="country">Turkey</field>
<field name="population">8787958</field>
</row>
<row>
<field name="name">Ciudad de México</field>
<field name="country">Mexico</field>
<field name="population">8591309</field>
</row>
<row>
<field name="name">Moscow</field>
<field name="country">Russian Federation</field>
<field name="population">8389200</field>
</row>
<row>
<field name="name">New York</field>
<field name="country">United States</field>
<field name="population">8008278</field>
</row>
</table_data>
</database>
</mysqldump>
The formatting of some of the XML output has been altered slightly to fit the space available on a printed page.
mysqldump--xml
employs the following
elements:
<mysqldump>
<database>
element whose name
attribute value is the name of that database<table_structure>
element<table_data>
element, and is comprised of <field>
and <row>
elements To save the output of either mysql or mysqldump to a file, simply use the >
operator with the desired filename, as shown here:
shell>mysql -uroot --xml xmltest -e 'SELECT name FROM cities LIMIT 2' > /tmp/2cities.xml
shell>more /tmp/2cities.xml
<?xml version="1.0"?> <resultset statement="SELECT name,country FROM cities LIMIT 2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <row> <field name="name">Mumbai (Bombay)</field> <field name="country">India</field> </row> <row> <field name="name">Seoul</field> <field name="country">South Korea</field> </row> </resultset>
An XML namespace declaration is included in the XML output of both mysql and mysqldump, beginning with MySQL 5.1.18.
There may be times when your application expects an XML format that is different from what is normally obtain from mysql or mysqldump. Suppose, for example, that your application expects a format such as this one:
<cities> <city>name1
<city> <city>name2
<city> <-- etc. --> </cities>
One way to obtain such output would be to use concatenation with the CONCAT()
and GROUP_CONCAT()
functions, like this:
mysql>SELECT CONCAT('\n<cities>\n',
->GROUP_CONCAT(' <city>', name, '</city>\n' SEPARATOR ''),
->'</cities>') AS xmldoc
->FROM cities\G
*************************** 1. row *************************** xmldoc: <cities> <city>Mumbai (Bombay)</city> <city>Seoul</city> <city>Sã Paulo</city> <city>Shanghai</city> <city>Jakarta</city> <city>Karachi</city> <city>Istanbul</city> <city>Ciudad de Méico</city> <city>Moscow</city> <city>New York</city> </cities> 1 row in set (0.01 sec)
Another such example is this format:
<cities> <city name="name1
" population="population1
"/> <city name="name2
" population="population2
"/> <-- etc. --> </cities>
This can be produced as shown here:
mysql>SELECT CONCAT(
->'\n<cities>',
->GROUP_CONCAT(
->'\n\t<city name="', name, '" population="', population, '"/>'
->SEPARATOR ''
->),
->'\n</cities>'
->) AS xmldoc
->FROM cities\G
*************************** 1. row *************************** xmldoc: <cities> <city name="Mumbai (Bombay)" population="10500000"/> <city name="Seoul" population="9981619"/> <city name="Sã Paulo" population="9968485"/> <city name="Shanghai" population="9696300"/> <city name="Jakarta" population="9604900"/> <city name="Karachi" population="9269265"/> <city name="Istanbul" population="8787958"/> <city name="Ciudad de Méico" population="8591309"/> <city name="Moscow" population="8389200"/> <city name="New York" population="8008278"/> </cities> 1 row in set (0.01 sec)
You can see that generating even relatively simple XML
output by such means can rapidly grow very complicated.
While you can wrap such SELECT
statements in stored procedures, the fact that MySQL stored
routines cannot (at least at present) take a variable number
of arguments. Fortunately, there is a third-party library
that can be used with MySQL to make this task easier.
Using the lib_mysqludf_xql
library. The lib_mysqludf_xql
library is a set of UDFs originally written and placed under the GPL by Arnold Daniels. The source for these is now housed at www.mysqludf.org.
mysql>SELECT xql_element('city', name) FROM cities;
+-------------------------------+ | xql_element('city', name) | +-------------------------------+ | <city>Mumbai (Bombay)</city> | | <city>Seoul</city>| | <city>Sã Paulo</city> | | <city>Shanghai</city> | | <city>Jakarta</city> | | <city>Karachi</city> | | <city>Istanbul</city> | | <city>Ciudad de Méico</city> | | <city>Moscow</city> | | <city>New York</city> | +--------------------------------+ 10 rows in set (0.00 sec) mysql>SELECT xql_element('city', NULL, name AS 'name') FROM cities;
+-------------------------------------------+ | xql_element('city', NULL, name AS 'name') | +-------------------------------------------+ | <city name="Mumbai (Bombay)"/>| | <city name="Seoul"/> | | <city name="S㯠Paulo"/> | | <city name="Shanghai"/> | | <city name="Jakarta"/> | | <city name="Karachi"/> | | <city name="Istanbul"/> | | <city name="Ciudad de M鸩co"/> | | <city name="Moscow"/> | | <city name="New York"/> | +-------------------------------------------+ 10 rows in set (0.00 sec) mysql>SELECT xql_element('city', NULL, name AS 'name', population AS 'population')
->FROM cities;
+-----------------------------------------------------------------------+ | xql_element('city', NULL, name AS 'name', population AS 'population') | +-----------------------------------------------------------------------+ | <city name="Mumbai (Bombay)" population="10500000"/> | | <city name="Seoul" population="9981619"/> | | <city name="S㯠Paulo" population="9968485"/> | | <city name="Shanghai" population="9696300"/> | | <city name="Jakarta" population="9604900"/> | | <city name="Karachi" population="9269265"/> | | <city name="Istanbul" population="8787958"/> | | <city name="Ciudad de M鸩co" population="8591309"/> | | <city name="Moscow" population="8389200"/> | | <city name="New York" population="8008278"/> | +-----------------------------------------------------------------------+ 10 rows in set (0.00 sec)
In this section, we discuss some techniques for importing data from XML into a MySQL table.
Using LOAD_FILE()
. The simplest way to store XML in MySQL is to use the LOAD_FILE()
function to open an entire XML document, make it available as a string, and insert this string into a table column. Using the 2cities.xml
file created earlier, it is possible to do something like this:
mysql>USE xmltest;
mysql>CREATE TABLE xmldocs (
->id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
->doc_content BLOB NOT NULL,
->comment VARCHAR(100) NOT NULL DEFAULT ''
->);
Query OK, 0 rows affected (0.04 sec) mysql> INSERT INTO xmldocs VALUES ->(NULL, LOAD_FILE('/tmp/2cities.xml'), '2 cities file');
Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM xmldocs\G *************************** 1. row *************************** id: 1 doc_content: <?xml version="1.0"?> <resultset statement="SELECT name,country FROM cities LIMIT 2 " xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <row> <field name="name">Mumbai (Bombay)</field> <field name="country">India</field> </row> <row> <field name="name">Seoul</field> <field name="country">South Korea</field> </row> </resultset> comment: 2 cities file 1 row in set (0.00 sec)
Importing mysqldump XML data into MySQL table columns using a stored procedure. Loading an entire XML file into a single row of a MySQL table solves the problem of getting the XML into MySQL, where it can be parsed using MySQL 5.1's XPath functions (see the section called “XPath Functionality”); however having to do so every time you want to access the data s not terribly convenient. However, MySQL developer Alexander Barkov has written a stored procedure xmldump_load
that extracts data from the XML elements and attributes found in a file created using mysqldump --xml, and inserts this data into the columns of a MySQL table. The source of this stored procedure is shown here:
DELIMITER | DROP PROCEDURE IF EXISTS xmldump_load | CREATE PROCEDURE xmldump_load(file_name VARCHAR(128), database_name VARCHAR(128), table_name VARCHAR(128)) BEGIN DECLARE xml TEXT; DECLARE nrows INT; DECLARE rownum INT DEFAULT 1; DECLARE ncols INT; DECLARE colnum INT DEFAULT 1; DECLARE ins_list TEXT DEFAULT ''; DECLARE val_list TEXT DEFAULT ''; DECLARE tmp VARCHAR(255); # load the XML file's contents into a string SET xml = LOAD_FILE(file_name); # get the number of <row>s in this table SET nrows = ExtractValue(xml, 'count(/mysqldump/database[@name=$database_name]/table_data[@name=$table_name]/row)'); # get the number of <field>s (columns) in this table SET ncols = ExtractValue(xml, 'count(/mysqldump/database[@name=$database_name]/table_data[@name=$table_name]/row[1]/field)'); # for each <row> WHILE rownum <= nrows DO # for each <field> (column) WHILE colnum <= ncols DO SET tmp = ExtractValue(xml, '/mysqldump/database[@name=$database_name]/table_data[@name=$table_name]/row[$rownum]/field[$colnum]/@name'); SET ins_list = CONCAT(ins_list, tmp, IF(colnum<ncols, ',', '')); SET tmp = ExtractValue(xml, '/mysqldump/database[@name=$database_name]/table_data[@name=$table_name]/row[$rownum]/field[$colnum]'); SET val_list = CONCAT(val_list, '''', tmp ,'''', IF(colnum<ncols, ',', '')); SET colnum = colnum + 1; END WHILE; SET @ins_text = CONCAT('INSERT INTO t1 (', ins_list, ') VALUES (', val_list, ')'); SET ins_list = ''; SET val_list = ''; PREPARE stmt FROM @ins_text; EXECUTE stmt; SET rownum = rownum + 1; SET colnum = 1; END WHILE; END | DELIMITER ;
This stored procedure employs XPath user variables (see User variables in XPath expressions) and so runs only under MySQL 5.1.20 or later. You can find a copy of it in the file create-xmpdump-load.sql
included with this article.
The caller of this stored procedure must have the MySQL FILE
privilege.
You can test xmldump_load
using the supplied test.sh
script, shown here:
# Demo for xmldump_load() DB="test" # Change "root" and "mypass" in the following 2 lines to # a user and password appropriate to your installation MYSQL="mysql -uroot -pmypass --socket=/tmp/mysql.sock" MYSQLDUMP="mysqldump -uroot -pmypass --socket=/tmp/mysql.sock" # Creates a test table with two columns and fills it with some data $MYSQL $DB <> END SELECT VERSION(); DROP TABLE IF EXISTS t1; CREATE TABLE t1 (a INT, b VARCHAR(128)); INSERT INTO t1 VALUES (1,'11111'); INSERT INTO t1 VALUES (2,'22222'); INSERT INTO t1 VALUES (3,'33333'); INSERT INTO t1 VALUES (4,'44444'); INSERT INTO t1 VALUES (5,'55555'); INSERT INTO t1 VALUES (6,'66666'); INSERT INTO t1 VALUES (7,'77777'); INSERT INTO t1 VALUES (8,'88888'); INSERT INTO t1 VALUES (9,'99999'); END # Dumps data into an XML file $MYSQLDUMP --xml $DB t1 > /tmp/t1.xml # Empties the table $MYSQL --execute="DELETE FROM t1" $DB # Creates the procedure, calls it, and # makes sure we've restored all records $MYSQL $DB << END \. create-xmldump-load.sql CALL xmldump_load('/tmp/t1.xml', 'test', 't1'); SELECT * FROM t1; END # Performs cleanup # Comment out the remaining lines if you wish to # preserve the stored procedure, table, and XML # file following the test run $MYSQL $DB << END DROP PROCEDURE xmldump_load; DROP TABLE t1; END rm /tmp/t1.xml
Using LOAD XML
. An implementation contributed by Erik Wetterberg of a new SQL statement has been accepted for MySQL 6.0, and is available beginning with version 6.0.3. LOAD XML
greatly simplifies the task of importing data from an XML file into a MySQL table, without having to use the stored procedure discussed previously (see Importing mysqldump XML data into MySQL table columns using a stored procedure). The syntax for this statement is as shown here:
LOAD XML [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'filename
' [REPLACE | IGNORE] INTO TABLE [db_name
.]tbl_name
[CHARACTER SETcharset_name
] [ROWS IDENTIFIED BY '<tagname
>'] [IGNOREnumber
[LINES | ROWS]] [(column_or_user_var
,...)] [SETcol_name
=expr
,...]
This statement reads data from an XML file into a table, and acts as the complement mysql or mysqldump in XML output mode (that is, using the --xml
option). The filename
must be given as a literal string. The tagname
in the optional ROWS IDENTIFIED BY
clause must also be given as a literal string, and must be surrounded by angle brackets (<
and >
).
LOAD DATA
clauses used in LOAD XML
. If you have used MySQL's LOAD DATA
statement before, then you should find the following clauses familiar, since they work in essentially the same way for LOAD XML
as they do for LOAD DATA
:
LOW_PRIORITY
or CONCURRENT
LOCAL
REPLACE
or IGNORE
CHARACTER SET
IGNORE number LINES
clause is analogous to the LOAD DATA
statement's IGNORE ... LINES
clause. LOAD XML also accepts IGNORE number ROWS
. In either case, the clause causes the first number
rows in the XML file to be skipped, and not to be imported.(column_or_user_var,...)
SET
See LOAD DATA INFILE
Syntax in the MySQL 5.1 Manual for more information about these clauses.
ROWS IDENTIFIED BY '<tagname
>'.
Accepted input formats.LOAD XML
supports three different XML formats:
<row
column1
="value1
"column2
="value2
" .../>
<row
> <column1
>value1
</column1
> <column2
>value2
</column2
> </row
>
name
attributes of <field>
tags,
and column values are taken from the contents of these
tags:
<row> <field name='column1
'>value1
</field> <field name='column2
'>value2
</field> </row>
This is the format used by MySQL tools such as mysqldump
The import routine used by LOAD
XML
automatically detects the format used for each
row and interprets it correctly, matching based on the tag
or attribute name and the column name. You can easily verify
this for yourself, by creating an XML file that uses any two
or even all three formats and then using LOAD XML
to import it into a table.
You must have the FILE
privilege to use LOAD XML
.
In this section, we look at the XPath functions added in
MySQL 5.1.5. ExtractValue()
allows you to use an XPath expression on a fragment of XML in
order to return the content of one or more elements. UpdateXML()
makes it possible to
replace an existing XML fragment with a new one, using XPath
to specify the fragment to be replaced.
ExtractValue()
FunctionThe example in Using LOAD_FILE()
demonstrates how to get the content of an XML file into a MySQL database, but the problem of getting at the actual data remains. One way to accomplish this is to use the ExtractValue()
function.
The syntax for this function is shown here:
ExtractValue(xml_fragment
, xpath_expression
)
ExtractValue()
takes two
arguments. The first of these is the XML fragment to be
tested; the second is the XPath expression to be matched.
ExtractValue()
example. Let' see how we might obtain the
name of the first city in the document (which we saved as
2cities.xml
). We do this in
two steps. First, we get the XML from the xmldocs
table and place into a user
variable:
mysql>SELECT doc_content FROM xmldocs LIMIT 1 INTO @xml;
Query OK, 1 row affected (0.00 sec) mysql>SELECT @xml\G
*************************** 1. row *************************** @xml: <?xml version="1.0"?> <resultset statement="SELECT name,country FROM cities LIMIT 2 " xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <row> <field name="name">Mumbai (Bombay)</field> <field name="country">India</field> </row> <row> <field name="name">Seoul</field> <field name="country">South Korea</field> </row> </resultset> 1 row in set (0.00 sec)
Now we use ExtractValue()
with @xml
as the first
argument. For the second argument, we employ an XPath
expression that means “the
content of the first <field>
element contained in the
second <row> element found anywhere in the
document”:
mysql> SELECT ExtractValue(@xml, '//row[2]/field[1]');
+-----------------------------------------+
| ExtractValue(@xml, '//row[2]/field[1]') |
+-----------------------------------------+
| Seoul |
+-----------------------------------------+
1 row in set (0.00 sec)
Another way to do this would be to use an XPath expression
that means “the content of the
second <field>
having the name
attribute
"name"”:
mysql> SELECT ExtractValue(@xml, '//field[@name="name"][2]');
+------------------------------------------------+
| ExtractValue(@xml, '//field[@name="name"][2]') |
+------------------------------------------------+
| Seoul |
+------------------------------------------------+
1 row in set (0.00 sec)
As you can see, the result (Seoul
) is the same as before, which is exactly what we would expect.
UpdateXML()
Function This function allows you to do replace a portion of an XML
fragment (identified by an XPath locator) with different XML
markup. For example, consider the XML fragment <book><chapter/></book>
. Now suppose you wish to change this to <book><part><chapter/></part></book>
. This shows how you can do so using UpdateXML()
, saving the result into a user variable @new_xml
:
mysql> SELECT @new_xml:=UpdateXML('<book><chapter/></book>', -> '//chapter', -> '<part><chapter/></part>') -> AS uxml; +--------------------------------------------------------------------+ | uxml | +--------------------------------------------------------------------+ | <book><part><chapter/></part></book> | +--------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT @new_xml; +--------------------------------------+ | @new_xml | +--------------------------------------+ | <book><part><chapter/></part></book> | +--------------------------------------+ 1 row in set (0.00 sec)
The syntax for this function is shown here:
UpdateXML(xml
,locator
,replacement
)
xml
is the XML markup to be tested for a match.
locator
is the XPath expression used to obtain the match.
replacement
is the markup to be used to replace the XML that
matches the
locator
. Unlike ExtractValue()
,
UpdateXML()
matches both
elements and content, then returns the updated XML as a
string. If no match for the
locator
is found, then the original
xml
is returned.
In this section, we discuss the XPath functionality
provided for use with the ExtractValue()
and UpdateXML()
function in MySQL 5.1 and MySQL 6.0.
Supported functions and operators. Basic XPath expressions (known as locators) are supported. This includes locators using the following operators:
/
(slash)
operator. This operator acts in a
manner similar to how a slash behaves in a Unix
filesystem path, where the leading /
represents the root of the
document, and the identifier following it is the name
of an XML element. For example, /book
matches a top-level book
element. Multiple
slashes can be used to trace a branch; for example,
/book/chapter
matches
<book><chapter/></book>
where book
is a top-level
element and chapter
is a
child of that element. A leading double-slash (//) means that the pattern is
matched anywhere in the XML document; for example, the
locator //section
matches
a section
element
wherever it is found, regardless of this element'
relationship to any other elements in the document.
*
(wildcard)
operator. This operator matches any
element. For example, //chapter/*
matches any element that is a
child of a chapter
element, anywhere in the document, and /*/section matches any section that is a child of a
top-level element.
|
(UNION
)
operator. This operator can be used to
combine locators when you wish to match against any of
them. For example, //section|//paragraph
matches any section or paragraph element, anywhere
in the document.
element
[@
attribute
=
value
]
. For example, //section[@id='xpath-locators']
matches a
section
anywhere in the
document having an id
attribute with the value “ xpath-locators
”.
You can match against multiple attribute values by
merely combining them. For instance, the locator
//paragraph[@role='intro'][@title='XPath
support']
matches any paragraph
element (anywhere in the XML
document to be tested) having a role
attribute whose value
is “ intro
” and a title
attribute whose value is
“
XPath support
”.
To find elements for which the same attribute
matches one of several values, you can use multiple
locators joined by the |
operator. For example, to match all paragraph
elements,
anywhere in the XML to be tested, whose title
attributes have
either of the values “
Example
” or “
Syntax
”, you would use the expression
//paragraph[@title="Example"]|//paragraph[@title="Syntax"]
.
You can also use the logical or
operator for this purpose: //paragraph[@title="Example" or
@title="Syntax"]
.
The difference between or
and |
is
that or
joins
conditions, while |
joins result sets.
:
are allowed, so you can work with XML markup
that uses namespace notation. For example, //person:biography matches
the tag <person:biography/>
anywhere
in the document.
or
, and, =
, !=
, <=, <, >=, and >) are supported. For
example, consider the following:
mysql>SET @xml = '<foo bar="2">123</foo><foo bar="6">456</foo>';
Query OK, 0 rows affected (0.00 sec) mysql>SELECT ExtractValue(@xml, '//foo[@bar="2"]');
+---------------------------------------+ | extractvalue(@xml, '//foo[@bar="2"]') | +---------------------------------------+ | 123 | +---------------------------------------+ 1 row in set (0.00 sec) mysql>SELECT ExtractValue(@xml, '//foo[@bar>"2"]');
+---------------------------------------+ | extractvalue(@xml, '//foo[@bar>"2"]') | +---------------------------------------+ | 456 | +---------------------------------------+ 1 row in set (0.00 sec) mysql>SELECT ExtractValue(@xml, '//foo[@bar>="2"]');
+----------------------------------------+ | extractvalue(@xml, '//foo[@bar>="2"]') | +----------------------------------------+ | 123 456 | +----------------------------------------+ 1 row in set (0.01 sec)
Notice that when multiple matches are found, ExtractValue()
returns them
in a single space-delimited string.
//row[2]/field[1]
.
ExtractValue()
, as it name
suggests, obtains a value; it does not return any XML elements. Here is an
example that should help make this clear:
mysql>SET @xml = '<a>3<b><c>5<d/></c></b></a>';
Query OK, 0 rows affected (0.00 sec) mysql>SELECT ExtractValue(@xml, '//a'), ExtractValue(@xml, '//b'), ExtractValue(@xml, '//c');
+---------------------------+---------------------------+---------------------------+ | ExtractValue(@xml, '//a') | ExtractValue(@xml, '//b') | ExtractValue(@xml, '//c') | +---------------------------+---------------------------+---------------------------+ | 3 | | 5 | +---------------------------+---------------------------+---------------------------+ 1 row in set (0.01 sec)
ExtractValue(@xml, '//a')
returns only the content of any <a>
elements; it does not return any
of the elements <b>
, <c>
, or <d>
, or any of the content of these
elements. ExtractValue(@xml,
'//c')
returns only the content of any <c>
elements. Because
the sole <b>
element contains no text, but rather only other XML
elements, ExtractValue(@xml,
'//b')
returns an empty string.
Unsupported XPath functions and operators. Many XPath functions and operators are supported; however, MySQL's XPath support is still under development, and so some of these are not yet implemented. These include the following limitations:
Relative locator expressions are resolved in the context of the root node. For example, consider the following query and result:
mysql>SELECT ExtractValue(
->'<book>
-><paragraph title="Example">P1</paragraph>
-><paragraph title="Syntax">P2</paragraph>
-></book>',
->'book/paragraph'
->) AS paras;
+-------+ | paras | +-------+ | P1 P2 | +-------+ 1 row in set (0.03 sec)
In this case, the locator book/paragraph
is resolved as /book/paragraph.
Relative locators are also supported within
predicates. In the following example, ../@title="Example" is
resolved as /book/paragraph/@title="Example"
:
mysql>SELECT ExtractValue(
->'<book>
-><paragraph title="Example"><body>P1</body></paragraph>
-><paragraph title="Syntax"><body>P2</body></paragraph>
-></book>',
->'book/paragraph/body[../@title="Example"]')
->AS para;
+------+ | para | +------+ | P1 | +------+ 1 row in set (0.00 sec)
The ::
operator is not
supported in combination with node types such as
axis
::comment(),
, axis
::text()
, and
axis
::processing-instructions()axis
::node().
However, name tests (such as
and axis
::
name
) are supported, as shown in these
examples: axis
::*
mysql>SELECT ExtractValue('<a><b>x</b><c>y</c></a>','/a/child::b');
+-------------------------------------------------------+ | ExtractValue('<a><b>x</b><c>y</c></a>','/a/child::b') | +-------------------------------------------------------+ | x | +-------------------------------------------------------+ 1 row in set (0.02 sec) mysql>SELECT ExtractValue('<a><b>x</b><c>y</c></a>','/a/child::*');
+-------------------------------------------------------+ | ExtractValue('<a><b>x</b><c>y</c></a>','/a/child::*') | +-------------------------------------------------------+ | x y | +-------------------------------------------------------+ 1 row in set (0.01 sec)
“Up-and-down” navigation is not supported in cases where the path would lead “above” the root element. That is, you cannot use expressions which match on descendants of ancestors of a given element, where one or more of the ancestors of the current element is also an ancestor of the root element (see Bug #16321).
The following XPath functions are not supported:
id()
, lang(), local-name()
, name()
(a patch has recently been submitted to
implement this function, and is likely to appear in
MySQL 6.0), namespace-uri()
, normalize-space()
, starts-with()
, string()
, substring-after()
, substring-before()
, and translate().
The following axes are not supported: following-sibling, following, preceding-sibling, and preceding.
Error handling. For both ExtractValue()
and UpdateXML()
, the XPath locator
used must be valid and the XML to be searched must be
well-formed. If the locator is invalid, an error is
generated:
mysql> SELECT ExtractValue('<foo bar="2">123</foo>', '//foo[@bar>="2"]');
+------------------------------------------------------------+
| ExtractValue('<foo bar="2">123</foo>', '//foo[@bar>="2"]') |
+------------------------------------------------------------+
| 123 |
+------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT ExtractValue('<foo bar="2">123</foo>', '//foo@bar>="2"]');');
ERROR 1105 (HY000): XPATH syntax error: '@bar>="2"]'
If the XML to be searched is not well-formed, then NULL
is returned, and a warning is
issued:
mysql>SELECT ExtractValue('<foo bar="2"123</foo>', '//foo[@bar>="2"]');
+-----------------------------------------------------------+ | ExtractValue('<foo bar="2"123</foo>', '//foo[@bar>="2"]');') | +-----------------------------------------------------------+ | NULL | +-----------------------------------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql>SHOW WARNINGS;
+---------+------+--------------------------------------------------------------------------------------------+ | Level | Code | Message| +---------+------+--------------------------------------------------------------------------------------------+ | Warning | 1522 | Incorrect XML value: 'parse error at line 1 pos 13: unknown token unexpected ('>' wanted)' | +---------+------+--------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)
However, the replacement XML used as the third argument to
UpdateXML()
is
not checked for well-formedness.
Space does not permit a complete discussion of XPath syntax and usage here. For definitive information, see the XML Path Language (XPath) 1.0 standard. The Zvon.org XPath Tutorial also provides a useful resource for those who are new to XPath or who would like a refresher in XPath basics.
User variables in XPath expressions. Beginning with MySQL 5.1.20, you can employ user variables in XPath locators in either (or both) of two forms:
Weakly checked. A variable using the
syntax $@
is not checked for type or for whether
it has previously been assigned a value.
No warnings or errors are issued by the server
if a variable has the wrong type or is
undefined. In other words, you are responsible for any
typographical errors or omissions. For example, if you
use variable_name
$@myvairable
instead
of $@myvariable
, and
$@myvairable
has not
been assigned a value, then MySQL assumes that $@myvairable has a
“none” value of the appropriate type,
such as 0
or an empty
string.
Example.
mysql>SET @xml = '<a><b>X</b><b>Y</b></a>';
Query OK, 0 rows affected (0.00 sec) mysql>SET @i =1; @j = 2;
Query OK, 0 rows affected (0.00 sec) mysql>SELECT @i, ExtractValue(@xml, '//b[$@i]');
+------+--------------------------------+ | @i | ExtractValue(@xml, '//b[$@i]') | +------+--------------------------------+ | 1 | X | +------+--------------------------------+ 1 row in set (0.00 sec) mysql>SELECT @j, ExtractValue(@xml, '//b[$@j]');
+------+--------------------------------+ | @j | ExtractValue(@xml, '//b[$@j]') | +------+--------------------------------+ | 2 | Y | +------+--------------------------------+ 1 row in set (0.00 sec) mysql>SELECT @k, ExtractValue(@xml, '//b[$@k]');
+------+--------------------------------+ | @k | ExtractValue(@xml, '//b[$@k]') | +------+--------------------------------+ | NULL | | +------+--------------------------------+ 1 row in set (0.00 sec)
Strongly checked. A variable using
the syntax $
can be declared and used with MySQL's
XPath functions when they are called inside stored
procedures. Such a variable is local to the stored
procedure in which it is defined, and it is checked
for type and value. variable_name
Example.
mysql>DELIMITER |
mysql>CREATE PROCEDURE myproc ()
->BEGIN
->DECLARE i INT DEFAULT 1;
->DECLARE xml VARCHAR(25) DEFAULT '<a>X</a><a>Y</a><a>Z</a>';
-> ->WHILE i < 4 DO
->SELECT xml, i, ExtractValue(xml, '//a[$i]');
->SET i = i+1;
->END WHILE;
->END |
Query OK, 0 rows affected (0.01 sec) mysql>DELIMITER ;
mysql>CALL myproc;
+--------------------------+---+------------------------------+ | xml | i | ExtractValue(xml, '//a[$i]') | +--------------------------+---+------------------------------+ | <a>X</a><a>Y</a><a>Z</a> | 1 | X | +--------------------------+---+------------------------------+ 1 row in set (0.00 sec) +--------------------------+---+------------------------------+ | xml | i | ExtractValue(xml, '//a[$i]') | +--------------------------+---+------------------------------+ | <a>X</a><a>Y</a><a>Z</a> | 2 | Y | +--------------------------+---+------------------------------+ 1 row in set (0.01 sec) +--------------------------+---+------------------------------+ | xml | i | ExtractValue(xml, '//a[$i]') | +--------------------------+---+------------------------------+ | <a>X</a><a>Y</a><a>Z</a> | 3 | Z | +--------------------------+---+------------------------------+ 1 row in set (0.01 sec)
Expressions containing user-defined variables of either sort must otherwise (except for notation) conform to the rules for XPath expressions containing variables as given in the XPath specification.
With any database functionality, you must be mindful of its
security implications, and working with MySQL's XML and XPath
capabilities is no different in this regard. There are a
number of points to consider, including the ability to read
from files using LOAD XML INFILE
,
the fact that the MySQL privilege system does not apply to the
content of XML documents, and the possibilities for subversive
user input to have unintended consequences.
As with the LOAD DATA
statement, the transfer of the XML file from the client host
to the server host is initiated by the MySQL server. In
theory, a patched server could be built that would tell the
client program to transfer a file of the server's choosing
rather than the file named by the client in the LOAD XML
statement. Such a server
could access any file on the client host to which the client
user has read access.
In a Web environment, clients usually connect to MySQL
from a Web server. A user that can run any command against
the MySQL server can use LOAD XML
LOCAL
to read any files to which the Web server
process has read access. In this environment, the client
with respect to the MySQL server actually is the Web server,
not the remote program being run by the user who connects to
the Web server.
You can disable LOAD XML
on
the server by starting it with --local-infile=0
or --local-infile=OFF
. The result is shown in this
example:
shell>mysqld_safe --local-infile=OFF &
shell>mysql -uroot xtest
Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 6.0.4-alpha-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql>LOAD XML LOCAL INFILE '/home/jon/person.xml'
->INTO TABLE person
->ROWS IDENTIFIED BY '<person>';
ERROR 1148 (42000): The used command is not allowed with this MySQL version
This option can also be used when starting the
mysql
client to disable LOAD
XML
for the duration of the client session.
To prevent a client from loading XML files from the
server, do not grant the FILE
privilege to the corresponding MySQL user account, or revoke
this privilege if the client user account already has it:
shell>mysql -uroot -p
Password:********
Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 6.0.4-alpha-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql>REVOKE FILE ON *.* FROM jon@localhost;
Query OK, 0 rows affected (0.00 sec) mysql>exit
Bye shell>mysql -ujon -p
Enter password:********
Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 6.0.4-alpha-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql>USE xmltest;
Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A mysql>LOAD XML INFILE '/home/jon/person.xml'
->INTO TABLE person
->ROWS IDENTIFIED BY '>person>';
ERROR 1045 (28000): Access denied for user 'jon'@'localhost' (using password: YES)
Revoking the FILE
privilege
(or not granting it in the first place) keeps the user
only from executing the LOAD XML
INFILE
statement or the LOAD_FILE()
function; it does
not prevent the user from executing LOAD XML LOCAL INFILE
. To
disallow this statement, you must start the server or the
client with --local-infile=OFF
, as discussed in the previous
section (see the section called
“Loading Data from Files”).
In other words, the FILE
privilege affects only whether the client can read files
on the server; it has no bearing on whether the client can
read files on the local filesystem.
The granularity found in MySQL's privilege system with regard to most database objects does not extend to XML documents. A MySQL client having access to an XML document can access the entire document or any part of it, and there is no way in which this can be restricted to particular XML fragments or elements.
Keep in mind that MySQL does not have preventative privileges; that is, you cannot disallow access of a given type on a particular database object. Instead, if you wish to prevent a client from accessing XML found in a database table, then you must grant privileges to this client in such a way that the client has no access to the table at all, or only to columns of that table which do not contain the XML that you do not wish the client to read.
One of the greatest security threats to applications is code injection whereby malicious code is introduced into the system to gain unauthorized access to privileges and data. These are known to exist in any number of programming and scripting languages; what they all have in common is the exploiting of the assumptions made by developers about the type and content of data input from users. XPath is no exception in this regard.
Let's suppose that your application handles authorization
by matching the combination the combination of a login names
and password with those found in an XML file named users.xml
, whose contents are
shown here:
<?xml version="1.0" encoding="UTF-8"?> <users> <user id="00327"> <login>douglas42</login> <password>24ph0d</password> </user> <user id="13579"> <login>cherrygarcia</login> <password>1c3cr34m</password> </user> <user id="02403"> <login>jimbob</login> <password>p4nc4k35</password> </user> <user id="42354"> <login>kitten</login> <password>m3330w</password> </user> <user id="28570"> <login>lucyvanpelt</login> <password>f0076411</password> </user> </users>
Assuming that each user's combination of login name and password is unique, your application could use an XPath expression like this one to validate the user, begin a user session, and associate the session with the user's unique ID:
//user[login/text()='cherrygarcia' and password/text()='1c3cr34m']/attribute::id
This is the XPath equivalent of an SQL statement like this one:
SELECT id FROM users WHERE login='cherrygarcia' AND password='1c3cr34m';
A PHP application employing XPath and the users.xml
file might handle the
login process via a Web form like this:
<?php $file = "users.xml"; $login = $POST["login"]; $password = $POST["password"]; $xpath = "//user[login/text()=$login and password/text()=$password]/attribute::id"; if( file_exists($file) ) { $xml = simplexml_load_file($file); if($result = $xml->xpath($xpath)) echo "You are now logged in as user $result[0]."; else echo "Invalid login name or password."; } else exit("Failed to open $file."); ?>
The input is completely unchecked, which means that a
malevolent user can short-circuit the test by entering ' or 1=1
for both the login name
and password, resulting in $xpath
being evaluated as shown here:
//user[login/text()='' or 1=1 and password/text()='' or 1=1]/attribute::id
The expression inside the square brackets evaluates as
true
, and so is effectively
the same as this one, which matches the id
attribute of every user
element in the XML document:
//user/attribute::id
One way in which this particular attack can be
circumvented is simply by quoting the variable names to be
interpolated in the definition of $xpath
:
$xpath = "//user[login/text()='$login' and password/text()='$password']/attribute::id";
This forces the values passed from the user to be converted to strings.
If this “fix” seems familiar, that is because it is the same one that is often recommended for helping to prevent SQL injection attacks. In general, the practices you should follow for preventing XPath injection attacks are the same as for preventing SQL injection:
Never accepted untested data from users in your application.
Check all user-submited data for type; reject or convert data that is of the wrong type
Test numerical data for out of range values; truncate, round, or reject values that are out of range. Test strings for illegal characters and either strip them out or reject input containing them.
Do not output explicit error messages that might provide an unauthorized user with clues that could be used to compromise the system; log these to a file or database table instead.
Just as SQL injection attacks can be used to obtain information about database schemas, so can XPath injection be used to traverse XML files to uncover their structure. Space does not permit us to go into detail here, but you can read more about such techniques in Amit Klein's paper Blind XPath Injection (PDF, 46KB).
It is also important to check the output being sent back
to the client. For an example, let's revisit our previous
example; however, this time, instead of using PHP's XPath
functionality, we use the MySQL ExtractValue()
function:
mysql>SELECT ExtractValue(
->LOAD_FILE('users.xml'),
->'//user[login/text()="" or 1=1 and password/text()="" or 1=1]/attribute::id'
->) AS id;
+-------------------------------+ | id | +-------------------------------+ | 00327 13579 02403 42354 28570 | +-------------------------------+ 1 row in set (0.01 sec)
Because ExtractValue()
returns multiple matches as a single space-delimited string,
this injection attack provides to the user in a single row
every valid ID contained within users.xml
. As an extra safeguard, you should also
test output before returning it to the user. Here is a
simple example:
mysql>SELECT @id = ExtractValue(
->LOAD_FILE('users.xml'),
->'//user[login/text()="" or 1=1 and password/text()="" or 1=1]/attribute::id'
->);
Query OK, 0 rows affected (0.00 sec) mysql>SELECT IF(
->INSTR(@id, ' ') = 0,
->@id,
->'Unable to retrieve user ID')
->AS singleID;
+----------------------------+ | singleID | +----------------------------+ | Unable to retrieve user ID | +----------------------------+ 1 row in set (0.00 sec)
In general, the guidelines for returning data to users securely are the same as for accepting user input. These can be summed up as:
Always test outgoing data for type and allowable values.
Never allow unauthorized users to view error messages that might provide information about the application that could be used to exploit it.
Following these principles in architecting an XML application can help to insure that it is a secure application.
Read and post comments on this article in the MySQL Forums. There are currently 12 comments.