About the Authors

Alexander Barkov is a Software Developer for MySQL from Izhevsk, Russia and coded MySQL's new XML features.

Peter Gulutzan is a Senior Software Architect at MySQL from Alberta, Canada and will be speaking at MySQL's next conference.

MySQL 5.1's New XML Functions

By Alexander Barkov and Peter Gulutzan

MySQL version 5.1.5 has functions for searching and changing XML documents. This article has examples.

Let's make a database and put two XML documents in it.

CREATE TABLE x (doc VARCHAR(150));

INSERT INTO x VALUES
('
<book>
<title>A guide to the SQL standard</title>
<author>
<initial>CJ</initial>
<surname>Date</surname>
</author>
</book>
');

INSERT INTO x VALUES
('
<book>
<title>SQL:1999</title>
<author>
<initial>J</initial>
<surname>Melton</surname>
</author>
</book>
');

The doc columns have an internal hierarchical structure, with books containing titles and authors, and authors in turn containing initials and surnames. It's a popular way to format and store, and the "markup" -- words like "<book>" and </book>" -- makes it easy to see the hierarchy if you're careful about indentation.

ExtractValue()

Syntax: EXTRACTVALUE (XML_document, XPath_string);
1st Parameter: XML_document string formatted as in the example
2nd Parameter: XPath_string (XPath is a "sub-language")
Action: returns string containing a value from the document

Example #E1

mysql> SELECT EXTRACTVALUE(doc,'/book/author/initial') FROM x;
+------------------------------------------+
| EXTRACTVALUE(doc,'/book/author/initial') |
+------------------------------------------+
| CJ |
| J |
+------------------------------------------+
2 rows in set (0.01 sec)

What happened here? Books contain authors which contain initials. With EXTRACTVALUE() we navigated down through the hierarchy to get the values at the final node points: 'CJ' and 'J'. A basic extraction is just a matter of specifying the hierarchy in the XPath_string argument.

Example #E2

mysql> SELECT EXTRACTVALUE(doc,'/*/*/initial') FROM x;
+----------------------------------+
| EXTRACTVALUE(doc,'/*/*/initial') |
+----------------------------------+
| CJ |
| J |
+----------------------------------+
2 rows in set (0.01 sec)

You don't have to list the whole hierarchy. When part of a path is a wildcard, that means "any name will do".

Example #E3

mysql> SELECT extractValue(doc,'/book/child::*') FROM x;
+---------------------------------------------+
| extractValue(doc,'/book/child::*') |
+---------------------------------------------+
| A guide to the SQL standard |
| SQL:1999 |
+---------------------------------------------+
2 rows in set (0.00 sec)

With /book/child:: we find what's immediately below book, namely the title data. We could use a variety of operators here:
child ... what's immediately below
descendant ... what's below at all levels
parent ... what's immediately above
ancestor ... what's above at all levels
following-sibling ... what's next at same level
preceding-sibling ... what's before at same level
self ... not before, not after, same level

Example #E4

mysql> select
extractValue(doc,'/book/author/surname[self:text()="Date"]') from x;
+--------------------------------------------------------------+
| extractValue(doc,'/book/author/surname[self:text()="Date"]') |
+--------------------------------------------------------------+
| Date |
| |
+--------------------------------------------------------------+
2 rows in set (0.00 sec)

And here's one way to add a predicate (a conditional expression). By saying "in the text of self, that is, in the text of surname because the predicate immediately comes after surname, look for value = Date", we include book/author/surname=Date and we exclude book/author/surname=Melton. The Melton row is blank. Naturally = isn't the only operator we could use here; we could have self:text()>="Date", self:text()="Date" OR self:text()="Melton", and so on.

What you've seen is: an XPath expression can contain nodes separated by slashes (vaguely like a Unix path expression), and you can pick values from one or more nodes. Wildcards, navigation aids, and predicates are supported. Although the examples all used extractValue() in the SELECT list, it can be used in any statement wherever an expression is allowed. A good tip is to combine XML columns with fulltext indexing.

UpdateXML()

Now here's a new function for updating the structure.

Syntax: UPDATEXML (XML_document, XPath_string, new_value);
1st Parameter: XML_document string formatted as in the example
2nd Parameter: XPath_string (XPath is a "sub-language")
3rd Parameter: new_value to replace whatever is found
Action: changes string containing a value from the document

Example #U1

mysql> select UpdateXML(doc,'/book/author/initial','!!') from x;
+----------------------------------------------------------+
| UpdateXML(doc,'/book/author/initial','!!') |
+----------------------------------------------------------+
|
<book>
<title>A guide to the SQL standard</title>
<author>
!!
<surname>Date</surname>
</author>
</book> |
|
<book>
<title>SQL:1999</title>
<author>
!!
<surname>Melton</surname>
</author>
</book> |
+----------------------------------------------------------+
2 rows in set (0.00 sec)

UpdateXML's first two arguments are the same as for ExtractValue because the first thing we want to do is navigate to the node. The third argument is a replacement string. So we change book/author/initial to !!. The return value is the complete new document. To replace the document permanently, you could say UPDATE x SET doc = UpdateXML(doc,'/book/author/initial','!!');

But this is probably a mistake! We didn't just change the text to !!. We changed <initial>CJ></initial> to !! So we changed the document structure. Normally, we only want to change the contents. For that, we should say: select UpdateXML(doc,'/book/author/initial','<initial>!!</initial>') from x;

Example #U2

mysql> select
extractvalue(
UpdateXML(doc,'/book/author/initial','<initial>!!</initial>'),'/book/author/
initial') from x;
+---------------------------------------------------------------------------
--------------------------+
|
extractvalue(
UpdateXML(doc,'/book/author/initial','<initial>!!</initial>'),'/book/author/
initial') |
+---------------------------------------------------------------------------
--------------------------+
| !!
|
| !!
|
+---------------------------------------------------------------------------
--------------------------+
2 rows in set (0.01 sec)

This final example, a combination of ExtractValue() and UpdateXML(), shows what would happen if we change the initial node to !! and then select the initial node. Naturally, we get !!.

Go Get

There's lots more to the XPath story. To get more depth, try these pages about XML and XPath:
http://www.w3.org/TR/xpath
http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html
http://en.wikipedia.org/wiki/XPath
http://www.rpbourret.com/xml/XPathIn5.htm

You'll find that most (not all but most) examples on those pages are applicable to MySQL now. That's why it's nice to follow standards.

Or just download MySQL 5.1 and see what you can see. We're glad to get early comments about our alpha versions. The download page is here: http://dev.mysql.com/downloads/mysql/5.1.html