Alexander Barkov is a Software Developer for MySQL from Izhevsk, Russia and coded MySQL's new XML features.
Peter Gulutzan is a Senior Software Architect at MySQL from Alberta, Canada and will be speaking at MySQL's next conference.
By Alexander Barkov and Peter Gulutzan
MySQL version 5.1.5 has functions for searching and changing XML documents. This article has examples.
Let's make a database and put two XML documents in it.
CREATE TABLE x (doc VARCHAR(150)); INSERT INTO x VALUES (' <book> <title>A guide to the SQL standard</title> <author> <initial>CJ</initial> <surname>Date</surname> </author> </book> '); INSERT INTO x VALUES (' <book> <title>SQL:1999</title> <author> <initial>J</initial> <surname>Melton</surname> </author> </book> ');
The doc columns have an internal hierarchical structure, with books containing titles and authors, and authors in turn containing initials and surnames. It's a popular way to format and store, and the "markup" -- words like "<book>" and </book>" -- makes it easy to see the hierarchy if you're careful about indentation.
mysql> SELECT EXTRACTVALUE(doc,'/book/author/initial') FROM x; +------------------------------------------+ | EXTRACTVALUE(doc,'/book/author/initial') | +------------------------------------------+ | CJ | | J | +------------------------------------------+ 2 rows in set (0.01 sec)
What happened here? Books contain authors
which contain initials. With EXTRACTVALUE()
we navigated down through the hierarchy to
get the values at the final node points:
'CJ' and 'J'. A basic extraction is just a
matter of specifying the hierarchy in the
XPath_string argument.
mysql> SELECT EXTRACTVALUE(doc,'/*/*/initial') FROM x; +----------------------------------+ | EXTRACTVALUE(doc,'/*/*/initial') | +----------------------------------+ | CJ | | J | +----------------------------------+ 2 rows in set (0.01 sec)
You don't have to list the whole hierarchy. When part of a path is a wildcard, that means "any name will do".
mysql> SELECT extractValue(doc,'/book/child::*') FROM x; +---------------------------------------------+ | extractValue(doc,'/book/child::*') | +---------------------------------------------+ | A guide to the SQL standard | | SQL:1999 | +---------------------------------------------+ 2 rows in set (0.00 sec)
With /book/child::
we find what's immediately
below book, namely the title data. We could
use a variety of operators here:
child
... what's immediately below
descendant
... what's below at all levels
parent
... what's immediately above
ancestor
... what's above at all levels
following-sibling
... what's next at same level
preceding-sibling
... what's before at same level
self
... not before, not after, same level
mysql> select extractValue(doc,'/book/author/surname[self:text()="Date"]') from x; +--------------------------------------------------------------+ | extractValue(doc,'/book/author/surname[self:text()="Date"]') | +--------------------------------------------------------------+ | Date | | | +--------------------------------------------------------------+ 2 rows in set (0.00 sec)
And here's one way to add a predicate (a conditional expression).
By saying "in the text of self, that is, in the text of surname
because the predicate immediately comes after surname, look for
value = Date", we include book/author/surname=Date
and we
exclude book/author/surname=Melton
. The Melton row is blank.
Naturally =
isn't the only operator we could use here; we
could have self:text()>="Date"
, self:text()="Date" OR
self:text()="Melton"
, and so on.
What you've seen is: an XPath expression can contain nodes
separated by slashes (vaguely like a Unix path expression),
and you can pick values from one or more nodes. Wildcards,
navigation aids, and predicates are supported. Although the
examples all used extractValue()
in the SELECT list, it can
be used in any statement wherever an expression is allowed.
A good tip is to combine XML columns with fulltext indexing.
Now here's a new function for updating the structure.
mysql> select UpdateXML(doc,'/book/author/initial','!!') from x; +----------------------------------------------------------+ | UpdateXML(doc,'/book/author/initial','!!') | +----------------------------------------------------------+ | <book> <title>A guide to the SQL standard</title> <author> !! <surname>Date</surname> </author> </book> | | <book> <title>SQL:1999</title> <author> !! <surname>Melton</surname> </author> </book> | +----------------------------------------------------------+ 2 rows in set (0.00 sec)
UpdateXML's first two arguments are the same as for
ExtractValue because the first thing we want to do
is navigate to the node. The third argument is a
replacement string. So we change book/author/initial
to !!
. The return value is the complete new document.
To replace the document permanently, you could say
UPDATE x SET doc = UpdateXML(doc,'/book/author/initial','!!');
But this is probably a mistake! We didn't just change
the text to !!
. We changed
<initial>CJ></initial>
to
!!
So we changed the document structure. Normally, we only want
to change the contents. For that, we should say:
select UpdateXML(doc,'/book/author/initial','<initial>!!</initial>') from x;
mysql> select extractvalue( UpdateXML(doc,'/book/author/initial','<initial>!!</initial>'),'/book/author/ initial') from x; +--------------------------------------------------------------------------- --------------------------+ | extractvalue( UpdateXML(doc,'/book/author/initial','<initial>!!</initial>'),'/book/author/ initial') | +--------------------------------------------------------------------------- --------------------------+ | !! | | !! | +--------------------------------------------------------------------------- --------------------------+ 2 rows in set (0.01 sec)
This final example, a combination of ExtractValue()
and
UpdateXML()
, shows what would happen if we change the
initial node to !!
and then select the initial node.
Naturally, we get !!
.
There's lots more to the XPath story. To get more
depth, try these pages about XML and XPath:
http://www.w3.org/TR/xpath
http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html
http://en.wikipedia.org/wiki/XPath
http://www.rpbourret.com/xml/XPathIn5.htm
You'll find that most (not all but most) examples on those pages are applicable to MySQL now. That's why it's nice to follow standards.
Or just download MySQL 5.1 and see what you can see. We're glad to get early comments about our alpha versions. The download page is here: http://dev.mysql.com/downloads/mysql/5.1.html