About the Author

Anders Karlsson is a sales engineer with MySQL AB, and lives in Stockholm, Sweden.

We are always looking for interesting articles about MySQL! Have you written something and would like to it published here? Please contact us via feedback form »

GIS and Spatial Extensions with MySQL

This is the first of an on-going series of articles that explain some the new features in MySQL 4.1, which as of this writing is in the gamma phase of its development cycle, on the way to a production-ready release in the near future.

By Anders Karlsson

MySQL 4.1 introduces spatial functionality in MySQL. This article describes some of the uses of spatial extensions in a relational database, how it can be implemented in a relational database, what features are present in MySQL and some simple examples.

Often the spatial functions are called geographic information system (GIS) functions, because GIS applications are the most obvious use-case for the spatial functionality. The spatial functions can be used both to provide a means to organize GIS data together with more traditional types of data and to represent non-GIS data with a spatial attribute.

GIS applications then and now

A GIS (geographic information system) stores and looks up objects which have one or more spatial attributes, such as size and position, and is used to process such objects. A simple example would be a system that stores addresses in a town using geographic coordinates. If this rather static data was then combined with other information, such as the location of a taxi-cab, then this data could be used to find the closest cab to a certain location. Many GIS applications are very specialized, in areas such as the mapping of real-world objects, map creation and meteorology.

Among the many attributes of a general GIS is that the objects may have multiple dimensions, and that complex shapes are supported. Typically two-dimensional objects with nearly unlimited complexity need to be supported.

Another attribute of a general GIS is that it supports combining objects and looking for different types of overlaps, such as looking for points that are contained within a given geometrical object. Lastly, a GIS provides a means of organizing objects in layers, i.e. REGIONs inside DISTRICTs.

A good example of an application where GIS is of importance and which also has real-world implications is in the area of meteorology. When it comes to producing weather maps we see every day on the TV screen, meteorology is a complex science. Very advanced and high-powered IT systems that are concerned with the actual number-crunching are combined with large databases. The issue is not that data is not available, the issue is what data to use and how, and how the different types of data are matched. The data from the number-crunching systems is supplied in the form of large files with multi-dimensional arrays of prognosticated data. The issue now is, how can a specific data object be matched with another, and provide a relevant prognosis of the weather in the coming five days.

In other words, we have data covering at least three dimensions (two-dimensional geography plus a height of a measurement for example, maybe combined with a time dimension).

What is obvious from the above discussion is that a database that only works on simple data types, such as INTEGER or DECIMAL, will just not be good enough for this type of application. The geographical shapes might be any type of polygon that then needs to be matched with some other object, i.e. what is the total area of two objects (which is not necessarily the sum of the areas of the two objects, as they might overlap).

The traditional solution to all this has been to use specialized database systems that use proprietary spatial indexing, and proprietary interfaces. These systems are complex though, and not only that, any GIS stores some non-GIS data (for example, a meteorological database might store a temperature measurement at a given geographical location, height and time as a DECIMAL). And for non-GIS data, SQL has become the standard. As applications for GIS have expanded beyond what goes on in labs and high-powered scientific institutions and are found in more traditional applications, such a taxicab scheduling systems, the need for a more standardized solution has exploded. Another factor here is the availability of inexpensive and easy to use GPS systems which may be combined with mobile communication systems, so that getting a real-time position from, a taxi-cab, a ship on the sea or a police car is no longer complex or expensive.

OpenGIS - Linking SQL with GIS functionality

The desire to combine SQL with GIS was initially driven by the developers of GIS applications desiring to expand into other areas, more than by SQL database vendors getting into the GIS market.

The driving organization behind opening up GIS to a broader market and making GIS technologies available everywhere the Open Geospatial Consortium (OGC). OGC has been around for about 10 years and is a non-profit organization that works on many areas of GIS and provides specifications of interoperability with many other standards. One such standard that OGC has provided is a specification for interoperability SQL databases.

This specification, in short, defines extensions to a SQL based relational database to allow for GIS objects and operations. There are four important areas here:

Data types. There needs to be data types to store the GIS information. This is best illustrated with an example, a POINT in a 2-dimensional system.
Operations. There must be additional operators to support the management of multi-dimensional objects, again, this is best illustrated with an example, a function that computes the AREA of a polygon of any shape.
The ability to input and output GIS data. To make systems interoperable, OGC has specified how contents of GIS objects are represented in binary and text format.
Indexing of spatial data. To use the different operators, some means of indexing of GIS data is needed, or in technical terms, spatial indexing.

In addition to the above, there is also a need for GIS metadata, and in some cases for using different coordinate systems. MySQL currently supports a planar coordinate system. The other major coordinate system in use is the geocentric one, i.e. a coordinate system on the Earth's surface, which is not yet supported by MySQL.

Non-GIS use for MySQL spatial extensions

There are uses for spatial extensions outside of the pure GIS world. Spatial data is not limited to maps or to the Earth's geography. The definition of spatial data and operations is wide, and even though MySQL follows the OpenGIS specification to a large extent, this specification does not limit the use of spatial data to GIS applications.

Any type of data that has more than one dimension can be treated as a spatial entity. The specification does not limit the axis on the coordinate system to any particular unit, such as inches or centimeters, and each axis does not necessarily have to represent the same unit: they are just numbers. Of course, certain functionality assumes that the X and Y axis have similar properties, such as the Area() function.

Fact is, a lot of real-world data has two dimensions if we look at it this way, in particular if we assume that one dimension is a date. For example we could store the maximum and minimum price for a stock as a POINT, where the X axis is the date and the Y axis is the price. We can then use the spatial functions to check for overlaps and intersections with other stocks.

The OpenGIS� Simple Features specifications for SQL

This specification is what is used as the ground for almost all implementations of GIS-functions within an SQL-based relational database. This standard defines the data types, operations, input and output format, functions and much more. This is the standard that is followed by almost all SQL databases with spatial extensions, including MySQL.

MySQL GIS Datatypes (abstract types in gray)

The data types start from the most generic at the top of the hierarchy, GEOMETRY, to a number of specific types, such as POINT and LINESTRING. Some of the data types are "abstract" in the sense that you cannot create objects of this type, such as the GEOMETRY type. This does not mean that you cannot have a column of the type GEOMETRY, just that you cannot have any value in that column of the type GEOMETRY, but you can have values of any other spatial type in that column. Among the abstract object types, only GEOMETRY can be used as a column type.

Among the functions that can be performed on spatial objects are functions that evaluate the proximity of objects, such as Overlap(), functions that combine two spatial objects to create another spatial object, such as Envelope() and functions that perform conversion to and from text and binary formats, such as AsText() and GeomFromText(). As any spatial object can be treated as a GEOMETRY object, any function that operate on a GEOMETRY can operate on any other type of spatial object, such as a POINT, which doesn't necessarily mean that this operation always is meaningful.

The specification includes the definition of two formats for external representation of spatial data, WKB (Well Known Binary) and WKT (Well Known Text). This allows data to be exported and imported in binary and text formats. To complement these formats, there are functions that convert between the WKB and WKT formats and all of the spatial data types.

Spatial Indexing

Spatial data may be indexed, just like other data in MySQL. To make this effective, a special type of indexing is used for spatial data called R-tree ("R" stands for Region) indexing. The involves organizing the minimum bounding rectangle (MBR) of the spatial objects in a tree structure that is then used by the different spatial functions. There are a few variations of R-tree indexing, MySQL uses R-trees with quadratic splitting, which is one of the standard methods of building an R-tree index.

An R-tree index is similar to a B+-tree in many ways, and organizes the indexed nodes in a hierarchy where the nodes in the index represent the MBR of the objects in the node. The leaf nodes in the index contain references to the row that contain the data, just like a B+-tree index.

Current Spatial features in MySQL

In the current implementation of spatial extensions in MySQL, parts of the OpenGIS Simple Features have been left out. Among these are:

Functions - Some functions are missing, among them some that deal with creating new spatial data values from existing ones, such as Union() (not to be confused with the SQL UNION operator) and Intersection() (not to be confused with the Intersect() spatial function).
GIS Metadata - The OpenGIS Simple Feature specification defines certain metadata tables for GIS data.
SRID - A Spatial Reference Identifier (SRID) defines the properties of the coordinate system. MySQL stores the SRID of any spatial objects and it can be extracted, but it is not used and the coordinate system is always supposed to be planar.

Except the above, all of the features of the OpenGIS Simple Features specification are included in MySQL 4.1 Spatial Extensions, support for the missing features will be added in future MySQL versions.

MySQL Spatial features - Examples

The first example is a very simple database with just two tables: one that contains addresses and the location of the address as a POINT, and a second table that contains information on taxi cabs, including the location of the cab, again as a POINT.

Note that we also create a spatial index on both tables.

CREATE TABLE address (
  address CHAR(80) NOT NULL,
  address_loc POINT NOT NULL,
  PRIMARY KEY(address),
  SPATIAL KEY(address_loc)
);

CREATE TABLE cab (
  cab_id INT AUTO_INCREMENT NOT NULL,
  cab_driver CHAR(80) NOT NULL,
  cab_loc POINT NOT NULL,
  PRIMARY KEY(cab_id),
  SPATIAL KEY(cab_loc)
);

Now we need to insert some data into these tables to test the spatial features. To do this, the easiest way is to use the GeomFromText() function, that takes a string in WKT format and converts that to a spatial object.

INSERT INTO address VALUES('Foobar street 12', GeomFromText('POINT(2671 2500)'));
INSERT INTO address VALUES('Foobar street 56', GeomFromText('POINT(2971 2520)'));
INSERT INTO address VALUES('Foobar street 78', GeomFromText('POINT(3171 2510)'));
INSERT INTO address VALUES('Foobar street 97', GeomFromText('POINT(5671 2530)'));
INSERT INTO address VALUES('Foobar street 99', GeomFromText('POINT(6271 2460)'));
INSERT INTO address VALUES('Bloggs lane 10', GeomFromText('POINT(5673 3520)'));
INSERT INTO address VALUES('Bloggs lane 20', GeomFromText('POINT(5665 3550)'));
INSERT INTO address VALUES('Bloggs lane 45', GeomFromText('POINT(5571 3510)'));

INSERT INTO cab VALUES(0, 'Joe Bloggs', GeomFromText('POINT(2262 2100)'));
INSERT INTO cab VALUES(0, 'Bill Bloggs', GeomFromText('POINT(2441 1980)'));
INSERT INTO cab VALUES(0, 'Sam Spade', GeomFromText('POINT(5400 3200)'));

Now we have some data to work with, let's try a spatial query: We will query for the closest cab to a given location. As both of the columns involved in the query have the POINT data type, the standard OpenGIS function Distance() could be used. Regrettably, this is not yet implemented in the MySQL Spatial Extensions, so we have to take a different route. (The Distance() function will be implemented in a later release of MySQL.) We convert the two POINT values to a LINESTRING and then compute the length of that, which will be the same as the distance between the two points. The finished query then looks like this:

SELECT
  c.cab_driver,
  ROUND(GLength(LineStringFromWKB(LineString(AsBinary(c.cab_loc),
                                             AsBinary(a.address_loc)))))
    AS distance
FROM cab c, address a
WHERE a.address = 'Foobar street 110'
ORDER BY distance ASC LIMIT 1;

+------------+----------+
| cab_driver | distance |
+------------+----------+
| Sam Spade  |     1143 |
+------------+----------+
1 row in set (0.00 sec)

A non-GIS example of MySQL Spatial extensions

A lot more data than you might think has multi-dimensional properties; it's just that we do not often think of it as such. If we introduce time as one dimension, a lot of data may fit in the multi-dimensional model. Let's have a look at the example mentioned earlier with stock prices. To being with, we need a table to store the data. This will contain the stock ticker symbol (such as GOOG), and the starting and ending value of the ticker over a day, represented as a LINESTRING, where the X-axis is the date, and the Y-axis is the value. The start point of each ticker value is the day's date and the opening value, the end point is the date of the following day and the value of the stock by the end of the day. In this case, a date is just an 8-digit integer.

CREATE TABLE stock (
  ticker CHAR(5) NOT NULL,
  ticker_date DATE NOT NULL,
  ticker_range LINESTRING NOT NULL,
  PRIMARY KEY(ticker, ticker_date)
);

The next step is to insert some data:

INSERT INTO stock VALUES('FOO', '20040401',
  GeomFromText('LINESTRING(20040101 50,20040102 55)'));
INSERT INTO stock VALUES('FOO', '20040402',
  GeomFromText('LINESTRING(20040102 55,20040103 67)'));
INSERT INTO stock VALUES('FOO', '20040403',
  GeomFromText('LINESTRING(20040103 67,20040104 63)'));
INSERT INTO stock VALUES('FOO', '20040404',
  GeomFromText('LINESTRING(20040104 63,20040105 57)'));
INSERT INTO stock VALUES('FOO', '20040405',
  GeomFromText('LINESTRING(20040105 57,20040106 58)'));

INSERT INTO stock VALUES('BAR', '20040401',
  GeomFromText('LINESTRING(20040101 37,20040102 38)'));
INSERT INTO stock VALUES('BAR', '20040402',
  GeomFromText('LINESTRING(20040102 38,20040103 42)'));
INSERT INTO stock VALUES('BAR', '20040403',
  GeomFromText('LINESTRING(20040103 42,20040104 45)'));
INSERT INTO stock VALUES('BAR', '20040404',
  GeomFromText('LINESTRING(20040104 45,20040105 62)'));
INSERT INTO stock VALUES('BAR', '20040405',
  GeomFromText('LINESTRING(20040105 62,20040106 56)'));

INSERT INTO stock VALUES('BLG', '20040401',
  GeomFromText('LINESTRING(20040101 54,20040102 85)'));
INSERT INTO stock VALUES('BLG', '20040402',
  GeomFromText('LINESTRING(20040102 85,20040103 87)'));
INSERT INTO stock VALUES('BLG', '20040403',
  GeomFromText('LINESTRING(20040103 87,20040104 93)'));
INSERT INTO stock VALUES('BLG', '20040404',
  GeomFromText('LINESTRING(20040104 93,20040105 62)'));
INSERT INTO stock VALUES('BLG', '20040405',
  GeomFromText('LINESTRING(20040105 65,20040106 52)'));

Now, the query we are to issue will find the stocks and dates that have values that are within a range of a given stock's value. For this example, the date is ignored. To make this query a bit more interesting, let's issue it as a single query, using another new MySQL 4.1 feature, subqueries.

SELECT s2.ticker, s2.ticker_date
FROM stock s2
WHERE s2.ticker != 'FOO'
AND MBRINTERSECTS(s2.ticker_range, (SELECT LineStringFromWKB(LineString(
  Point(MIN(X(STARTPOINT(s1.ticker_range))),
  MIN(IF(Y(STARTPOINT(s1.ticker_range)) < Y(ENDPOINT(s1.ticker_range)),
  Y(STARTPOINT(s1.ticker_range)), Y(ENDPOINT(s1.ticker_range))))),
  Point(MAX(X(STARTPOINT(s1.ticker_range))),
  MAX(IF(Y(STARTPOINT(s1.ticker_range)) > Y(ENDPOINT(s1.ticker_range)),
  Y(STARTPOINT(s1.ticker_range)), Y(ENDPOINT(s1.ticker_range)))))))
FROM stock s1
WHERE s1.ticker = 'FOO')) = 1;

This query constructs a LINESTRING in the subquery which as its first point has the start date, and the lowest value of the stock during the period. The second point is the end date, and the highest value of the stock during the period. The MBRINTERSECT() will check if the MBR of the objects passed as arguments intersect, and if so, will return 1, else 0. The result of the query above is:

+--------+-------------+
| ticker | ticker_date |
+--------+-------------+
| BAR    | 2004-04-04  |
| BAR    | 2004-04-05  |
| BLG    | 2004-04-01  |
| BLG    | 2004-04-04  |
| BLG    | 2004-04-05  |
+--------+-------------+
5 rows in set (0.00 sec)

Conclusions

This just scratches the surface of what you can do with the MySQL Spatial Extensions in MySQL 4.1, and hopefully gets you thinking about ways you can use it with even non-spatial data. You can get all of the details about the spatial functionality in the "Spatial Extensions in MySQL" chapter of the MySQL Reference Manual, and learn more about the OpenGIS standards the Open Geospatial Consortium (OGC) website.