By Jeremy Cole, MySQL AB
18 February 2004
Over a series of articles spanning a few weeks, we'll explore subqueries in depth. Subqueries are a complex and large subject, so it's best if we break it into a few logical parts. In this article, we'll look at basic, simple subqueries, and subqueries in the FROM clause.
As we go over the examples, you might want to open up the MySQL Documentation to the section on Subqueries.
A subquery is, in essence, a SELECT statement that is most often used as part of another SELECT statement, but could also be used with an INSERT, UPDATE, or DELETE and other statements. Subqueries are used in order to achieve very complex searches and complex reports, as well as for various optimizations. You can use a subquery in the FROM clause of a SELECT to allow you to have essentially any number of GROUP BY clauses to further and further refine your result.
In order to use subqueries in MySQL you will need a version of MySQL 4.1. Any version would be OK, but as 4.1 is alpha, you'll definitely want the latest. So, go and grab the newest 4.1 and get it installed before moving on... or just be a spectator and read the examples.
All the examples used in this article will be based on the world database, which is available from the MySQL website as world.sql.gz.
Well, the obvious first thing to take a look at, is the simplest possible subquery you could issue:
SELECT (SELECT 1); +------------+ | (SELECT 1) | +------------+ | 1 | +------------+
It's possible to nest subqueries, and pass values through any number of levels, for instance:
SELECT (SELECT (SELECT 1)); +---------------------+ | (SELECT (SELECT 1)) | +---------------------+ | 1 | +---------------------+
Simple, right? But... not very useful.
Let's ignore that some things could be done more efficient ways (mostly JOINs) for now, and just take a look at some possible subqueries and their results. First, we'll look at a very simple query that could be done a number of other ways:
SELECT name, headofstate FROM Country WHERE code=(SELECT "SWE"); +--------+-----------------+ | name | headofstate | +--------+-----------------+ | Sweden | Carl XVI Gustaf | +--------+-----------------+
OK, I'll admit, that's useless. Stay with me here! If we try a similar query, but this time a bit more useful: Let's find the information for the country (or countries) with the highest population:
SELECT name, headofstate, population FROM Country WHERE population=(SELECT MAX(population) FROM Country); +-------+-------------+------------+ | name | headofstate | population | +-------+-------------+------------+ | China | Jiang Zemin | 1277558000 | +-------+-------------+------------+
For you MySQL experts out there, you might note that I could've done the same thing with ORDER BY and LIMIT, but LIMIT is MySQL-specific, so that solution wouldn't be portable to other databases, and would also not always be quite exactly the same result. (With ORDER BY and LIMIT, if there are two countries with the exact same, and highest population, only one would be selected.) The above subquery is portable to most databases. Let's try something still more complex: The information about the country with the most official languages.
Subqueries in the FROM clause (a.k.a. "derived tables") are one of the most useful features of the new subquery support. Essentially, it's simply a subquery that you are allowed to put in the FROM part of a query. When MySQL processes the query it pretends that the result of the subquery is actually another table. First, a simple example, to show you what I mean:
SELECT foo FROM (SELECT 1 AS foo) AS tbl; +-----+ | foo | +-----+ | 1 | +-----+
You'll notice the part at the end: FROM (SELECT 1 AS foo) AS tbl is put where you would normally find a list of tables. You'll also notice that I have aliased the subquery (the part in the parenthesis) with AS tbl. All subqueries in the FROM clause must be aliased to something, anything, otherwise you'll get an error. Every table must have a name, and this subquery is no exception.
If we move on to a more complicated (and arguably more useful) example, we'll start to see the power of subqueries. Let's find out what the highest number of official languages in any country is:
/* 1 */ SELECT MAX(tbl.nr) AS nr FROM ( /* 2 */ SELECT countrycode, COUNT(*) AS nr FROM CountryLanguage WHERE isofficial='T' GROUP BY countrycode ) AS tbl; +---------+ | MAX(nr) | +---------+ | 4 | +---------+
Again, those of you who are MySQL experts might note that the same answer could be found with an ORDER BY and LIMIT. Again, I have to alias the subquery part to something, and I chose AS tbl. Let's do something that couldn't be done easily without subqueries: Let's get the information for those countries with the the highest number of official languages:
/* 1 */ SELECT name, population, headofstate, top.nr FROM Country, ( /* 2 */ SELECT countrycode, COUNT(*) AS nr FROM CountryLanguage WHERE isofficial='T' GROUP BY countrycode HAVING nr=( /* 3 */ SELECT MAX(summary.nr_official_languages) FROM ( /* 4 */ SELECT countrycode, COUNT(*) AS nr_official_languages FROM CountryLanguage WHERE isofficial='T' GROUP BY countrycode ) AS summary ) ) as top WHERE Country.code=top.countrycode +--------------+------------+-------------+----+ | name | population | headofstate | nr | +--------------+------------+-------------+----+ | Switzerland | 7160400 | Adolf Ogi | 4 | | South Africa | 40377000 | Thabo Mbeki | 4 | +--------------+------------+-------------+----+
This query is not so simple. It nests four SELECT statements one inside the other, to accomplish a somewhat complex result. I have added comments to the statement of the form /* 1 */ to mark each statement, so that we can break the query down logically. If we look at each of the individual queries, by number, and in reverse order, which I find easiest (and is also the order in which MySQL must solve the query):
4. | This SELECT finds all countries and the number of official languages for each. We're interested in what the maximum number is, and there may be more than one with the same maximum. |
3. | This SELECT finds the maximum of nr_official_languages from SELECT #4. There isn't an SQL-standard way of getting the countrycode column for the max at the same time, since there may be more than one. |
2. | In this SELECT we match up the country codes based on the number of official languages. |
1. | Finally, we can JOIN the result against the Country table to find the country information about the matching countries. |
I hope you've learned something, and will do some of your own reading about subqueries. In the next article of the series, we'll discuss Correlation, ANY, and EXISTS.