By Giuseppe Maxia
Opening the path
The Wizard took a sheet of paper and drew a table.
"We will do it by hand," he said. "This way, we will understand what we need to ask the database engine to do." That was typical of the Wizard. When he was in this explaining mood, I had better let him talk.
"Let's start with the first row. It's Roma. The employee is male, so we write 1 under M and 0 under F. Then we take the second row. It's Roma again. Which gender is this one? If it's a male, we add 1 to the value under M and 0 to the value under F, and so on."
|
He looked at me as if he expected me to see the light and grasp, as if by magic, the algorithm he was hinting at. My blank stare must have told him that I was still in the dark.
"Don't you get it? It's simple. We sum into M if the employee is male, and we sum into F if she is female." He stressed the words sum and if. Then he grabbed my keyboard and modified my previous statement:
mysql> SELECT location, SUM(IF(gender='M',1,0)) AS M, SUM(IF(gender='F',1,0)) AS F
    -> FROM locations INNER JOIN employees USING (loc_code) GROUP BY location;
|
"So we are telling the engine to do exactly the same thing that we would have done manually. Sum ... if. Only the engine will do it faster."
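The paper procedure the Wizard describes can be sketched in a few lines of Python. This is only an illustration: the rows below are hypothetical and stand in for the (location, gender) pairs the JOIN would return. Each row adds 1 to the matching column and 0 to the other one, which is what SUM(IF(...)) asks the engine to do.

```python
# Hypothetical sample rows: (location, gender) pairs, as the JOIN
# between locations and employees might return them.
rows = [
    ("Roma", "M"),
    ("Roma", "F"),
    ("Roma", "M"),
    ("London", "F"),
    ("Paris", "M"),
]

def cross_tab(rows):
    """Tally rows per location the way the Wizard does on paper."""
    table = {}
    for location, gender in rows:
        # One line per location, with the M and F columns starting at zero.
        counts = table.setdefault(location, {"M": 0, "F": 0})
        # "Sum ... if": add 1 to the matching column and 0 to the other.
        counts["M"] += 1 if gender == "M" else 0
        counts["F"] += 1 if gender == "F" else 0
    return table

for location, counts in cross_tab(rows).items():
    print(location, counts["M"], counts["F"])
```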
"Wow!" I said, but my mind was racing to see how this incredibly simple statement could help. "What about the total column?" I asked.
"Oh, that. Here you are." And he modified the statement once more:
mysql> SELECT location, SUM(IF(gender='M',1,0)) AS M,
    -> SUM(IF(gender='F',1,0)) AS F, COUNT(*) AS total
    -> FROM locations INNER JOIN employees USING (loc_code)
    -> GROUP BY location;
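To try the complete statement without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The table layouts and rows are hypothetical, modelled on the names used in the query. The standard SQL expression CASE WHEN ... THEN 1 ELSE 0 END stands in for MySQL's IF(), because it works in every SQLite version.

```python
import sqlite3

# Hypothetical schema and data, modelled on the tables in the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (loc_code INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, gender TEXT, loc_code INTEGER);
    INSERT INTO locations VALUES (1, 'Roma'), (2, 'London'), (3, 'Paris');
    INSERT INTO employees (gender, loc_code) VALUES
        ('M', 1), ('F', 1), ('M', 1), ('F', 2), ('M', 3);
""")

# Same cross-tab as the MySQL statement, with CASE in place of IF().
query = """
    SELECT location,
           SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS M,
           SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) AS F,
           COUNT(*) AS total
    FROM locations INNER JOIN employees USING (loc_code)
    GROUP BY location
    ORDER BY location
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The ORDER BY is only there to make the output order predictable. For each location, M plus F equals total, as long as every gender value is either 'M' or 'F'.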
|
"I don't think I really understand, though," I said. "We need to count, but we are summing. How come?"
"From the SQL point of view, we are doing the same thing. COUNT of star and SUM of one give the same result. Try it yourself. Type 'SELECT COUNT(*) FROM employees'."
mysql> SELECT COUNT(*) from employees;
|
"Now replace COUNT of star with SUM of one."
mysql> SELECT SUM(1) from employees;
|
"It's the same!" I said, excited.
"No, actually it's not. On a MyISAM table, MySQL answers a COUNT of star with no WHERE clause straight from the row count stored with the table, without actually counting the records. You can't notice the difference on such a small table. If you had one million records and you were actually counting by groups, you would see that SUM takes a couple of milliseconds more than COUNT, and I think we can live with that. Notice that we could not use COUNT in our cross-tab: IF returns 0, not NULL, for the rows that don't match, and COUNT counts every value that is not NULL, so it would count all the rows anyway. Try it."
mysql> SELECT location, COUNT(IF(gender='M',1,0)) AS M,
    -> COUNT(IF(gender='F',1,0)) AS F,
    -> COUNT(*) AS total
    -> FROM locations INNER JOIN employees USING (loc_code)
    -> GROUP BY location;
(warning: gives WRONG results!)
|
"See? That's why we have to sum instead of count. COUNT is a dumb function that counts any piece of junk it finds, zeros included. SUM is choosier: adding a zero changes nothing."
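Both of the Wizard's points can be checked with a small sketch, again using Python's sqlite3 module with CASE standing in for IF(). The five-row employees table is hypothetical. COUNT(*) and SUM(1) agree. COUNT over the 1-or-0 expression, however, counts every row, because 0 is a value and COUNT skips only NULLs.

```python
import sqlite3

# Hypothetical employees table; only the gender column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (gender TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [("M",), ("F",), ("M",), ("F",), ("M",)])

# COUNT(*) and SUM(1) both add one per row.
count_star, sum_one = conn.execute(
    "SELECT COUNT(*), SUM(1) FROM employees").fetchone()

# COUNT counts every non-NULL value, and 0 is not NULL,
# so the "no" branch is counted just like the "yes" branch.
count_m, sum_m = conn.execute("""
    SELECT COUNT(CASE WHEN gender = 'M' THEN 1 ELSE 0 END),
           SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END)
    FROM employees
""").fetchone()

print(count_star, sum_one)  # 5 5
print(count_m, sum_m)       # 5 3
```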
It looked so trivial that I was ashamed of myself for not having figured it out on my own.
But suddenly I saw something that didn't seem right. "Here we have a simple case, where we know in advance all the values that will become columns. But what should we do if we don't know them? What if we want the departments instead?"
The Wizard took a glance at the diagram and typed:
mysql> SELECT dept from departments;
|
"Yeah, I see," I said, with a hint of disappointment in my voice. "You mean I have to compose the query by hand, writing a SUM/IF expression for each value in departments?"
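Writing one SUM/IF by hand for each department is tedious. However, the columns follow a fixed pattern, so a program can generate them. Below is a hedged Python sketch of that idea. It takes a list of department names and builds the text of the MySQL cross-tab. The names here are hypothetical and stand in for the result of SELECT dept FROM departments. The FROM clause, and in particular the dept_code join column, is an assumption about the schema, so the caller passes it in.

```python
# Hypothetical department names; in the article they would come from
# the result of "SELECT dept FROM departments".
departments = ["Development", "Personnel", "Research", "Sales"]

def build_crosstab_query(values, from_clause):
    """Write one SUM(IF(...)) column per value, plus a total column.

    Assumes the values contain no quote or backtick characters;
    real code would have to escape them.
    """
    columns = [f"SUM(IF(dept='{v}',1,0)) AS `{v}`" for v in values]
    columns.append("COUNT(*) AS total")
    return (
        "SELECT location,\n       "
        + ",\n       ".join(columns)
        + f"\n{from_clause}\nGROUP BY location;"
    )

# The FROM clause below is a hypothetical guess at how departments
# join to employees; adapt it to the real schema.
print(build_crosstab_query(
    departments,
    "FROM locations INNER JOIN employees USING (loc_code)\n"
    "     INNER JOIN departments USING (dept_code)",
))
```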
DevShed.com is the independent Open Source Web Development Site. Fresh tutorials, articles and discussion of MySQL, PHP, Perl, Python, Apache, JSP and administration can be found daily at http://www.DevShed.com/ This article is Copyright 2001 by Developer Shed, Inc. All rights reserved. Reproduced with permission.