Giuseppe Maxia is a QA developer in the MySQL community team.
A system analyst with 20 years of IT experience, he has worked as a database consultant and designer for several years.
He is a frequent speaker at open source events and he's the author of many articles.
He lives in Sardinia (Italy).
When users submit a report to the MySQL bug system, many of them believe that they are going to get in direct contact with the developers. Before joining MySQL, every time I submitted a bug report I could not help picturing a developer working at his desk, being alerted by a virtual bell that a bug report had been filed, and then going through the motions of verifying it and replying to it.
That idea is far from the truth. While a developer may occasionally be involved in the bug verification process, that happens only seldom, although it used to be the norm when MySQL was much smaller. Anyone who has worked in development knows that it is a kind of work that requires constant, uninterrupted concentration. As beautifully explained by Tom DeMarco in Peopleware, once a developer has been distracted from his main task, it takes at least 15 minutes before he's back in the flow. Therefore the concept of developers readily verifying bugs is not sustainable today.
Who is this Valeriy Kravchuk, then? Most of the time, when I submitted a bug, this was the name of the person who verified it (or kindly pointed to a flaw in my reasoning, demoting the bug report to a wish, because I had not read the docs very carefully). The names of Sveta Smirnova, Miguel Solorzano, and Tonci Grgin were also frequently associated with bug comments, but I had no idea who they were. I suspected they were not bored developers taking a pause from coding, but it was only when I joined the organization that I learned the truth.
The ones dealing with the bug reports are not members of the development team. This is not only because developers should not be involved in such tasks as determining whether a bug report is what its author claims, but also because verifying a bug is a different kind of work. It is a "problem solving" task, for which career developers, even the most skilled ones, are not necessarily the best pick.
Within MySQL, bug verification is a task assigned to the Support Team. And what is the Support Team?
The Support Team is the team of skilled problem solvers. They are the ones who deal primarily with the requests of MySQL customers, the ones who (depending on the customer level) have to answer a request within hours, or within minutes. They are not switchboard operators. They are troubleshooters. Real ones. Seriously.
There is no problem too hard for them. Every Support engineer is an expert in one or more fields. Collectively, the Support Team is an expert at everything. They handle every problem. They solve every problem. When they can't solve it, there must be a bug, and then the support team member quickly submits a bug report on behalf of the customer who reported it. I said before that Support engineers are not members of the development team. That doesn't mean they are not developers. They are. All of them have a strong programming background, as coding ability is required for tasks such as creating test cases, compiling and testing code snippets, and creating and maintaining sophisticated test tools.
Above all, Support engineers are experienced at analyzing a problem and taking steps to get it solved. Since bug verification is very similar to problem solving, it should be no surprise that the ones with the problem solving skills are in charge of bug verification. More than that, actually. Within the Support Team there is a Bugs Verification Group, under the leadership of Miguel Solorzano, with the explicit task of verifying bug reports. Don't be misled by this last piece of information. Members of the bug verification team are not the only ones dealing with bug reports. All Support engineers are required to do their share of bug verification, and they do it either by picking bug reports directly or by being involved in the task when helping a colleague.
If you have never done such a thing, and think that verifying a bug report must be an easy job, try it. Go to the MySQL bugs system, choose a random bug that has already been verified and try to reproduce it. Chances are that the bug has been reported for an operating system that you don't use, or involves rebuilding the server on such a system, or requires some action with tools that you don't know. Perhaps you could verify the first one. But if you keep digging, you will see that there are bugs for which you have to admit defeat or acknowledge that it would take you hours of hard work before applying the "verified" label. If you feel that you are unable to meet the challenge, then think about the skills of the Support Team member who did it.
Consider this. Nobody knows everything nowadays. The Support Team members are likely to face situations where the reporter is mentioning uncommon operating systems, unusual build directives, unknown tools, cryptic syntax. And they can't plead ignorance and leave it at that. The matter must be solved. Sometimes they manage to convince the reporter to simplify the test case, so that it can be reproduced without the uncommon operating system, or the unknown tool. But many times this is not a viable course of action, and then they need to learn (quickly!) how to use the unknown tool, install the uncommon operating system, build with the unusual directives, get acquainted with the cryptic syntax, and get the job done.
The team member is not alone, of course. For every difficult case, there are the other experts, ready to contribute to the collective omniscience.
Where are they? Is there a roomful of these eggheads somewhere in Cupertino, CA, and another one in Munich, Germany? Not at all. MySQL being a virtual company, the support team members work from home. They keep in contact with each other using an IRC channel and various VoIP systems. The IRC channel is the equivalent of a corridor in a traditional (the cubicle hive) company. Whenever they need advice, or if they feel like sharing some news, they chat in the IRC #support room, which is by far the most active one on the company server. The support virtual desk must be manned 24 hours a day, 7 days a week. There are no closing days, no closing hours for the support team. People work as much as possible within their office hours, which correspond to the sleeping hours of someone else. When Shane from South Africa starts working, he says "good morning" to his colleagues from Europe, and "good afternoon" to the colleague from Australia who was on duty during his sleeping hours.
Internal rules dictate that each of us should change our IRC nickname to show whether we are available. So I will change mine to 'giuseppe-lunch' when I am about to break for lunch, or 'giuseppe-meeting' when I am about to start a planned phone call. In the support channel, it is not uncommon to see at the same time someone out for lunch while someone else is out for dinner or breakfast. The good side of such a wide geographical spread is that someone will be awake while I sleep.
The Support Team, this powerful machine that sustains MySQL business, is assigned to the first line of defense against the assault of the bug reports. Why is MySQL using its most valued asset to deal with bugs submitted from the community? Simply put, because bug reporters are paying customers. They pay us, not with money, but with work, and thus they deserve to be treated as paying customers.
MySQL receives roughly 700 bug reports per month, half of which are related to the server, and the rest is spread among connectors, client applications, the cluster, documentation, and less popular items such as the web site, the C API, and the bugs system itself.
Having a stellar technologist on one side of the equation won't ensure that every bug will be verified.
It's like playing chess. If you want to see an exciting game, it's no use pairing the world champion against a patzer. You should aim for a game between two experienced masters instead. Members of the Support Team are the best players at the game of bug reporting, and when they meet a poorly written report, the outcome is not quick verification, but rather a long and possibly painful exchange of comments in the bugs system. Writing a good bug report showing a clear repeatable case is not an easy feat. Even with a good bug report, verification is not always the outcome. There are several conditions that can prevent a bug test from becoming truly repeatable. Only very experienced bug reporters acquire the skills necessary to report a bug that can be repeated on any system regardless of external conditions. The rest of us can expect to see the dreaded can't repeat status in our bug report, with an apologetic note explaining that the provided test did not bring the expected result. In this case the reporter should provide more information to the support engineer, who leads a process of refinement to reach a final status (either verified or not a bug, depending on your luck).
Don't get me wrong. This doesn't mean that you have to be a grandmaster to submit a bug report. You just have to apply some effort to do it according to the rules. Even grandmasters started their career from beginners.
There are also cases when more information is necessary to make a better case for a repeatable bug report. For example, when the reporter has a unique working environment, with uncommon hardware or staggering amounts of data, an exchange of information between reporter and verifier may be required to pinpoint the problem correctly. Thus your bug report will assume the needs feedback status, which is an acknowledgment that the report is taken seriously, but is not defined with enough detail to show the problem correctly. If you have submitted a bug report and after a while it assumes the needs feedback status, it means that someone is working on your case, but no definite outcome has been reached. At this stage, more input from you is required. If you care about the bug that you have reported, this is the moment to insist and to co-operate closely with the Support engineers. If you don't, all the effort spent on the case, by both sides, may have been in vain. If you persist, and provide the missing pieces when requested, the coveted verified label will eventually show up in the bugs system, and then the problem will be mostly out of your hands.
After your bug report has gone through the scrutiny of the Support Team and has been given a priority, it's time to do some work on it. The ball is now in the court of the Engineering Department, which includes the development and building teams.
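As an illustration only, the statuses mentioned so far can be sketched as a small state machine. The names of the statuses come from the text above, but the `OPEN` starting state, the table of allowed transitions, and the function names are assumptions made for this example; this is not the code behind the MySQL bugs system.

```python
from enum import Enum


class Status(Enum):
    OPEN = "open"  # assumed name for a freshly submitted report
    CANT_REPEAT = "can't repeat"
    NEEDS_FEEDBACK = "needs feedback"
    VERIFIED = "verified"
    NOT_A_BUG = "not a bug"


# Hypothetical transition table: one possible reading of the process
# described in the article, not an official rule set.
TRANSITIONS = {
    Status.OPEN: {Status.CANT_REPEAT, Status.NEEDS_FEEDBACK,
                  Status.VERIFIED, Status.NOT_A_BUG},
    Status.CANT_REPEAT: {Status.NEEDS_FEEDBACK, Status.VERIFIED,
                         Status.NOT_A_BUG},
    Status.NEEDS_FEEDBACK: {Status.CANT_REPEAT, Status.VERIFIED,
                            Status.NOT_A_BUG},
    Status.VERIFIED: set(),   # verification phase is over
    Status.NOT_A_BUG: set(),  # verification phase is over
}


def advance(current, new):
    """Return the new status, or raise if the move is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value!r} to {new.value!r}")
    return new


def is_final(status):
    """A status is final when no further verification step follows it."""
    return not TRANSITIONS[status]
```

In this sketch a report that receives the needs feedback status can still end up as verified once the reporter supplies the missing details, which is the point of persisting with the Support engineers.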
The team in charge of the subsystem affected by the bug (e.g. replication, event scheduler, stored routines) will take ownership of the bug, and eventually assign it to a developer, according to the bug's priority, the workload of each developer, and the goals for the current period.
During this phase of the bug processing, there is more evaluation going on. The support team has the skills to evaluate the technical side of the bug and its impact on MySQL products, but they don't have the global vision of the work being carried out in the engineering department. Thus, twice a month, a joint committee composed of Support and Engineering people meets to evaluate whether suspected bugs should be escalated to a higher priority. The material for this meeting is prepared by one of MySQL's veterans, Sinisa Milivojevic (the legendary Uncle Sinisa), from the Support Team, and the outcome is summarized by Trudy Pelzer, Engineering Project Manager. The evaluation of this steering committee is vital to keep the development effort in focus. Thanks to this joint work, critical bugs are not overlooked, and less important ones get a chance to enter the loop when some work on related matters takes place.
When the evaluation struggle is over, the bugs at the top of the list are assigned to a developer. The ones down the list will wait a bit longer, and those further down (_low_ priority) won't get in the immediate schedule. MySQL aims at fixing all the bugs, and they will be fixed, eventually, but for the lower priority ones there is no set time for their processing.
Once the bug is assigned to a developer, there are some criteria to decide when to start working on it. For critical bugs, fixing must start within two days of the assignment. Lesser bugs have longer waiting times. Notice I said start, not finish. The developer can give an estimate of the work needed to fix a bug, but everybody knows that estimates are difficult beasts to tame, so we won't promise that we'll fix a bug within a given time frame, because we can't be sure of delivering the fix by a certain date. This sounds like an unprofessional way of working, so let me explain what happens next, because, contrary to popular belief, coding a fix is just a tiny part of the development process.
The developers can't promise you that a bug fix will get in the binaries by a given date because they can only tell you when the second step of their involvement will be over. I said the second, not the first, since the developer's work on a bug starts with finding out why the code is broken, continues with coding the patch, and goes on through several more steps after that.
Let's take the first step. You may think: "wasn't that done by the Support Team?". No, it wasn't. The Support Team only makes sure that a problem exists, taking into account what the manual says, and checking that the bug is effectively breaking something. Finding why it is broken is beyond their scope. The developers, on the other hand, need to know exactly this, why it is broken, if they want to start fixing the bug. Sometimes the reason is evident from the bug description. An experienced developer with a deep knowledge of the code will figure out where the problem is, but more often the problem is deeply hidden in the code intricacies, and much debugging work is needed to find the exact cause.
Only when the reason is known can the developer start thinking of a solution, and code it. The search for the bug origin can take ten times longer than coding the appropriate patch. This is the first reason why predicting a delivery time for a bug fix is only possible as the second step. For the other five reasons, read on.
After coding a patch, the work has barely started. There is a strict policy of quality assurance dictating that no code can be pushed to a source tree without passing the review of two senior developers. If the reviewer is one of the very senior ones, the second review can be waived, but at least one review is needed.
Therefore, after the patch is completed, it is sent to the first reviewer, while the developer tackles the next task in the waiting list.
If the reviewer says that everything is perfect, then the next step is easy. Push the patch to the tree and wait for the global tests. If the review is negative, though, then the patch must be reinserted in the work list, perhaps after the current task, started during the review, is finished, and then submitted for review once more.
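The review rule just described can be written down as a short check. This is a minimal sketch under my own assumptions: the `Review` record, its field names, and the `can_push` function are invented for the example, and treating any negative review as blocking is my reading of the text rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewer: str      # hypothetical field: who reviewed the patch
    very_senior: bool  # True if the reviewer can waive the second review
    approved: bool     # True for a positive review


def can_push(reviews):
    """Decide whether a patch may be pushed to the team tree.

    Assumed rules: a negative review sends the patch back to the work
    list; otherwise two positive reviews are needed, or a single one
    when it comes from a very senior reviewer.
    """
    if any(not r.approved for r in reviews):
        return False
    if any(r.very_senior for r in reviews):
        return True
    return len(reviews) >= 2
```

A patch with no reviews at all is never pushable here, which matches the statement that at least one review is always needed.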
Sometimes, before the task is completed, more tests may be necessary. For particularly complex patches, or for the ones that may require additional tests outside the regression suite, the developers can ask for additional QA tests or review. When this happens, the process can stretch a little more than usual, until the required tasks are performed. If more faults are found during this step, the patch goes back to the developer for further refinement.
When the developer and the reviewers are satisfied that a proper job was done, an even harder trial awaits the patch. After review, the patch is pushed to the team source tree, not the main one. Another policy of quality assurance requires that, before a patch is pushed to the main tree, the team tree must build and pass the regression tests on several MySQL supported platforms. If it does, then we have a green tree (see Brian Aker's Pushbuild, Why I am still awake for a visual explanation) and the patch can be pushed to the main tree. If the build or the regression tests fail, we have a red tree, and the developer must find out what went wrong and fix the patch so that it doesn't break what pushbuild is complaining about.
Of course, the task is not over. Pushbuild can only ensure that the team tree is fit. Once the patch has passed this ordeal, the same build and test business is repeated on the main tree, and chances are that something in this patch may conflict with a patch applied by someone else for another reason, and thus break the build or the regression test. If this happens, then it's back to the second step of the fixing task (coding the patch), with the whole review and push to be repeated.
When the above seemingly endless process is over, the patch is passed to the documentation team, who inserts a short description of the bug and its fix in the change log and in the relevant parts of the user manual. Where do they get such information? From the patch itself. As part of the bug fix process, the developers and the reviewers must make sure that the patch is properly commented, including the reason for the bug, the steps taken to fix it, and the reasons behind such decisions.
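The two gates, team tree first and main tree second, can be sketched as follows. This is an illustration under stated assumptions, not how Pushbuild works internally: the function names, the returned strings, and the platform names used in the usage example are all hypothetical.

```python
def tree_color(results):
    """Return "green" only if every platform built and passed its tests.

    `results` maps a platform name to True (build and regression tests
    passed) or False. An empty result set counts as red, on the
    assumption that an untested tree is not a fit tree.
    """
    return "green" if results and all(results.values()) else "red"


def next_step(team_results, main_results=None):
    """Tell what happens to a reviewed patch after the automated runs."""
    if tree_color(team_results) == "red":
        return "fix the patch"            # team tree is red
    if main_results is None:
        return "push to the main tree"    # team tree is green
    if tree_color(main_results) == "red":
        return "back to coding"           # conflict found in main tree
    return "hand over to documentation"   # both trees are green
```

For example, `next_step({"linux": True, "solaris": True}, {"linux": True, "solaris": False})` sends the patch back to coding even though the team tree was green, which is the situation described in the paragraph above.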
The documentation team will take the bare notes from the developers and make beautiful manual entries from that, adding appropriate comments in all the pages where the affected feature is mentioned. If necessary, they will ask the developers for more information.
Is it over? Not yet. Now it's time for the final tests. The build team, when the time to release a new binary comes, will freeze all the accepted patches and build the binaries for all MySQL supported platforms. When all these builds are successfully created and the test suites passed (a whole forest of green trees), we are almost done.
To ensure that nothing was overlooked, a final, manual review of the binaries is undertaken, and only after the successful completion of this last review do the binaries start flowing to the mirror sites with your precious bug fix included.
The current MySQL bug management process is the result of long experience and continuous adjustment. There is always room for improvement and innovation, and we are open to suggestions on this subject. I hope that bringing our internal procedures into the open will make the community understand our task better and accept the long fixing times with renewed patience and faith in our commitment to product quality.
If you have comments, suggestions, criticism, please enter them in the QA forum.