About the author

Giuseppe Maxia is a QA developer in MySQL community team.

A system analyst with 20 years of IT experience, he has worked as a database consultant and designer for several years.

He is a frequent speaker at open source events and he's the author of many articles.

He lives in Sardinia (Italy).

We are always looking for interesting articles about MySQL! Have you written something and would like to it published here? Please contact us via feedback form »

How to write a successful patch

Overview

Submitting patches to Open Source code doesn't come naturally to everyone. It is not easy, since it requires coding skills that the average user does not have. Moreover, code patches require a fair amount of additional documentation, without which the patch itself is virtually useless. These notes will walk you through the requirements of a good patch for MySQL server and perhaps other projects.

Patch requirements

Let's define some of the basics: A patch is a structured, computer-generated description of how to modify the source code of a software product. The most common patches fix bugs and introduce new features, and the most common program to generate such a description is "diff".

A patch does not exist in a vacuum. It must be integrated with the rest of the application, and thus it must not break existing code, and the result must pass all regression tests. But most important of all, a patch must be reviewed and approved by senior developers of the product before it is applied to a source tree. Then it must pass specific tests, to prove that the patch works as intended and that it does not clash with the rest of the application.

From the above requirements, let's consider two specific tasks and you should ask yourself what would you do if you were in charge of each:

To examine the code and approve it, you must know the purpose of the changes. If a patch does not include some description of such a purpose and the means to achieve it, the reviewer must try to determine the principles by translating from source code to plain language. You must admit that it does not sound like an easy or pleasant task.

Then you need to test the feature. Ask yourself, "Test it against what? What does this patch suppose to do?" Imagine a patch to implement a very simple function that accepts two values and returns their sum. In order to test it, you must know at least: The type of accepted values (integer, floating point, signed, unsigned), the lower and upper boundaries, the result type, the maximum value of the result, how the function should behave when casting values from different types, how should it behave with NULLs, if it accepts only literal values, or also variables and expressions ... one could go on, and this is for a simple feature. Testing a feature without some detailed specifications is not just difficult, it could be impossible.

What a good patch should have

Now you know of some of the difficulties of those who accept patches. In order for us to accept patches, we think that it's reasonable to require:

A short general description (also known as High-Level Description), which tells the purpose of the bug fix or feature in plain language, understandable by laymen and managers. This is something like a title. It should be easy to look through a list of hundreds of such titles and find your patch if we remember what it does.
A High-Level Specification, describing in detail what the feature should do. This is more specific than a title, but doesn't contain code. It's a short paragraph that lets the reader know the goals of the change without drowning in code minutiae.
A Low Level Design, which is a description of what the patch is going to do, with the technical detail of the implementation. Very specific, perhaps using pseudocode.
The code itself, submitted as a context or unified diff file.
Test files pluggable to our regression test suite or a diff against existing files.

If you have been following my explanation so far, the reasons for all these items should be clear. Even so, one can imagine some readers grumbling and muttering, "You want me to submit what?"

So let's review the reasoning. Suppose you submit only code, saying "It's a patch to implement function XYZ, which everybody agrees is going to be useful." Let's say that a senior developer assigned to review your patch agrees in principle that function XYZ is a great feature indeed. However, which attributes of the SQL standards is your function XYZ going to implement? Does it integrate function JKL, which currently implements a subset of the standards, or does it replace it altogether? How will your patch affect existing routines? Is it using existing structures or creating new ones? If you are creating new ones, is there a justifiable reason?

Given these questions, you understand that a few lines of documentation can save a bunch of time to the reviewer; reviewer time is extremely valuable, because only a minority of our coders are entitled to perform a review. If the review is going to take too much of their time, for which the company pays, your patch is not going to be evaluated at all. Right -- not delayed, but skipped in favor of perhaps less interesting but better documented patches.

The same goes for testing. The reviewer wants to see some testing to back your claim that the patch fits its purpose. Without testing, the review will be negative, and the patch is left behind, until someone rescues it by writing the missing pieces.

Then, if you think that these requirements are just a sadistic idea to make you work more, think again. What we ask for is essential to ensure the quality of our products.

What an acceptable patch must absolutely have

We can be a little more flexible, and accept patches that have less than everything we need, but not much less. So, let's make a reasonable list of what would make a patch that busy developers would grudgingly accept and evaluate with some delay.

A high level description of the change;
At least a tentative specification of the change;
The code;
A test case.

We reduced the high-level specs and omitted the low-level ones, not because they are always unnecessary, but because you may not need them if your patch is simple and follows the "coding guidelines":[http://forge.mysql.com/wiki/MySQL_Internals_Coding_Guidelines|MySQL]. However, make no mistake: if your patch does not include all the required items, it may be evaluated, but don't hold your breath. It won't have the highest priority.

If you need further justification for our requests, think about this:

You, as the author of the patch, are the most suitable person to fill in the required information. You have given the matter some thought, and it would be easiest for you to commit your line of thinking to a page of organized docs. Anyone else would take at least twice as long to do the same. As for the test case requirement, we would not consider a patch that has not been tested. If you tested it, it would be easy for you to pack your tests in a script and submit it together with the expected results. If you didn't do it, then ... why would we accept an untested feature?

We hope you understand that we're not trying to impose extraneous work. What we require is just a portion of what MySQL demands of our internal developers. For voluntary contributors we can't be as demanding, but we still mustn't spend too much time on single contributions.

A conciliatory note.

Sometimes, when submitting a patch to fix a bug, the specifications can be reduced to the bare minimum of pointing to the wrong code and saying how to fix it. This has happened in the past, and we are flexible enough to recognize such a case. Apply some judgment, but don't assume anything. What is immediately obvious to you may not be so for somebody else. When in doubt, add full specifications.

Writing specifications the easy way

You don't have to do everything alone. There is much you can borrow from the existing code. First of all, you can ask around, starting from the MySQL internals mailing list, to find out if somebody has already written the specification you are after. If you are planning to write a standard function, chances are that the specifications are already written down, and you need just to cut-and-paste the main part and add something of your own. If it is a feature that many people requested, there may be already a high-level description or even a specification for it, so if you volunteer to do that, MySQLers will gladly provide the missing docs. If all the above fails, remember that writing specifications is not rocket science -- look at the appendix for a clear explanation of what is expected.

If you want to run the extra mile

Assuming that you have done everything by the rules, the only thing that's missing is that part of the documentation that goes to the user manual. Normally, for features developed in-house, the documentation team takes works from the High Level Specification to produce the final docs.

For patches coming from the community, though, it would be really great to have the documentation drafted by the feature author. Therefore, for a really perfect job, feel free to add the full documentation for your creation. If you are not comfortable with writing, you may set up a wiki page on MySQL Forge with a stub of the docs, and invite other contributors to fill in the gaps.

Appendix:

Definitions and explanations (from MySQL AB internal guidelines)

Thanks to Trudy Pelzer for writing it and for giving us permission to make it public.

High-Level Description

The High-Level Description should contain a brief (fewer than 250 words) description of a feature. The purpose of this description is to give the company an understanding of the change you want to make to a product, along with the rationale for making it. Use clear, fairly non-technical language; think of this description as something that could be put in a marketing brochure or executive summary.

Although the H-L-D should be brief, don't make it too brief. Your description should be more than a mere repetition of the title -- it should state the problem you expect the feature to solve. If you want your idea to be implemented, provide the rationale for doing so: Mention any relevant bugs, "Worklog":[http://forge.mysql.com/worklog/] entries, emails (with quotation or with URL for looking up in the mail archive). If the task is in a common category -- e.g. "Full-Text" -- then mention that category.

High-Level Specification

The High Level Specification begins by restating the problem from the HLD, possibly in more technical detail, then providing a solution.

The H-L-S is a precise requirement specification that describes how the feature will work; it describes the implementation requirements on an abstract level, without binding them to specific code. Thus, it should contain a fairly technical description of the feature that the specification proposes to implement. The purpose of this description is three-fold:

To give all developers (and especially anyone who gets assigned to code and review the implementation) a complete analysis and description of the feature, as well as the implications of implementing the feature.
To provide the Docs team with all the information they need to document the feature.
To provide the QA team with the information they need to write good test cases for the feature.

For example, the specification may include:

A complete description of the problem to be solved. Depending on the situation, this may be a mere restatement of the problem from the HLD or it may require a more in-depth, technical discussion of the problem to be solved.
An overview summary description of the feature.
Complete syntax diagrams for the feature.
Rules that describe how the feature will operate. This may include a pseudo-code specification of internal interfaces.
Examples of the feature in use.
All error codes and messages that might be required when the feature is used incorrectly.
Notes on any related statements; what else might need to be done or amended in the server or a related product because it is affected by this feature.
A list of keywords that will become reserved due to the implementation of the feature.
References; links to any web sites that can provide additional information about the feature, especially when it relates to features implemented by our competitors.
A suggested implementation design that takes into account:
1. A list of algorithms that could be used.
2. A review of all subsystems that will touched and how they will be influenced.
3. An analysis of the flaws in the suggested design.
4. Theoretical considerations used in the suggested implementation.
A list of design and specification limitations. It's as important to list what the definition or design does not include to avoid setting false expectations. So if a decision is made to implement only part of a feature set -- e.g. either because of a time constraint or because the current MySQL code won't allow all parts of a feature to be implemented, then provide this information in the HLS.

Some of the above -- especially the implementation suggestion -- will not be applicable to every feature; check with the internals mailing list if you think this is the case. But every specification should provide as much information as possible as part of the specification. Think of the information you would need to code the feature without having to do any more research on how the feature should work; if the entry provides all of the required information, then the specification is complete.

The Low-Level Design

The Low-Level Design portion of a specification is a technical, low-level description of how the feature will be implemented; it maps the objects of the HLS into particular classes, routines and methods, disclosing the code you will change and how you will change it. Thus, while the design may contain only a minimal description, it should include technical details such as:

A review of all subsystems that will touched and how they will be influenced.
A list of files that will be affected.
A list of classes and routines that will be written.
A list of public structures that will be completed
A list of methods that will be added to classes.
An analysis of the flaws in the suggested design.

The purpose of this section is to provide the Implementer, Architecture Reviewer, and the Code Reviewers sufficient information to approve the design and to write/review the code. The intention is that the developer assigned to implement the feature will write the LLD. Think of it as a memory aid for coding, as well as a description that allows others to (a) decide whether there are major flaws in your design and (b) quickly find and review all areas your code has impacted.

Additional resources

Comments?

Read and post comments on this article in the MySQL Forums. There are currently 2 comments.