Interview with Andrzej Wasylkowski from the checkmycode project.
Hi Andrzej, thanks for the interview.
Hi! From my side, I would also like to thank for the interview. It is a real pleasure to be your virtual guest :)
1. I see on the web page for http://www.checkmycode.org/ I enter into a form my C code and it generates a report of anomalies found in my code with an explanation of why parts of my code are considered anomalous and therefore possibly buggy. Walk me through what goes on in the background to produce this?
In a nutshell, we have mined all of the Gentoo Linux distribution for typical usage of component interfaces -- that is, how Linux components are normally used. If you use an interface in an uncommon way, this will be flagged as an anomaly.
From a high-level point of view, there are three main steps involved. First, the code you submit gets parsed and so-called "sequential constraints" are generated from it. These are two-element sequences of function calls annotated with data flow information, such as "retval of socket() -> 1st arg of listen()". They represent, in an abstract way, how your code uses functions to operate on "objects".
Second, we look at all possibly relevant C projects from a Gentoo Linux distribution to see how these projects use the functions that your code uses, too. Without going into too much detail, if you call "socket()", code from all projects that also calls "socket()" is going to be taken into consideration to find sequential constraints where "socket()" is present.
Third, we check if your code violates any of the patterns found in the second step (you can find sample violations in the tutorial. Any violations found will be reported to you by the website interface.
In reality there is a lot more going on, but the high level picture is as described above.
2. When did the project get started and why?
Today, we have several highly sophisticated techniques for checking code. What's missing is the specification to check against. So we wanted to learn these specifications from existing bodies of code. The project grew gradually and it is hard to pinpoint exact starting date. We started with a lightweight parser that was written by my student, Natalie Gruska, as part of her Bachelor's thesis. The parser was finished in July 2009, but the original intention had nothing to do with analysing lots of source code. We just wanted to create a language-independent front-end for one of my tools, JADET. It turned out that the parser was very fast, and shortly afterwards Prof. Andreas Zeller came up with the idea of analysing lots of source code with its help. The remaining several months until creating a web service was a lot of hard work on my part to actually make it all work and scale to the size of a Linux distribution.
3. Who are some of the other people involved?
The parser that is used by the website was written by Natalie Gruska, who is currently a student at Queen's University in Canada. The original idea comes from my supervisor, Prof. Andreas Zeller. The web interface and the web programming was done by a colleague of mine, Kevin Streit, who is, like me, a PhD student at Saarland University in Germany.
4. How is Gentoo involved in the project?
All the source code that the analysis uses to find patterns comes from the Gentoo distribution (i.e., the snippet you submit gets compared to the source code of projects coming from the Gentoo distribution).
5. Why was it chosen?
It gives us access to source code for all the projects in the distribution. This in turn allows us to use our lightweight parser to find the way functions are being used by those projects.
6. What are the hardest parts of using Gentoo?
There are none :) Using Gentoo was a piece of cake, really, and by far the easiest part of the whole project.
7. What would you change in Gentoo to make it easier to use for your project?
One thing that I wish I had access to, but did not have (or maybe simply could not find it) was a web interface to the source code trees of all the projects. Whenever a violation is found, the user also gets three examples of where the "correct" code can be found. We provide a web interface for this, but before there was www.checkmycode.org, I had to manually extract projects' files and it was quite tedious.
8. What do you like about Gentoo?
I like the fact that portage is quite easy to use, and that Gentoo uses a rolling release approach. Anyone who has ever used a non-rolling release Linux distribution and bumped into unsolvable version conflicts while trying to use the newest available version of some package knows what I am talking about.
Also, for obvious reasons, I like the fact that I have access to the source code :) As a matter of fact, the machine that hosts the website runs on Gentoo.
9. Thanks again for taking the time to discuss your checkmycode tool. Do you have any further remarks?
Thank you for asking me all those questions; it was a real pleasure! I would just like to point out that www.checkmycode.org is just a small interface to a tool that is able to handle whole programs of quite large sizes and detect violations in them. Therefore, we put a lot of stress on making the tool able to filter what it thinks are false alarms and this significantly reduces the number of violations found. Unfortunately, the side-effect is that some real errors related to incorrect functions' usage will not be detected. So, to paraphrase Edsger W. Dijkstra, the tool can only show the presence, not the absence of potential problem locations in your code.
|Copyright 2001-2013 Gentoo Foundation, Inc. Questions, Comments? Contact us.|