US DHS Testing FOSS Security
Stony Stevenson alerts us to a US Department of Homeland Security program in which subcontractors have been examining FOSS source code for security vulnerabilities. InformationWeek.com takes a glass-half-empty approach to reporting the story, saying that for FOSS code on average 1 line in 1000 contains a security bug. From the article: 'A total of 7,826 open source project defects have been fixed through the Homeland Security review, or one every two hours since it was launched in 2006 ...' ZDNet Australia prefers to emphasize those FOSS projects that fixed every reported bug, thus achieving a clean bill of health according to DHS. These include PHP, Perl, Python, Postfix, and Samba.
Re:"The" PHP? (Score:4, Informative)
The Actual Scan Site (Score:2, Informative)
RTFA (Score:5, Informative)
Re:Looking good, too bad the press didn't understa (Score:5, Informative)
Indeed. FTFA:
One can only speculate about the, er, source of their discomfort.... 8^)
1 per 1000 lines is even more impressive as an average across all 180 FOSS applications tested. Most impressive of all are the highlights:
Even some of those with more bugs have at least responded well:
And my favourite 'backslider' of all, OpenVPN, has yet to fix 100% of the bugs found during this exercise. Of course, that's only 1 bug in over 69,000 lines of code....
These results should be viewed as excellent, by and large. This doesn't mean all this software is bug-free, just that there aren't a lot of easily preventable bugs in the code base. Most encouraging, though, is how fast they got addressed and fixed by the healthier FOSS projects.
Re:Looking good, too bad the press didn't understa (Score:1, Informative)
http://www.subspacefield.org/security/security_concepts.html#tth_sEc24.5 [subspacefield.org]
Pessimism in article (Score:5, Informative)
Not only does the article read much as it would for a commercial counterpart, but most of the numbers it presents are actually good for open source software.
For instance, most of the projects discussed had fewer than 1 bug per 1,000 lines of code. The Linux kernel came in at 0.127 bugs per 1,000 lines, and that across more than 3 million lines of code.
The article also mentions key projects, such as glibc (which is used by basically everything on a Linux system), that have already fixed all the reported issues.
Even something as huge and complex as Firefox has already fixed half of its issues and is showing progress on the rest (some have already been verified).
Overall, I didn't get the glass-half-empty tone the summary implies. What I found strange is that the comments on the site itself, and many here on /., also take the pessimistic view.
I think this news is great for open source software. It shows that OSS has fewer security issues than average, that those issues are fixed quickly, and that some programs have been certified by a company for use in security-sensitive organizations such as the DHS. What could be better than that?
Re:What about MS? (Score:3, Informative)
Now, I realise it doesn't change your point at all, but it's not like MS is the only entity with access to their own code: they have dedicated programs to share even their most closed pieces of code with their customers (if they're important enough).
Re:Wow... FOSS looks pretty pathetic (Score:5, Informative)
Yes, OSS has bugs. Everything from compilers to content management systems, surely. So do proprietary programs.
The more qualified eyes you get on a bug, the better chance you have of finding and fixing it. You can do that by having a big staff that pores over code again and again. You can do it by having lots of outside help, as in the case of popular OSS projects. One thing that helps is having a fresh set of eyes look over something, which is much easier in OSS than in closed-source applications.
BusinessWeek had an article from a guy at Coverity back in 2006 about this. In that article [businessweek.com], Ben Chelf said that 4 of the top 15 programs on the quality scale measured by defects per thousand lines of code were OSS. He said that on average, the major-project OSS software they tested was indeed higher quality software than average. He said, though, that the absolute highest quality code was the cream-of-the-crop proprietary, closed source code from places that make things like fly-by-wire systems. Well, yeah. I'd want my airliner's fly-by-wire system completely bug-free, too.
Commercial software tends to harbor anywhere from 1 to 7 bugs per 1000 lines of code according to the National Cybersecurity Partnership's Working Group on the Software Lifecycle [zdnet.com]. Voluntary testing by Coverity requested (and probably paid for) by MySQL AB revealed that project to have all of 97 flaws, one of which could be a serious security issue. All 97 were to be fixed for the next release.
A similar study (same link) found 985 bugs in over 5,700,000 lines in the Linux kernel, or roughly one bug per 5,800 lines of code. TFA has data on a newer version of the kernel -- 0.127 bugs per TLOC.
In Apache, 22 bugs total, 0.14 per TLOC, and three fixed so far.
PostgreSQL had 0.041 per TLOC and has so far fixed 53 of the 90 bugs.
The glibc team fixed 83 of 83 bugs found.
OpenVPN had one security-related bug found in over 69,000 lines of code. As of late yesterday, it's officially security-bug free according to the same testing people.
The list of officially security-bug free software [zdnet.com.au] includes Amanda, NTP, OpenPAM, OpenVPN, Overdose, Perl, PHP, Postfix, Python, Samba, and TCL.
So with Linux (0.127), glibc (0.000), Apache (0.140), PostgreSQL (0.041), Perl (0.024), PHP (0.000), and Python (0.000) powering a web server (numbers according to Coverity [coverity.com]), you have 0.0474 defects per thousand lines of code across the server. I'd say that's pretty good.
Re:Looking good, too bad the press didn't understa (Score:3, Informative)
An IT Security article on full disclosure [itsecurity.com] notes that as early as the mid-19th century, locksmith Alfred C. Hobbs argued that full disclosure was important for addressing the rash of lock picking people were experiencing. It goes on to discuss exactly why full disclosure works so well.
David Wagner says in an article on security: "Today, many security companies are strongly resisting this, and I think they will need to learn to accept and embrace public scrutiny as a natural and necessary part of security systems." [berkeley.edu] -- David Wagner and Ian Goldberg are the ones who cracked the security of the SSL layer in Netscape 4.
IEEE article abstract stating that full source code access can have "real benefits for security" [ieee.org], although that's not automatic and it has to be done correctly.
Bruce Schneier -- yes, THAT Bruce Schneier -- has an article on his blog [schneier.com] that starts "Full disclosure -- the practice of making the details of security vulnerabilities public -- is a damned good idea. Public scrutiny is the only reliable way to improve security, while secrecy only makes us less secure."
Is that enough or do I need to go to the second page of this Google search [google.com]?
BTW, DJB [wikipedia.org] thinks that both full disclosure and isolation of trusted components are absolutely vital. He's the guy who won the right for Americans to export cryptography technology in court against the Department of Justice. He also found a timing attack against OpenSSL's AES cipher and his Unix Security Holes class of 16 students turned up 91 previously unknown holes in one semester.
As for "security by design", that helps. However, with many programs written in languages that allow null pointers, stack overflows, buffer overflows, and array overruns, the design can be as secure as you want and the program can still be crashed; in some cases arbitrary code can still be executed. Address randomization, NX bits, run-time bounds checking, and automatic memory management can go a long way. Sanitization of inputs, static analysis, time padding, and more still have to be considered in some cases.
The tests Coverity is running are an example of static analysis. If there's a C routine that can be coerced into smashing the stack or overflowing a buffer in the heap, that can often be automatically caught and reported. Memory leaks often can be, too. They're probably also able to do at least rudimentary checks for sanitizing input values.
Re:L, A and P, but where's M? (Score:3, Informative)
MySQL uses Coverity and Klocwork [mysql.com] on their certified versions across several different platforms. The certified versions are based on the major releases of the community versions, and are typically just more conservative in that they only take changes for critical and security bugs [livejournal.com].
There's speculation that the community-edition result was actually an old report that wasn't retested even back then: the certified version based on that same community release had zero defects reported, and the community edition's bug count per TLOC was identical to the previous report, from before those bugs were fixed in both versions.
Re:Wow... FOSS looks pretty pathetic (Score:5, Informative)
I'd say your statistic is wrong. You need to multiply each average by the number of kloc per project (being careful to count those for the project version for which the averages were given), and then divide by the total kloc across all projects.
Re:Looking good, too bad the press didn't understa (Score:5, Informative)
strcpy is NOT insecure. It can be used insecurely.
But congratulations: you've just turned what could have been a borderline-OK bit of code, strcpy(src, dst) (which ought to have been criticized at code review, since the variable names are confusing), into (probably) a crash, and definitely a buffer overrun if sizeof dst is larger than sizeof src.
I have lost count of the number of bugs I've had to fix after someone changed a perfectly good strcpy into strncpy. A common mistake is:
strcpy(dst, src);
becomes
strncpy(dst, src, sizeof dst);
and then you get a bug because only the first four characters of src appear in dst followed by garbage.
Of course, then it gets changed to
strncpy(dst, src, strlen(src));
because the original programmer did know what they were doing and the buffer was big enough.
Eventually we get to the brilliant:
strncpy(dst, src, strlen(src)+1);
Fantastic! What an improvement! And yes, it really does happen in what was once good production code because some idiot has heard that "strcpy is insecure".
Another one I've seen is:
dst = malloc(1000000);
strcpy(dst, "MESSAGE");
gets changed to
dst = malloc(1000000);
strncpy(dst, "MESSAGE", 1000000);
Yup, instead of writing 8 bytes, we'll write one million bytes because strcpy is insecure, but we won't fix the missing check for NULL. (there's a fairly good argument for not checking the return from malloc in much production code - if malloc actually fails then you're already so far up shit creek without a paddle that it's probably impossible to recover gracefully anyway. Obviously different considerations will apply if you're controlling a nuclear power plant than if you're writing a game)
strncpy is NOT a replacement for strcpy with a length parameter. Unfortunately, strncpy has a very bad name; it should be called something like meminit_from_str(), as strncpy ALWAYS writes n bytes and doesn't always write a null terminator. (I've also had to fix bugs where someone replaced a correct use of strncpy with a version that guarantees to write the null.)
strncat is a possibly safer replacement for strcat. However, the length parameter is so tricky to get right that I've seen cases where someone originally wrote strcat safely, that got changed to strncat "because it's safer" and then a bit later another change was made that caused a crash because the original change to strncat got the length parameter wrong.
extern char error_msg[][40];
char error[64];
strcpy(error, "ERROR:");
strcat(error, error_msg[e]);
becomes
strncpy(error, "ERROR:", sizeof error);
strncat(error, error_msg[e], sizeof error - 6);
becomes
strncpy(error, get_translation("ERROR:", lang), sizeof error);
strncat(error, translated_error_msg(e, lang), sizeof error - strlen(error));
Of course, it's even more common to miss the -6 or the strlen(error) completely than to remember the extra -1 that's required on the length parameter.
(The man pages are IMO, confusing for strncat as they usually say something along the lines of "appends at most n characters")
Tim.
Re:False positives (Score:3, Informative)
strcpy, strncpy, strlcpy, etc (Score:2, Informative)
see <http://www.courtesan.com/todd/papers/strlcpy.html>.
Rabid MS hater? (Score:3, Informative)
Where did you pull the 1% of OSS users being programmers from? Your ass? You didn't even cite your own ass? How rude!
Yeah, there aren't enough world-class programmers to go around the millions of OSS projects out there, or even the most popular hundred thousand of them. Maybe not the ten thousand most popular. Yet over half the patches for the Linux kernel come from people other than the core development team.
In fact, the top submitter of changesets into Linux 2.6.20 only accounted for 4.8% of them. The top 20 contributors accounted for 28% of changesets. Similar numbers pop up by number of lines added. Linus only personally signed off on 13% of the changes in 2.6.20 so there's a good spread there, too.
The people developing the Linux kernel aren't just weekend coders in their parents' basements. Red Hat, IBM, Novell, Intel, Oracle, Google, the University of Aberdeen, HP, Nokia, SGI, Astaro, MIPS Technologies, MontaVista, and Broadcom were among the top 20 sources of changesets, along with a few lesser-known companies; 7.7% of changes came from people with no employer and 25% from people whose employer is unknown. By lines of code added, Sony also joins the list of contributors' employers, and Freescale makes the list over the year in which versions 2.6.16 through 2.6.20 were developed.
In all, 65% of the changes to the Linux kernel for version 2.6.20 were from corporate development. Over 1,900 people had patches make it into the 2.6.20 kernel alone.
All these statistics on who develops Linux can be found at LWN.net's article called "Who Wrote 2.6.20?" [lwn.net].
How many companies write and vet the code at Microsoft? Yes, I'm sure there are a bunch of dedicated people at Microsoft, and they do a pretty good job of making a usable OS. They're getting better about security. It's my opinion that Vista's kind of a mess particularly because they're having trouble designing for both usability and security from the ground up. They'll improve on that, too. I don't hate Microsoft's developers (their marketing and legal departments, maybe).
However, the biggest OSS projects really do have a lot of people who are highly skilled professional programmers writing their code. They also have an advantage of being able to attack issues most important to their varied employers using skills and development methods different from those at other corporate contributors.
It's not a black or white issue. Microsoft's got pros and cons, and so does their software. OSS has pros and cons. I have two PCs at this desk. One's XP Pro and one's Linux. I use both every day I'm in the office. I also use Linux servers and I have a Mac at another desk. At home I have XP, Linux, Solaris, Mac, and OS/2 (the OS/2 is for fun). My wife's PC has XP on it, but she can use the Linux box when she needs to. She's not an admin level user, but she can fix some issues on Windows just from having used it so much for so long.
To bash MS when they really screw something up isn't to be a "rabid MS hater". To praise them when they do something well isn't to be an MS fan. The same's true of OSS projects. Most people want their software to meet their needs and don't root for one "team" or another. Most people who do prefer a particular project are still willing to give other projects their due respect. There are very vocal fanatics in every camp, but just because they're loud and quicker to spout doesn't mean they're actually that numerous.