New PHP Interpreter Finds XSS, Injection Holes
rkrishardy writes "A group of researchers from MIT, Stanford, and Syracuse has developed a new program, named 'Ardilla,' which can analyze PHP code for cross-site scripting (XSS) and SQL injection attack vulnerabilities. (Here is the paper, in PDF, and a table of results from scanning six PHP applications.) Ardilla uses a modified Zend interpreter to analyze the code, trace the data, and determine whether the threat is real or not, significantly decreasing false positives." Unfortunately, license issues prevent the tool in its current form from being released as open source.
holy smokes batman (Score:3, Interesting)
From the results paper: "Part of Ardilla's implementation depends on modifications to the open-source Zend interpreter... made (for a different purpose) by a student while he was an intern at IBM. We have since made many more modifications, but since the original small diffs are owned by IBM, we cannot release either those original modifications or our later work that builds on them... It would be valuable for someone to re-implement the original changes, so that we could release our entire system as we would prefer."
How would these changes be "re-implemented" - would the code have to be re-engineered, or would a trawl through the original code (patching in changes verbatim) be acceptable? Otherwise, would somebody have to find alternative syntax for implementing the same functionality? Barrel of worms methinks.
not possible (Score:3, Interesting)
I agree that it is possible (but difficult) to identify SQL injection vulnerabilities with automated code inspection. I do not think XSS can be identified so easily. In a web app, user-submitted text is added to a database. Then who-knows-what happens to it. Eventually, something based on that text is sent as output, at which point special characters must be escaped.
The only way to accurately identify XSS in such a scenario is to track the input from the user, into the database, and back out, so that you know the special characters are escaped. That's not something software could accurately do for a general case, without tons of false positives.
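To make the "into the database and back out" tracking concrete, here is a toy taint-tracking sketch in Python. It is not how Ardilla works (the summary only says it traces data inside a modified Zend interpreter). The names Tainted, FakeDB, escape and is_xss_candidate are all made up for illustration, and the "database" is just a dict that remembers a taint flag next to each value.

```python
import html


class Tainted(str):
    """Marks text that came from the user and has not been escaped yet."""

    def __add__(self, other):
        # Concatenating anything onto tainted text keeps the result tainted.
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # Handles plain_str + tainted_str.
        return Tainted(str.__add__(other, self))


class FakeDB:
    """Hypothetical stand-in for a database that also stores a taint flag."""

    def __init__(self):
        self.rows = {}

    def save(self, key, value):
        self.rows[key] = (str(value), isinstance(value, Tainted))

    def load(self, key):
        text, tainted = self.rows[key]
        return Tainted(text) if tainted else text


def escape(value):
    """Escaping returns a plain str, which clears the taint mark."""
    return html.escape(str(value), quote=True)


def is_xss_candidate(output):
    """Flag output that still carries the taint mark."""
    return isinstance(output, Tainted)


db = FakeDB()
db.save("comment", Tainted("<script>alert(1)</script>"))

unsafe_page = "<p>" + db.load("comment") + "</p>"
safe_page = "<p>" + escape(db.load("comment")) + "</p>"

print(is_xss_candidate(unsafe_page))  # tainted text reached the output
print(is_xss_candidate(safe_page))    # escaped on the way out
```

The sketch only follows taint through + concatenation. Real code transforms strings in many other ways, which is exactly where the false positives and missed cases discussed above would come from.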
Re:Just teach people how to code (Score:2, Interesting)
htmlspecialchars converts < to &lt;, > to &gt;, & to &amp; and " to &quot;, simply because those characters have special meanings in HTML and XML and therefore need to be properly escaped. (Strictly speaking, converting " is only required in attribute values that are themselves delimited by double quotes, but the function does it by default to be more general-purpose.)
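As a rough illustration of the four replacements described above, here is a small Python sketch. It is not PHP's implementation, and escape_html_specials is a made-up name. It only mirrors the substitutions listed in this comment (no single-quote handling).

```python
# "&" is replaced first so that the ampersands introduced by the other
# entities are not escaped a second time.
REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
]


def escape_html_specials(text: str) -> str:
    """Apply the four substitutions in order and return the escaped text."""
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text


print(escape_html_specials('<a href="x">Tom & Jerry</a>'))
# prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

The order matters: replacing & last would turn &lt; into &amp;lt;.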
As you can see, the character encoding of the string is irrelevant here -- assuming it is ASCII-compatible --, since the function only replaces some ASCII sequences with other ASCII sequences. Why the function has an additional argument to handle encoding is beyond me. (To prevent replacements of said characters within grapheme clusters, perhaps? Or to handle non-ASCII-compatible encodings?)
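The "assuming it is ASCII-compatible" caveat can be checked with a short Python sketch. This is only an illustration of the byte-level argument, not a statement about what PHP's encoding argument does. escape_bytes is a made-up helper that replaces bytes without knowing the encoding.

```python
def escape_bytes(raw: bytes) -> bytes:
    """Byte-level replacement that knows nothing about the encoding."""
    return (raw.replace(b"&", b"&amp;")
               .replace(b"<", b"&lt;")
               .replace(b">", b"&gt;")
               .replace(b'"', b"&quot;"))


# UTF-8 is ASCII-compatible: every byte of a multi-byte sequence is >= 0x80,
# so the byte-level pass only ever touches real "<", ">", "&" and '"'.
text = "caf\u00e9 <b>"
utf8_result = escape_bytes(text.encode("utf-8")).decode("utf-8")

# UTF-16 is not ASCII-compatible: the character U+3C00 is stored as the
# bytes 0x3C 0x00 in big-endian order, and 0x3C is also ASCII "<".
utf16_bytes = "\u3c00".encode("utf-16-be")
utf16_damaged = escape_bytes(utf16_bytes)

print(utf8_result)                   # the accented letter survives intact
print(utf16_bytes != utf16_damaged)  # True: the character was corrupted
```

So for UTF-8 and the single-byte ASCII supersets the encoding really does not matter to this kind of replacement, while an encoding such as UTF-16 would need the function to know what it is looking at.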
Of course, handling character encoding is a real issue, but a different one. It's fairly trivial, however: you have to transfer your data in the character encoding that you declared your document was in.
Maybe you're actually talking about the issue that user agents will encode data not supported by the character set they're supposed to use as numeric character reference (&#...;) sequences? There are different approaches to this issue, but the best way is arguably to ask the user agent to send its data in UTF-8. I don't remember any problem with IE6 for that (sure, it ignores the attribute made for that purpose in forms, but it will send the data in the character encoding of the page).
Re:Just teach people how to code (Score:2, Interesting)
Oh, and by the way, I am a software engineer (finishing up my Master of Science in Software Engineering with a focus on Knowledge and Information Engineering from the University of Michigan's Dearborn campus at the end of the summer and have been asked by the Electrical and Computer Engineering department chair to create new curriculum for the undergraduates in interactive web development, and will be teaching it as well) and I consider myself a PHP developer (amongst other languages) and take offense to that