At BlackHat USA 2009, Eduardo Vela Nava (sirdarckcat) and David Lindsay presented a paper entitled “Our Favorite XSS Filters and How to Attack Them”. It is a very interesting paper and well worth reading.
Among other things, they presented a very interesting way to bypass XSS filters using Unicode characters.
XSS filters
Consider the following piece of code:
[php]
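<?php
// decode UTF-8 input to single-byte characters (ISO-8859-1)
$decoded = utf8_decode($_GET['input']);
// filter XSS attacks; strict comparison, because strpos() returns 0
// (which == false) when the character is at the start of the string
if (
    strpos($decoded, "<") === false // don't allow tags
    && strpos($decoded, ">") === false
    && strpos($decoded, "'") === false // don't allow quotes
    && strpos($decoded, '"') === false
)
{
    // considered safe: the raw, undecoded input is echoed
    echo $_GET['input'];
}
// unsafe
else echo "bad input";
?>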
[/php]
http://www.acunetix.com/blog/wp-content/uploads/2009/08/xss_utf8_decode.PNG
This code uses the utf8_decode function to convert the input to single-byte characters. It then checks whether the decoded input contains dangerous characters and rejects the input if it does. utf8_decode is (or used to be) recommended as protection against obfuscated Unicode encoding.
Here is a quote from OWASP’s discussion page about “Testing_for_Cross_site_scripting”:
“The following PHP functions help mitigate Cross-Site Scripting Vulnerabilities:
…utf8_decode() converts UTF-8 encoding to single byte ASCII characters. Decoding Unicode input prior to filtering it can help you detect attacks that the attacker has obfuscated with Unicode encoding.
…”
However, in this case, as Eduardo and David showed, utf8_decode is the problem and not the solution. You can bypass the filter with a query string like:
vuln.php?input=%F6%3Cimg+onmouseover=prompt(/xss/)//%F6%3E
To understand what’s going on, I’ve edited the code to show the input before and after utf8_decode:
input (before utf8_decode): ö<img acu onmouseover=prompt(400854747531)//ö>
decoded input (after utf8_decode): ?g acu onmouseover=prompt(400854747531)//?
The initial string contained two filtered characters, < (%3C) and > (%3E). However, because each of them is preceded by a %F6 byte, utf8_decode replaces them (and, after the first %F6, two more characters) with a question mark. The filter only inspects the decoded string, finds nothing dangerous, and echoes the original input, so the code is vulnerable to XSS (cross-site scripting).
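The bypass can be reproduced without depending on a particular PHP version by modelling the decoding step. The sketch below is a toy model only, not PHP's actual implementation: `legacy_decode_model` and `passes_filter` are hypothetical helpers, and the model assumes nothing more than the behaviour shown above (a byte of 0xF0 or higher becomes `?` and swallows up to three following bytes; every other byte is copied unchanged).

```php
<?php
// Toy model (NOT PHP's real utf8_decode): a byte >= 0xF0 is replaced by '?'
// and up to three following bytes are dropped, whatever they are.
function legacy_decode_model(string $in): string
{
    $out = '';
    $len = strlen($in);
    for ($i = 0; $i < $len; $i++) {
        if (ord($in[$i]) >= 0xF0) {
            $out .= '?';
            $i += 3; // skip up to three follow-up bytes
        } else {
            $out .= $in[$i];
        }
    }
    return $out;
}

// The same character checks as the filter in the article.
function passes_filter(string $decoded): bool
{
    foreach (['<', '>', "'", '"'] as $bad) {
        if (strpos($decoded, $bad) !== false) {
            return false;
        }
    }
    return true;
}

// URL-decoded form of %F6%3Cimg+onmouseover=prompt(/xss/)//%F6%3E
$input   = "\xF6<img onmouseover=prompt(/xss/)//\xF6>";
$decoded = legacy_decode_model($input);

echo $decoded, "\n";                 // prints ?g onmouseover=prompt(/xss/)//?
var_dump(passes_filter($decoded));   // bool(true): the filter sees nothing dangerous
var_dump(passes_filter($input));     // bool(false): the raw input still contains < and >
```

The point of the model is that the filter and the output operate on two different strings: the check runs on the decoded copy, while the browser receives the raw one.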
utf8_decode and addslashes
However, this problem is not limited to XSS filters. A similar case appears when utf8_decode is applied to escaped strings (e.g. the output of addslashes()).
Some sample source code:
[php]
<?php
…
if (!empty($_GET['username']) && !empty($_GET['password']))
{
    // escaping is applied first and utf8_decode afterwards, which is the flaw
    $user_sanitized = utf8_decode(addslashes($_GET['username']));
    $pass_sanitized = utf8_decode(addslashes($_GET['password']));
    $sql = "SELECT * FROM users WHERE uname = '" . $user_sanitized . "' and pass = '" . $pass_sanitized . "'";
    $result = mysql_query($sql, $db);
    if (!$result)
    {
        die('SQL error: ' . mysql_error($db));
    }
}
…
?>
[/php]
http://www.acunetix.com/blog/wp-content/uploads/2009/08/sql_injection_addslashes_utf8_decode.PNG
This code uses addslashes (which is not a proper way to protect against SQL injection, but people still use it) together with utf8_decode. If you try to insert a single quote, addslashes will protect against SQL injection:
index.php?username=test%27&password=a
user: test\'
pass: a
SQL query: SELECT * FROM users WHERE uname = 'test\'' and pass = 'a'
I’ve updated the code to show the inputs and the SQL query. However, this code can be exploited using a query string like:
index.php?username=test%FC%27%27+or+1=1+--+&password=a
This will generate the following output:
user: test?' or 1=1 --
pass: a
SQL query: SELECT * FROM users WHERE uname = 'test?' or 1=1 -- ' and pass = 'a'
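Again, utf8_decode replaced the %FC byte and the characters after it with a question mark. The swallowed characters include the backslash that addslashes placed before the second quote, so that quote reaches the query unescaped and the code is vulnerable to SQL injection. Note that the PHP directive magic_quotes_gpc is on by default, and it essentially runs addslashes() on all GET, POST, and COOKIE data, so the same problem can arise even without an explicit addslashes() call.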
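To see exactly which characters are lost, the escaped string can be inspected at the byte level. This is a minimal sketch that only calls addslashes() and never utf8_decode, so it behaves the same on any PHP version; the variable names (`$raw`, `$next`, `$rest`) are illustrative, and the assumption that three bytes are swallowed comes from the output shown above.

```php
<?php
// URL-decoded form of username=test%FC%27%27+or+1=1+--+
$raw     = "test\xFC'' or 1=1 -- ";
$escaped = addslashes($raw); // a backslash is placed before each single quote

// Locate the stray 0xFC byte, the three bytes right after it,
// and whatever remains once those three bytes are gone.
$pos  = strpos($escaped, "\xFC");
$next = substr($escaped, $pos + 1, 3);
$rest = substr($escaped, $pos + 4);

echo bin2hex($next), "\n"; // prints 5c275c, i.e. backslash, quote, backslash
echo $rest, "\n";          // prints ' or 1=1 -- (starting with an unescaped quote)
```

The three bytes following 0xFC are `\`, `'` and `\`. Once they are dropped, the remaining text begins with a single quote that no longer has a backslash in front of it, which matches the query shown above.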
While looking into this problem, I’ve found a very useful comment on the PHP page for the utf8_decode function:
Warning!
This function contains a possible security risk when you try to convert escaped strings (see addslashes() and related functions).
It reacts nasty on broken multibyte sequences. In UTF-8, follow-up bytes ALWAYS have the binary pattern 10xxxxxx, but this fact is not handled by utf8_decode in the way you would expect: If you pass a start byte (110xxxxx, 1110xxxx, 11110xxx – or even invalid sequences like 11111100), followed by one or more non-multibyte chars (0xxxxxxx), the start sequence “char” will be replaced by ‘?’ (0x3F) and up to three following chars will disappear even if they are single-byte-chars (0xxxxxxx). So if you escape a string with a typical escape char like backslash, you would expect that your escaping would always survive a call to utf8decode because the escape char is in the assumed safe ascii range 0-127, but that is NOT the case!
Try things like utf8_decode("test: ü\\\"123456") to check it out.
To avoid problems take care that string-escaping always is the last step of data manipulation when you depend on leak-proof escaping.
This comment explains very well what’s going on. We’ve also updated Acunetix WVS to test for this kind of vulnerability in the latest build (build 20090813).
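The bit patterns mentioned in the quoted comment can be checked directly. The sketch below is one possible way to do so and is not taken from the paper or the PHP manual: `announced_length` and `is_well_formed_utf8` are hypothetical helper names. The first classifies a byte by the start-byte patterns listed in the comment; the second relies on preg_match() failing when the `/u` modifier is used on a malformed UTF-8 subject, so broken input can be rejected before any decoding or escaping takes place.

```php
<?php
// Number of bytes a UTF-8 start byte announces, based on its leading bits.
// Returns 0 for follow-up bytes (10xxxxxx) and for bytes that are not valid
// start bytes, such as 11111100 (0xFC).
function announced_length(int $byte): int
{
    if ($byte < 0x80) {
        return 1; // 0xxxxxxx: single-byte (ASCII)
    }
    if (($byte & 0xE0) === 0xC0) {
        return 2; // 110xxxxx
    }
    if (($byte & 0xF0) === 0xE0) {
        return 3; // 1110xxxx
    }
    if (($byte & 0xF8) === 0xF0) {
        return 4; // 11110xxx
    }
    return 0;
}

// True only if $s is well-formed UTF-8; an empty pattern with /u matches
// any valid subject and fails on a malformed one.
function is_well_formed_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

printf("%08b\n", 0xF6);                        // prints 11110110
var_dump(announced_length(0xF6));              // int(4): start byte expecting 3 follow-up bytes
var_dump(announced_length(0xFC));              // int(0): not a valid start byte
var_dump(is_well_formed_utf8("\xC3\xB6"));     // bool(true): a correctly encoded ö
var_dump(is_well_formed_utf8("\xF6<img"));     // bool(false): start byte followed by ASCII
```

Both payloads used in this article fail the well-formedness check, because %F6 and %FC are followed by plain ASCII characters instead of 10xxxxxx follow-up bytes.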
Source: http://www.acunetix.com/