Unicode Encoding Flaw Widespread 184
LordNikon writes "According to this CERT advisory: 'Full-width and half-width encoding is a technique for encoding Unicode characters. Various HTTP content scanning systems fail to properly scan full-width/half-width Unicode encoded HTTP traffic. By sending specially-crafted HTTP traffic to a vulnerable content scanning system, an attacker may be able to bypass that content scanning system.' A proof of concept affecting IIS is already being posted to security mailing lists. Cisco IPS and other IDS products are also affected." The CERT advisory lists 93 systems, with 6 reported as vulnerable (including 3com, Cisco, and Snort), 5 known not vulnerable (including Apple and HP), and the rest unknown.
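To make the failure mode concrete, here is a minimal Python sketch. The `naive_scanner` function is hypothetical (it is not the posted PoC): it shows how a byte-for-byte signature match misses a fullwidth-encoded payload that a Unicode-aware endpoint later folds back to ASCII.

```python
import unicodedata

# "<script>" written with FULLWIDTH LESS-THAN/GREATER-THAN (U+FF1C/U+FF1E)
payload = "\uFF1Cscript\uFF1E"

def naive_scanner(text):
    """Hypothetical IDS rule: match the literal ASCII signature only."""
    return "<script" in text

assert not naive_scanner(payload)  # the scanner sees nothing suspicious

# An endpoint applying NFKC compatibility normalization recovers the
# exact ASCII form the scanner was looking for.
assert unicodedata.normalize("NFKC", payload) == "<script>"
```

The advisory's point is exactly this asymmetry: the scanner and the endpoint interpret the same traffic differently.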
Limited impact. (Score:3, Informative)
Content scanning is mostly useful for filtering known exploits, and is hardly meant to be your primary defense. Being able to bypass this scanning won't buy you much. If the content scanner is aware of an exploit it scans for, chances are the systems being targeted are aware of it too, and have been patched to protect against it.
Re: (Score:2)
I don't see anything new here, just another trick to look for. Most well tested systems should not be affected.
Unless I'm overlooking something? I'm not, am I?
Re:Limited impact. (Score:4, Insightful)
Your program, sitting below the layer performing the unicode translations, doesn't need to do anything differently from before, as it doesn't matter which of the two methods were used. If you _relied on_ the layers above you to strip out, reject, escape, or whatever, quote characters, then you're writing teabag code, and should get a job selling flowers instead, as software engineering is beyond you.
Always validate user input to your own specification. Never rely on something external to do it.
This exploit hasn't changed the rules one little bit, it's just highlighted the fact that some idiots don't follow them.
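The parent's rule ("validate user input to your own specification") can be sketched in Python. The `validate_username` function, its length bounds, and its lowercase-plus-digits whitelist are all hypothetical choices for illustration, not anything from the advisory:

```python
import unicodedata

def validate_username(raw):
    # Normalize first, so compatibility variants (fullwidth letters etc.)
    # can't sneak past the whitelist below.
    s = unicodedata.normalize("NFKC", raw)
    if not 1 <= len(s) <= 32:
        raise ValueError("bad length")
    if not all("a" <= c <= "z" or "0" <= c <= "9" for c in s):
        raise ValueError("character outside specification")
    return s

assert validate_username("\uFF41lice") == "alice"   # fullwidth 'a' folds down

rejected = False
try:
    validate_username("alice<script>")
except ValueError:
    rejected = True
assert rejected   # rejected by *this* layer, whatever upstream filters did
```

The point is that the rejection happens at your layer, against your spec, regardless of what any scanning box upstream did or didn't catch.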
Re: (Score:3, Insightful)
Re: (Score:2, Interesting)
Re: (Score:3, Insightful)
So, perhaps if data was all stored and represented in UTF-8, for example, this
Re: (Score:2)
I recommend checking out the Wikipedia article Comparison of Unicode encodings [wikipedia.org].
So, perhaps if data was all stored and represented in UTF-8, for example, this wouldn't be a problem?
You can't impose this on the whole world; a lot of the protocols and file formats that we use eve
I don't think you know what you're talking about.. (Score:2)
They've been trying to sell this kind of kit to us for years.
Re: (Score:2)
Unlimited impact. (Score:2)
Re: (Score:2)
Re:Limited impact. (Score:5, Informative)
The NT kernel provides a lot of facilities that are very useful for writing secure code. I often wonder if the application developers at Microsoft ever noticed that they weren't writing code on top of DOS anymore...
Re: (Score:2)
I've never seen this exposed to the UI though, so I've no idea how you'd go about doing it
IIRC, in Windows XP, View -> Folder options -> untick "Use simple file sharing (recommended)" will let you see and edit an object's permissions though its properties dialogue.
In Vista this is now enabled by default, which I suppose is inevitable since MS are making permissions so much more visible with UAC and such; but I do wonder how many people will go randomly clicking around to see what it does, click through the UAC dialogue, and end up doing something like removing permission to access th
Re: (Score:2, Informative)
Basically the "Security tab" you see for files could be applied to individual ports.
Re: (Score:2)
Re:Limited impact. (Score:5, Informative)
Using the Native (NT Executive) API you can read or set the ACL on any object in the namespace, assuming you have the appropriate user rights and you own the object (or the ACL allows you to modify the permissions). NT kernel objects can also be case-sensitive (though that can confuse some Win32 programs). Often, you can delete, move, etc files that are locked by the Win32 subsystem, which can be useful in certain situations (though in Vista they made the IO system capable of cancelling outstanding IOs on its own so the zombie process bug that ends up locking files doesn't happen anymore. It's unfortunate Vista is so DRM-laden, or I'd try upgrading.)
The APIs are NtQuerySecurityObject and NtSetSecurityObject and I believe the devices are in \Device\Tcp, \Device\Ip, \Device\RawIp, \Device\Udp, etc. Check out http://undocumented.ntinternals.net/ [ntinternals.net] for more details on what is in the native API (ntdll). This API provides everything necessary to implement a full POSIX layer, which is exactly what Services for Unix does, installing itself as a new runtime subsystem right next to the Win32 subsystem. (With Server 2003 R2 SP2 they shipped it as an available component as part of the install; I've even got setuid support and GCC installed as part of the package.)
MOD PARENT UP (Score:2)
Re: (Score:2)
Actually, the IO system has always been able to cancel IO operations, including by terminating the thread owning the operation. However, IO can only be canceled when the drivers owning the operation allow it to be, and Vista got rid of many of the places IO could block but couldn't be canceled in the standard drivers. MUP (which does UNC network host lookups) in particular.
I had the same idea about reaching the ACLs of objects w
Re: (Score:2)
I've spent some time implementing a security descriptor editor [dyndns.org] designed to expose ALL object
Re: (Score:3, Insightful)
At any rate, running in a chroot jail is arguably better in some way
Re: (Score:2)
Incident response (Score:4, Interesting)
"Not vunerable" (Score:3, Informative)
Re: (Score:2)
Trying to parse encapsulated data is generally a bad idea; so is trying to detect the same attack twice. Unless, of course, you're a snakeoil^Wsecurity software salesman.
bypassing great firewall? (Score:2, Interesting)
Nothing to see, move along ... (Score:5, Insightful)
It is self-inflicted misbehaviour, as a matter of common sense.
It is like those silly Cisco content inspectors on port 25, that try to avoid attacks on flimsy MTAs.
It is like someone dying from the measles jab itself: the jab did protect that person from contracting measles, after all.
It is like those stupid anti-virus programs that are more vulnerable than the daemons they profess to protect.
When the attacker uses a codepage different from the one that you think she ought to use, she can circumvent your content filter. Which ought not be an attack vector, in any case.
As I said: nothing to see, move along
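A tiny sketch of the codepage-mismatch point above: a byte-level filter written for one encoding is blind to the very same text in another (the `b"drop table"` signature here is just an illustrative stand-in):

```python
sig = b"drop table"

utf8_traffic = "drop table".encode("utf-8")
utf16_traffic = "drop table".encode("utf-16-le")

assert sig in utf8_traffic       # caught: UTF-8 of ASCII text is ASCII bytes
assert sig not in utf16_traffic  # missed: a NUL byte follows every letter
```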
flawed design .. (Score:2)
Stored procedure cross-compatibility? (Score:2)
Re: (Score:2)
I don't know what you mean by standard manner. According to this [postgresql.org], PostgreSQL uses something called procedural languages. But then again, since when was SQL ever implemented to a common standard? Remember when Microsoft 'extended' SQL to allow spaces in table names? You only have to wrap the name in square brackets [] (MySQL uses back-ticks `` for the same trick).
But my point
Another likely example of OSS? (Score:2)
It's annoying to me when people
Re: (Score:2)
That X works in system Y doesn't imply it works in system Z. Heck, the reason it works in Y could be a bug in Y.
TCP/IP code from BSD .. (Score:2)
I assume you are referring to the ping of death [archive.org]. The root cause was a bug in IP fragment reassembly, and it occurred on other platforms not using the BSD code.
was Another likely example of OSS?
Re: (Score:3, Informative)
Apparently, Vista's networking stack has been rewritten from scratch -- which does make you wonder how much of the reason for that was technical, and how much was MS wanting to be seen to get rid of all the BSD/*nix code in Windows in preparation for their patent offensive...
Why should using BSD code come in the way of their patent offensive?
Using BSD code isn't infringing on BSD's or someone else's patent.
IIS's fault (Score:2)
I guess
Re: (Score:2)
"Full width" vs. "Half width" (or, as I prefer, "half-wit") characters exist for typographical convenience in rendering Japanese characters. (Take a look at the Unicode spec, section 10.3 for example http://www.unicode.org/book/ch10.pdf/ [unicode.org]). This does not, however, explain why certain symbols that are already defined in other parts of the Unicode standard, such as the less-than symbol (or left angle bracket) are duplicated there. I suspect that it has something to do with possible confusions that might arise
Re: (Score:3, Insightful)
Shouldn't that be x3C?
I thought about this a little more, and I think the difference will be in what it is used for. In HTML, the "<" glyph has a special meaning, so it makes sense that a different version (in
Re: (Score:2)
Er...yes, of course. Apparently x8B is one of those European-style single quotes (at least that's what I think the purpose of that character is) that looks like a small left angle bracket. (There's a double version as well.)
That's what I get for posting from work, where I have to keep looking over my shoulder watching for my boss, who doesn't understand that posting to /. is research.
Re: (Score:3, Informative)
This is all pointless now with proportionally-spaced fonts (and multiple fonts, you could easily select the "wide" font to print those characters instead). However Unicode had as a design requirement that translating from any common encod
Don't Steal my WoW account! (Score:2)
Why steal someone's real identity when you can steal their uber virtual Undead Priest identity and sell it for 16 bucks.
US-CERT != CERT (Score:2)
Is there a better alternative to CERT now? Because it just isn't cutting it. I am familiar with Bugtraq and Security Focus. By the time CERT mentions somet
half-wit encoding? (Score:2)
That comes as a complete surprise to me, and I thought I knew at least a little about Unicode and other character encoding schemes. The usual methods of encoding Unicode character points are UTF-8 (variable-length scheme where characters may be represented by anything from one to six bytes), UTF-16 (fixed-width double byte encoding), UTF-32 (fixed-length 4 byte encoding), and well there's UTF-7 and other oddballs. But the cl
Re: (Score:2, Insightful)
UTF-16 is a variable-width encoding. Code points from plane 0 are encoded in 16 bits and code points from planes 1 through 16 are encoded as two 16 bit surrogates. Many developers, like you, aren't aware of this, so it's very common for software to choke on UTF-16 with surrogate pairs.
scenario:
1) You escape a Unicode string that contains fullwidth characters. The fullwidth characters
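The surrogate-pair point above is easy to demonstrate in Python, which exposes the UTF-16 code units directly (any plane-1 code point will do; U+1F600 is used here as an example):

```python
import struct

ch = "\U0001F600"  # a plane-1 code point

# One code point, but UTF-16 must split it into a high/low surrogate pair.
units = struct.unpack("<2H", ch.encode("utf-16-le"))
assert units == (0xD83D, 0xDE00)         # high surrogate, low surrogate
assert len(ch) == 1                      # one code point...
assert len(ch.encode("utf-16-le")) == 4  # ...but four bytes of UTF-16
```

Software that assumes "one 16-bit unit = one character" miscounts, truncates, or rejects exactly this kind of input.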
Re:Send your claim in now (Score:5, Funny)
Oh, you said public.. hehe, forget I said anything.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2, Insightful)
Re:Smelly foreigners (Score:5, Funny)
Re: (Score:2)
You should amend your quote to "you can represent English in 7-bits... just so long as you're willing to use more than 7-bits to do it."
Re: (Score:2)
you can get down to 6 bits per character if you are prepared to do away with either most punctuation or mixed case.
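A quick back-of-envelope check on the 6-bit claim (the symbol tallies here are my own illustrative choices):

```python
import math

# Full printable ASCII (95 symbols) needs 7 bits per character.
assert math.ceil(math.log2(95)) == 7

# Drop most punctuation: both cases, digits, and a space are 63 symbols,
# which fits in 6 bits (2**6 = 64).
assert math.ceil(math.log2(26 + 26 + 10 + 1)) == 6

# Or drop mixed case and keep a modest punctuation set instead:
assert 26 + 10 + 1 + 27 <= 2 ** 6
```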
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Funny)
è is sometimes used to indicate that the e in a past participle is pronounced, eg learnèd (rhymes with Bernard) as opposed to learned (rhymes with burned).
The umlaut in naïve is retained to indicate that it doesn't rhyme with glaive.
Yes, that's why I'm post
Re: (Score:2)
Re: (Score:3, Interesting)
Depends on alphabet size (Score:3)
Re: (Score:2)
Chinese ideographs are so numer
Re: (Score:3, Informative)
Re:Depends on alphabet size (Score:5, Interesting)
It is likely that the introduction of the printing press (and later mass media like TV/radio and computers) has "arrested" this natural evolution. It may also be possible that the development of a national identity and cohesive society tends to put the brakes on some developments as well - if a single unified language is mandated by culture or a central authority then local variations are much less important.
Romaji (and to a certain extent English itself) is definitely influencing the Japanese, the younger generations even more so. Japan may end up using an alphabet almost exclusively for day-to-day needs within the next 100 years. The situation in China is much less clear, but it will probably happen eventually.
If we look into the past, nearly all societies with ideographic/logographic writing systems eventually moved to an alphabetic system. Hell, even Ancient Egyptian Hieroglyphs were partially syllabic much like Katakana. Much as previous posters have pointed out, changing to an alphabetic system from Chinese-characters has allowed Korea to dramatically raise literacy rates. There is only so much time for schooling and memorization, and only so much effort to expend on literacy. If a simpler writing system is more accessible then that is a net gain, even if there are a few things that logographic writing systems do better than alphabetic ones.
Re: (Score:3, Informative)
Re: (Score:3, Interesting)
I'm Chinese, but I have never heard of this. Would you be so kind as to educate me on this...? Where did you hear such things?
I'm serious.
Re: (Score:2)
Besides, because of the vast number of homonyms in Chinese, an ideographic writing system makes discerning intende
Re: (Score:2)
Well, they weren't feasible using English in its normal written form... so I'd guess they wouldn't be feasible using Chinese in its normal written form either.
Offhand I can't think of any human script or language that's fundamentally suitable to telegraphy. Which isn't really all that surprising.
Re:Smelly foreigners (Score:4, Interesting)
The practical result of this is that English is normally encoded as a long sequence of 0-25 values (a-z), whereas Chinese would be encoded as a shorter sequence of 0-~100,000 values (Wikipedia reports Chinese dictionaries with 85,000 characters). Naturally, there would be fewer Chinese characters required for a message as each character corresponds to an entire word.
I guess that since Morse code is rather like binary and English letters can be encoded using 5 bits, Chinese Morse codes would need to be... about 20 bits long? It's late at night, brain not work so good. It seems to me that Morse codes using 20 dots/dashes would be extremely difficult to learn; but on the other hand it shouldn't be any more difficult than learning Chinese characters in the first place.
I wouldn't be surprised if English morse codes were more robust against poor data, siny Englxsh is stvll reahible even if sew2eral cheracter; are wrong.
Disclaimer: I don't know anything about the subject, I'm talking out of my elbow for the sake of discussion.
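For what it's worth, the "about 20 bits" guess above can be checked: a fixed-width binary code for an 85,000-character inventory needs noticeably fewer bits (the few-thousand "everyday use" figure below is my own assumption, not from the thread):

```python
import math

# ceil(log2(85,000)) -- the dictionary-sized inventory from the parent post
assert math.ceil(math.log2(85_000)) == 17   # 17 bits, not 20

# A hypothetical everyday-use inventory of a few thousand characters
assert math.ceil(math.log2(7_000)) == 13
```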
Re: (Score:3, Informative)
Actually, this is pretty much a myth that originated from people with very little knowledge of Chinese language and writing. In all the Chinese languages ("dialects" ;-), most of the vocabulary is two-syllable words, a
Re: (Score:3, Informative)
The answer would seem to be -- sort of ... maybe. See http://www.njstar.com/tools/telecode/jim-reeds-ctc.htm [njstar.com].
Summary: For telegraphy, Chinese characters are assigned numeric codes in radical/stroke-count order. That's the way Japanese and -- I assume -- Chinese dictionaries are arranged.
It may seem inefficient to use 20 bits (sort of) to encode
Re: (Score:3, Informative)
Actually, classical Chinese numbers are only slightly worse than Arabic notation (which apparently developed in India but was spread by Arab traders who knew a good accounting system when they saw it). The Chinese notation was far better than any of the Western number notations that the Arabic notation suppl
Re: (Score:2)
My guess is that morse code would have evolved to be the same way that ASL simplifies language considerably. Each sequence would represent a different idea, or character, but every idea could pretty much be conveyed with a
Re:Not a surprise... (Score:5, Insightful)
Re: (Score:2)
I take issue with the fact that they implemented it so poorly.
1. It is impossible to determine if a character is whitespace; you have to look it up in a table.
2. It is impossible to determine if the character is even printable; you have to look it up in a table.
3. It is impossible to determine if the character has another, more canonical presentation; you have to look it up in table.
That's a lot of
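Python's `unicodedata` module is precisely such a lookup table, which makes the parent's three points easy to illustrate:

```python
import unicodedata

# 1. Whitespace is a table property, not a code-point range:
assert "\u3000".isspace()                        # IDEOGRAPHIC SPACE

# 2. So is printability -- SOFT HYPHEN is an invisible format character:
assert unicodedata.category("\u00AD") == "Cf"

# 3. And so is having a more canonical presentation -- fullwidth 'A'
#    folds to plain 'A' only under compatibility normalization:
assert unicodedata.normalize("NFKC", "\uFF21") == "A"
assert unicodedata.normalize("NFC", "\uFF21") == "\uFF21"
```

None of these properties can be computed from the code point's numeric value alone; every conforming implementation ships the tables.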
Re:Not a surprise... (Score:4, Insightful)
The idea of double-width characters is broken too, yeah, and they are there only to appease the users of some broken Chinese/Japanese software -- but there's nothing wrong with having strange characters in file names. They don't match any file they are not supposed to unless you try to shoehorn them into a limited character set.
So, it's a flaw in the software, not Unicode by itself.
Re: (Score:2)
Re: (Score:2)
Tip of the day: Source code is like shit, everyone else's stinks.
After two decades as a software developer (plus another as an amature) I can tell you that 99% of the time both your design and implementation will be constrained by an existing code base. The whole thing is recursive: if your "bleeding edge" project becomes "leading edge", it will end up as a legacy system that will in it's turn crush the ambition of a n
Re: (Score:2)
"All characters"? I'm afraid that's only 1/17 of Unicode. And according to the law of mainland China, software which doesn't support codes over 16 bits can't be sold there -- well, the commies are nothing but lawful so it's mostly a paper requirement, but it's there.
And UTF-16 has all the fl
Re: (Score:2)
Microsoft is not the only
Re:Not a surprise... (Score:5, Insightful)
Down below this post, there's a troll writing something like 'lol if u cant just use ASCII u shud let ur language die u foreign creeps lol k thx'.
And a whole bunch of people then jump on the troll and criticize him for his US-centrism, and so on, and the troll is at -1.
Yet the post I'm replying to, which is at +4, really comes to the same thing as this troll; it's simply UNIX 8-bit centric rather than USA ASCII centric.
The fact is, computers are used for text, and much if not most text is non-ASCII. How would you rather represent that text:
--With Unicode
--With KOI-8, KOI-8R, KOI-8RU, EBCDIC, EUC-KR, EUC-JP, shift-JIS, Shift-JIS-the-Jphone-version, ISCII, VISCII, ISO-2022-*, and the many many other encodings [hwacha.net] that have evolved in different times and environments.
Seriously, which is going to be easier to secure (and otherwise manage) -- one encoding (which is HEAVILY documented and discussed) or a large number of encodings (the actual number being ever-changing and impossible to really know) many of which are not well documented and have forgotten ramifications and assumptions?
Right -- so now you know why people use Unicode so much.
But the interesting question is, why is one error ("All teh world is teh USA lol! Shouldn't you learn to speak English?") rightly jumped on and pounded flat, whereas another form that's actually more problematic ("All teh world is C on UNIX lolz!! Shouldn't you stop wanting dangerous extra features?") isn't?
Actually, I see in another window that some people have indeed been pounding the parent poster flat, so perhaps my question isn't valid after all.
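One concrete illustration of the many-legacy-encodings problem: the same three bytes are perfectly valid, and entirely different, text under two of the encodings listed above.

```python
data = bytes([0xC1, 0xC2, 0xC3])

assert data.decode("koi8-r") == "абц"    # Cyrillic under KOI8-R
assert data.decode("latin-1") == "ÁÂÃ"   # accented Latin under ISO-8859-1
```

A filter that doesn't know which interpretation the endpoint will use has to guess, and a wrong guess is exactly the bypass this story is about.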
Re: (Score:2)
Re: (Score:2)
The problem with unicode is that you assume people can decode all your data, but they actually can't. With small encodings people either have it installed or not. With unicode you have it, but it doesn't actually work for 99% of the symbols, because there are no complete fonts.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Um... why? Why is filtering command sequences made from 32-bit characters inherently any more difficult than filtering 7-bit characters? It doesn't make any sense to me.
Re: (Score:2)
Of course, if the normalizer is completely correct, then the problem does go away. Because of the complexity of Unicode, this is at the moment very hard to impossibl
Re: (Score:2)
Any time a standard has been changed, you will have some outdated, but perfectly correct software. Hence, two pieces of software may not agree on the meaning of a Unicode string even without a software error.
Re: (Score:2)
Actually, the normalisation functions are defined to be unaffected by future changes.
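One checkable facet of that guarantee is idempotence: normalizing already-normalized text is a no-op. (The stability-across-versions policy itself obviously can't be demonstrated in a code snippet; this only shows the related fixed-point property.)

```python
import unicodedata

s = "e\u0301 \uFF21"   # e + COMBINING ACUTE ACCENT, then FULLWIDTH A
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    once = unicodedata.normalize(form, s)
    # A second pass must change nothing.
    assert unicodedata.normalize(form, once) == once
```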
Re: (Score:2)
Why would they stop working? As two examples, the bash shell and the Perl language don't assign any special meaning to any character with a code above 0x80, so Unicode using UTF8 encoding would be completely
I don't know Japanese law, so why support kanji? (Score:2)
Re: (Score:2)
Re: (Score:3, Informative)
The problem is using character sets that can represent huge amounts of different characters, and among them characters that have similar looking glyphs. That is at the same time a feature that people really really want.
So spam filters will have a problem. They filter out "Viagra" but they don't filter out sequences of letters that look the same. Well, tough. If you follow the rule not to follow any links in emails but type them in yourself, that gets you mostly ar
Re: (Score:2)
And how do you propose to not have dangerous commands when you actually need them in places, just not coming in from that particular channel? Remove, e.g., "drop table" fr
Re: (Score:2)
Re: (Score:2, Funny)
Re: (Score:2)
D-Link doesn't do content filtering; at least not in your home.
Fedora is probably the same as the other Debian/GNU/BSDs, depending on the applications performing the filtering.
I fail to see the usefulness of the list of platforms mixed with trade names here.
Am I the only one ?
Re: (Score:2)
Must take a while to figure out, though I don't know how much software HP have made; but I guess many companies run small in-house projects, maybe written by someone as their ex-job or whatever.
Re: (Score:2)
Don't want to quarrel with you, despite being on
Re: (Score:2, Funny)
5) You are an asshole
Re:Hmmmm.... (Score:5, Interesting)
2) There are nearly two billion Chinese and Indians who can't use your encoding.
3) I get just as much spam from US companies as I do from foreign ones.
Re: (Score:2)
Re: (Score:2)
You only need two things.
1) Remember to call setlocale.
2) Use wide character wrappers around all system functions that don't already have one, e.g. wopen, wrealpath, etc.. Never ever directly use narrow character strings for anything.
Re: (Score:2)
On linux (any unix really) you want to avoid wchars and wide functions like the plague.
The way to go for i18n is using utf-8 and bytes for character strings everywhere. (look into the gtk+ library for examples of this)
The whole wchar experiment has been declared a failure, and is deprecated for any usage really.
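The "UTF-8 and bytes everywhere" approach the parent describes looks roughly like this (Python is used here just for brevity; the GTK+ code the parent points to is C):

```python
text = "na\u00efve caf\u00e9"          # "naïve café"

raw = text.encode("utf-8")             # byte string at the system boundaries
assert len(raw) == 12                  # 12 bytes on the wire...
assert len(text) == 10                 # ...for 10 characters
assert raw.decode("utf-8") == text     # lossless round trip
```

Keep text as UTF-8 bytes when talking to the OS and the network, and decode only at the points where you genuinely need character semantics.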