Spoofing URLs With Unicode
Embedded Geek writes: "Scientific American has an interesting article about how a pair of students at the Technion-Israel Institute of Technology registered "microsoft.com" with Verisign, using the Russian Cyrillic letters "c" and "o". Even though it is a completely different domain, the two display identically (the article uses the term "homograph"). The work was done for a paper in the Communications of the ACM (the paper itself is not online). The article characterizes attacks using this spoof as "scary, if not entirely probable," assuming that a hacker would have to first take over a page at another site. I disagree: sending out a mail message with the URL waiting to be clicked ("Bill Gates will send you ten dollars!") is just one alternate technique. While security problems with Unicode have been noted here before, this might be a new twist."
Our Task is Obvious (Score:4, Funny)
So, what would be the Cyrillic for Slashdot.org?
i know you're being funny, but... (Score:5, Interesting)
Re:i know you're being funny, but... (Score:2, Insightful)
Re:i know you're being funny, but... (Score:2)
The spoofable letters are a and o.
old trick (Score:3, Interesting)
networks like RusNet. http://www.irc.net.ru/
Done in DOS a long time ago (Score:4, Interesting)
Re:Done in DOS a long time ago (Score:2)
Re:Done in DOS a long time ago (Score:2)
Re:Done in DOS a long time ago (Score:3)
Re:Done in DOS a long time ago (Score:2)
Yes! I remember making a whole tree of directories like that on my parents' 286 back when I was a kid, to keep my stuff hidden (I very conveniently ignored the fact that my dad had about 15 years of computer experience even then, because I was obviously being so clever). In order to remember where it was myself, I used the digits of pi, so the top directory had 3 Alt-255's, the next one had 1, the next 4, and so on. For a fifth grader I sure was proud of myself--my very own passworded directory, that no one could possibly guess at!
. . .
Sure is a good thing that computer isn't still hanging around . . .
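For the curious, here is a small Python sketch of what Alt-255 actually produced. It assumes the machine used code page 437, the usual US DOS default; on other code pages byte 0xFF is a different character.

```python
import unicodedata

# Assumption: code page 437. Alt+255 on the numeric keypad types byte 0xFF.
alt_255 = bytes([255]).decode("cp437")
print(f"U+{ord(alt_255):04X} {unicodedata.name(alt_255)}")  # U+00A0 NO-BREAK SPACE

# A directory name made of three such characters looks blank in a listing,
# yet it is not made of ordinary spaces.
hidden_dir = alt_255 * 3
print(hidden_dir == "   ")   # False: these are not ASCII spaces
print(hidden_dir.isspace())  # True: Python still classifies U+00A0 as whitespace
```

It is the same idea as the Unicode domain trick: a character that renders like something harmless (here, nothing at all) but is a different code point underneath.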
I gave m1cr0s0ft.com my credit card number!!!! (Score:4, Funny)
Re:I gave m1cr0s0ft.com my credit card number!!!! (Score:2, Funny)
Re:I gave m1cr0s0ft.com my credit card number!!!! (Score:2)
Re:I gave m1cr0s0ft.com my credit card number!!!! (Score:2)
Re:I gave m1cr0s0ft.com my credit card number!!!! (Score:4, Insightful)
Whew, good thing you caught it in time! Don't worry, the credit card companies can take care of it, no worries, just enter your name, credit card number, social security number, and mother's maiden name at each of the following URLs:
(Those all use "ell" instead of "eye" when possible; they look exactly the same with my fonts. Since there are already "homographs" in plain ASCII, JavaScript mouseovers can be used to change the browser status area, and many people don't even fully understand the difference between "microsoft.com" and "microsoft.evil.com", this Unicode trick is nothing to worry (more) about!)
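The "microsoft.com" vs "microsoft.evil.com" confusion comes down to reading the name from the wrong end. A minimal Python sketch, assuming you already know the exact domain you expect (the helper name `belongs_to` is hypothetical), shows a naive "does it say microsoft?" test next to a check anchored at the right-hand end of the name:

```python
def belongs_to(host: str, domain: str) -> bool:
    """Return True only if host is `domain` itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    # Anchor the match at a label boundary at the right-hand end of the name.
    return host == domain or host.endswith("." + domain)


for host in ["microsoft.com", "www.microsoft.com", "microsoft.evil.com", "evilmicrosoft.com"]:
    naive = "microsoft" in host  # the eyeball test: "it says microsoft"
    print(host, naive, belongs_to(host, "microsoft.com"))
```

The naive test accepts all four names; the anchored one accepts only the first two. Note that this does nothing against ASCII lookalikes like "ell" for "eye" or against Unicode homographs: it only stops the subdomain trick.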
Workaround (Score:3, Insightful)
Re:Workaround (Score:2)
I took a call from a customer who got Windows with his PC and wanted a refund because he was going to use Linux.... (I think he's smart)
While we're talking about my recent install of XFree86 4.2.0, he mentions MS is taking over Linux now with MS Linux [mslinux.org] and believes it's real. (I realize he's dumb)
I spent the required time walking him through a whois lookup on the domain to show him this page isn't owned by Microsoft. He thought the quotes on the page were real....
Sigh. Anyway, it's not hard to fool people.
WHY THIS IS IMPORTANT (Score:5, Informative)
When you pay money, say with paypal.com, you always want to check the URL. Of course someone could have a fake link like "click here to pay with paypal" and then redirect you to their bogus site with the intention of stealing your passwords. But it would be fairly obvious from the location bar in the browser that the URL was not paypal.com. If Unicode can be used to spoof the location bar, though, then it will rope in even cautious users.
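To make the location-bar point concrete, this Python sketch builds a lookalike of "paypal.com" by swapping in CYRILLIC SMALL LETTER A (U+0430) for both a's, the kind of substitution the article describes. The strings are illustrative only.

```python
import unicodedata

genuine = "paypal.com"
spoof = "p\u0430yp\u0430l.com"  # both a's are CYRILLIC SMALL LETTER A (U+0430)

print(genuine == spoof)  # False, although most fonts draw them identically
for ch in spoof:
    if ord(ch) > 127:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

To a program the two names are simply different strings; only the human reading the location bar is fooled.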
Re:WHY THIS IS IMPORTANT - It's already been done (Score:4, Informative)
Re:WHY THIS IS IMPORTANT (Score:2)
the location box. Can I get Mozilla to do this?
Re:WHY THIS IS IMPORTANT (Score:2)
Ewww.....
I wonder if Unicode is even supposed to be allowed in domain names. If not, maybe this will prompt Microsoft, Mozilla, and the like to raise an error when the host/domain contains Unicode.
Until then, I guess I'll retype the host/domain of any site I intend to log into before doing so. What a pain.
If Unicode is valid, heaven help us. With registrars charging as little as $8 for a domain, you know they aren't checking these things.
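Short of retyping every host, a client could at least notice when a host name is not plain ASCII. A minimal Python sketch, using a hypothetical helper `host_is_plain_ascii` built on the standard `urllib.parse.urlsplit`; it flags rather than blocks, since non-ASCII names can be perfectly legitimate:

```python
from urllib.parse import urlsplit


def host_is_plain_ascii(url: str) -> bool:
    """Return True if the URL's host name contains only ASCII characters."""
    host = urlsplit(url).hostname or ""
    return host.isascii()  # str.isascii() needs Python 3.7+


print(host_is_plain_ascii("http://microsoft.com/"))            # True
print(host_is_plain_ascii("http://mi\u0441r\u043esoft.com/"))  # False: Cyrillic es and o
```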
You're my hero, AC.
What needs to be done to solve this (Score:3, Insightful)
Solution: Make browsers default to displaying links to sites with non-ASCII addresses differently from regular links.
Since link display may be overridden by style sheets, make the browser override style sheets for these links.
Display a warning when the user follows one of these links.
If this warning is displayed as a popup and the user checks "never show this warning again", display text that explains why this is a bad idea.
The only true way to security is to annoy your users into submission
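One way a browser could "display differently" without inventing anything new is to show the ASCII-compatible form that IDN names are actually looked up under. A Python sketch using the built-in `idna` codec (which implements the original IDNA standard, RFC 3490) on the lookalike name:

```python
spoof = "mi\u0441r\u043esoft.com"  # Cyrillic es (U+0441) and o (U+043E)

ace = spoof.encode("idna")
print(ace)  # starts with b'xn--mirsoft-': the Latin letters survive, the Cyrillic ones are encoded
print(ace.decode("idna") == spoof)  # True: the Unicode form round-trips
```

Shown in this form, the spoof no longer looks anything like "microsoft.com". Whether to show it for every non-ASCII name or only for suspicious ones is the annoyance trade-off the reply below argues about.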
Re:What needs to be done to solve this (Score:2)
You're treating Unicode URL's as something wrong. They aren't, and a program should not usually annoy users using normal features of the standard.
A much better solution is to detect anomalous URL's (those with mixed scripts) and display them differently in the address box. Since a website can't do anything bad to your computer (and if it can, then that's a serious bug in the browser that needs to be fixed), you don't need to worry about links; just telling whether a site isn't authentic. If your users won't check the address bar, then ASCII links will have no problem spoofing them, either.
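The "mixed scripts" test suggested above can be sketched in a few lines of Python. This is a heuristic under stated assumptions: the standard `unicodedata` module has no Script property, so the sketch takes the first word of each letter's Unicode name (LATIN, CYRILLIC, GREEK, ...). The helper names are hypothetical.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts used by the letters in one domain label.

    Heuristic: use the first word of each letter's Unicode name as its script.
    """
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def looks_anomalous(host: str) -> bool:
    """Flag a host name if any single label mixes letters from several scripts."""
    return any(len(scripts_in_label(label)) > 1 for label in host.split("."))


print(looks_anomalous("microsoft.com"))                                     # False
print(looks_anomalous("mi\u0441r\u043esoft.com"))                           # True: Latin + Cyrillic
print(looks_anomalous("\u043f\u0440\u0438\u043c\u0435\u0440.\u0440\u0444"))  # False: all Cyrillic
```

Two limitations worth knowing: a name spelled entirely in lookalike Cyrillic letters passes, and languages that legitimately mix scripts (Japanese kana with kanji, for instance) would be flagged by this crude name-prefix rule.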
The only true way to security is to annoy your users into submission
It tends to make people not want to use your products. It induces random fear and doubt in others. What popping up random boxes for normal events tends not to do, is help users be any safer. It's amazing how fast I can click [yes] on Windows' dialog boxes, before I can even really check what they say.
Re:Right.. excpet.. SSL (Score:4, Insightful)
People now seem to be good at knowing that if they get funny pop-ups about self-signed certs, or certificates not matching the URL, they shouldn't put in their credit card number... now suddenly that doesn't apply, because you won't get those warnings, and the differences aren't as obvious as those for something like paypaI.com or micros0ft.com
I would have thought it wasn't a problem except... (Score:4, Informative)
Non-nerds have proven to be extremely difficult to educate on the concept that "what email claims to be is not always what email is, and where it claims to come from is not always where it really came from". During the recent Klez outbreak, I even received a message from a nerd-friend saying that he thought my machine might be infected, because he received an infected message from "me". Of course it was spoofed, because I happen to be in a lot of people's address books, but since I haven't used Windows on the desktop in over three years, it clearly didn't actually originate with my box.
Folks are just kinda thick about questioning the veracity of claims (hell, astrology still sells books and 900-number phone calls). And this could definitely be used for nasty purposes...and certainly will. Spammers will have a field day with this, because they can't help but seem 'fly by night' because they cannot establish a real brand name due to the disgusting nature of their business. If they stand still, they'll get lynched. But if they can, even for a short time, hijack a real name that people trust, and offer up a too-good-to-be-true scam under that trusted name...well, you see where I'm going with this.
Of course, everyone here knows that unsolicited "business offers" by email are always scams run by filthy people...but my grandmother doesn't know it, nor do my parents or many of my non-nerd friends for that matter.
Just a thought. We'll see how it plays out, I reckon...
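The Klez story works because an email's From header is just text supplied by whoever composes the message. A Python sketch with the standard `email` package; all addresses are placeholders:

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Whoever builds the message (a person, a mail client, or a worm that pulled
# addresses from an address book) simply writes the From header it wants.
msg = EmailMessage()
msg["From"] = "A Friend <friend@example.org>"
msg["To"] = "you@example.net"
msg["Subject"] = "Your machine may be infected"
msg.set_content("See attached.")

display_name, claimed_sender = parseaddr(msg["From"])
print(claimed_sender)  # friend@example.org, whether or not that person sent it
```

Nothing in the message itself proves the claim; the real path has to be reconstructed from the relays' Received headers, which is exactly what non-nerds never look at.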
Re: (Score:2, Flamebait)
Re:I would have thought it wasn't a problem except (Score:2)
Yep, you're right. Let's make all the grandmothers stay in their rocking chairs where they belong. The internet is for young, savvy nerds. Knitting is for old people.
Seriously, I understand your perspective, and it isn't as though I'm suggesting legislation or something stupid like that (I'm anti-government on all issues)...I'm just saying I think people will get scammed using this method. And I think it may be damaging to legitimate companies as well. This is unfortunate on two counts...it is bad for my grandmother, and yours, and it is bad for honest businesses who would never use spam marketing or pull some kind of bait-and-switch, or just plain ol' scam.
That's all...I don't have solutions. I'm just griping about the problem. Isn't that what slashdot is for, hand-wringing and griping?
Re: (Score:2)
Re: (Score:2)
Re:I would have thought it wasn't a problem except (Score:2)
Ah, but then you couldn't get the pictures of the cousin's sister's kids emailed every time they get an award at school. Or the forward of the forward of the quoted forward of the latest monster joke to wander the 'net.
Unicode Environments (Score:4, Insightful)
Actions like the one described in this article could bring down a company, if a person tried hard enough. Of course, Microsoft could just call Verisign and ask them to remove the Cyrillic domain, with no problems. But, for a small company, it could be hell. An entire user group using the same character set to access a certain website would be sent to a different site. In a worst case scenario, anti-company propaganda might be posted on the spoofing site, and it would deter people from visiting the "real" site in the future.
The only solution I can imagine is to simply prevent the translation of characters among character sets, especially in this sort of environment.
A Russian site, such as The Moscow Times [themoscowtimes.com], could have its site spoofed in exactly the same manner, and everyone using the Cyrillic character set (obviously widely used in Russia, for example) would be sent to some other site, possibly indefinitely, knowing how registrars have been acting lately. This would create havoc for the newspaper and significantly hurt revenue.
Re:Unicode Environments (Score:2)
...why?
Re:Unicode Environments (Score:2)
...why?
It started out with a weirdness in Windows 2000 we had to work around, and it involved using the Win32 API TCHAR data type, so that it could compile on both Unicode-enabled systems and ANSI character systems.
To make a long story short, we were forced to enable Unicode in one of our products; then, we thought it a good idea to have all our products capable of internationalised data.
Yeah. That.
Re:Because they're smart (Score:3, Interesting)
It's also disappointing that the Unicode forum dropped their official JISUTF tables. There is no longer any official translation table from Japanese encodings to Unicode. It's the wild west for Asian languages in Unicode (ever wonder why no Asian data systems use Unicode?)
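A well-known illustration of the mapping ambiguity (it does not confirm the claim about the tables being dropped) is the "wave dash" byte pair in Shift_JIS, which Python's own Japanese codecs decode to two different Unicode characters:

```python
import unicodedata

raw = b"\x81\x60"  # one two-byte character in Shift_JIS

as_jis = raw.decode("shift_jis")  # Python's JIS X 0208-based mapping
as_cp932 = raw.decode("cp932")    # Python's Microsoft code page 932 mapping

print(f"shift_jis -> U+{ord(as_jis):04X} {unicodedata.name(as_jis)}")      # U+301C WAVE DASH
print(f"cp932     -> U+{ord(as_cp932):04X} {unicodedata.name(as_cp932)}")  # U+FF5E FULLWIDTH TILDE
```

Text that round-trips through one mapping and back through the other silently changes characters, which is one concrete reason Japanese systems were wary of converting to Unicode.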
Re:Unicode Environments (Score:2)
It was tested; this is considered acceptable, as there are no workarounds.
There will be lookalikes in Unicode, just like there are in ASCII. Prior character sets, including KOI8-R, ISO-8859-5, ISO-8859-7, and JIS X 0213 (pretty much every character set with either Cyrillic or Greek in it), keep the Cyrillic or the Greek A separate from the Latin A. Besides backward compatibility, proper multilingualization calls for them to be kept separate; what does the lowercase A look like if the Greek and Latin A are merged?
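The lowercase question can be answered directly in Python: the three capital A's are separate code points precisely because each lowercases to a different letter.

```python
import unicodedata

capitals = ["A", "\u0410", "\u0391"]  # Latin, Cyrillic and Greek capital A

for capital in capitals:
    small = capital.lower()
    print(f"U+{ord(capital):04X} {unicodedata.name(capital)} -> U+{ord(small):04X} {small}")
```

This prints LATIN CAPITAL LETTER A -> a, CYRILLIC CAPITAL LETTER A -> а, and GREEK CAPITAL LETTER ALPHA -> α. Merging the capitals would make lowercasing ambiguous.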
DNS was, and is, an ugly kludge (Score:4, Interesting)
At the moment these Unicode domain names will not be displayed correctly by web browsers; rather, you will see a bunch of confusing control codes, so this threat isn't really a problem yet.
Of course, the underlying problem is that DNS is an ugly kludge which has long-outgrown itself. The administrative cost of constructing a massive global namespace is vast, and we can all see the opportunities for cyber-squatting it creates, to the detriment of the public interest.
These days I am more likely to go to Google and type in a few words, rather than try to guess the URL. The task of finding the website you are interested in should be left to the specialists (like Google and other search engines), we shouldn't try to maintain an ugly, broken, monopolistic, and expensive "first come first serve" architecture like DNS.
There is no good reason why a web user should ever need to see a URL (except perhaps momentum), any more than they need to see the HTML which makes up a document.
Re:DNS was, and is, an ugly kludge (Score:2)
Obviously with IPv7 we'll just have to ask lain to send us to the right site.
Re:DNS was, and is, an ugly kludge (Score:2)
Re:DNS was, and is, an ugly kludge (Score:2)
Now, I agree that use (1) is dead. However, I don't want to have to remember 64.28.67.150 to read slashdot, nor do I want to be dependent on google to find slashdot. Think of the pontiac example, where I'm looking for a specific page: google rankings change, but domain names change less often. If google decides they don't like the American Communist Party, I may have a hard time finding their website without DNS, whereas google does not control the cpusa.org domain name.
There are also other, less obvious, uses for DNS. For example, I can type in ftp11.freebsd.org and see if that's faster than ftp6.freebsd.org, without having to search for the FreeBSD mirrors page. You can also publish spammers' IP addresses in DNS tables, like what RBL does. That means when I write my MTA, I don't need a full HTTP engine in it along with an XML/SGML/HTML/WHATEVERML parser, but I can just do a simple "gethostbyname()" and see if that returns an error. There are lots of other creative abuses for DNS.
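The RBL-style lookup described above really is just a name lookup. A Python sketch of the usual DNS blocklist convention (reverse the IPv4 octets and append the list's zone); the zone "dnsbl.example.org" and both helper names are hypothetical placeholders:

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNS blocklist."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """A successful lookup means "listed"; a failed lookup means "not listed"."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except (socket.gaierror, socket.herror):
        # Note: a network failure is also reported here as "not listed".
        return False


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))  # 99.2.0.192.dnsbl.example.org
```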
Anyway, I think there's still a real need for DNS; however, DNS administration leads to so many politics...this article mentions a technical problem, but the real problems are social/political. These problems are much harder to solve.
Re:DNS was, and is, an ugly kludge (Score:2)
Re:DNS was, and is, an ugly kludge (Score:3, Insightful)
If there is a demand for something that we already have at this time, for free and with no effort? In other words, you would like it if I paid for something I already get now for free... well, if you can't find a good business model, why not create an artificial one?
What about the cyber-squatting, cost, and creation of private monopolies? DNS is an ugly ugly solution to the problem of finding IP addresses.
Cyber-squatting is simple. Outlaw domain parking, domain transfers, false advertising (which is what registering www.books.com and pointing it at a porn site is), and enforce trademarks. If you want a domain, then use it. Use it for something other than pointing yet another name at your lame web site. Only allow registrations and de-registrations... if someone wants to try and sell the domain and someone else wants to pay money for it fine. But they don't get it, it just goes back into the unregistered pool. And if someone has a valid trademark (microsoft is valid, computers.com isn't) by all means give it back to the trademark holder. Duh. DNS is pretty handy for finding IP's, actually. It just isn't as good at making websurfing as effortless as you'd prefer. Or for keeping people from being assholes and polluting the namespace, I should add.
Market forces will create a demand for comprehensive search-engines which aren't biased, in fact, they already have.
Dumbass. On a fresh install of the browser of your choice (or lack thereof), you can't get everywhere you want to go only by clicking links. If the url field is hidden or disabled, which you advocate, you'll be reduced to clicking a toolbar button or a pre-loaded bookmark. I'm sure one such will be a search engine... but with M$ can you count on its integrity?
What the hell are you ranting about? This has nothing to do with whether your ISP supports CGI.
So sorry, I thought you might have the ability to understand non-monosyllabic words. Let me try again...
I-S-P bad. No like us have nice web names. Must use bad homepage **DAMN*
I'm tired, so I'll try to make this clearer. If users are only ever allowed to use crappy homepage webspace, of course half the URL's on the net will be long and ugly. I also failed to mention that many commercial sites have bad web design... this accounts for the other half being ugly.
And if I got off on a rant, so what? I see someone like you talk out of your ass, I become a little bit upset. Well, guess what? If you want to add another protocol, pick a port number and get to work. I won't stop you. But stop ranting yourself about how the current ones are ugly, when you have no clue why they are even like they are.
DNS isn't broken, and it isn't ugly. As a protocol, it is highly distributable, robust, and solves the IP-human readable name problem as well as anything that has ever been published. It is the foundation of many protocols and services available on the internet, only one of which is the web. We don't need a separate, incompatible system for the web, and you've offered nothing that would suffice for anything but that, and even then only poorly.
Re:DNS was, and is, an ugly kludge (Score:2)
Your opinion itself speaks volumes about what you know and understand. You're the clueless suit touring the factory, wondering why the steam pipes aren't chromed.
No one has ever charged me for typing in "ibm.com". My use of DNS is either free, or depending on how you look at it, the cost is rolled into my ISP subscription. They're not going to give me a refund if somehow stooges like you gut DNS.
DNS doesn't solve your "URL's are ugly" problem, because 1) it isn't DNS's problem to solve, 2) it is largely the result of either bad web design or bad ISP policy and 3) only idiots are complaining about this.
The expense of the domain name squabbling solutions I outlined is either already being paid for (we have a system to settle trademark disputes), or could be covered relatively inexpensively with what amounts to a few shell scripts (do the parked domains all bounce to a single site?). All that would have to happen is for ICANN to pull its collective ass, and make some sensible policy. Simple as that.
Oh, and one last thing. How do you expect google to index websites, if there is no DNS? Are we all supposed to go back to using IP's? Do we embed those in the hyperlinks instead of domain names?
Re:DNS was, and is, an ugly kludge (Score:2)
And at the grand old age of 25 (according to your webpage), too.
Re:DNS was, and is, an ugly kludge (Score:2)
Re:DNS was, and is, an ugly kludge (Score:2)
Re:DNS was, and is, an ugly kludge (Score:2)
Re:DNS was, and is, an ugly kludge (Score:2)
Re:DNS was, and is, an ugly kludge (Score:2)
For example, Google is a great replacement for DNS, not functionally equivalent, but basically does the same thing but much more effectively.
No, not really. You seem to support replacing one 'middleman' with another [DNS -> Google]. Google is good and all as a search engine, but I don't really understand why you think indexing existing pages and ranking content based on some scheme (a la Google) somehow improves upon rational DNS entries or eliminates the risk of underhanded manipulation of whatever system is in use.
Yes, there are flaws with most DNS naming schemes. No, it's not perfect. I'm skeptical of your claims that a Google-ish system somehow fixes things for everyone. It certainly would eliminate the ability to go directly to a known resource without there first being a 'search' ... and a resource that was 'known' to be within the first n results might not be a day, week, or month later. Maybe I'm missing something, but where's the progress there?
Meanwhile, hosts would be identified how? Simply by numeric address? I think earlier in-thread comments need to be emphasized: DNS is for naming hosts, not Web sites.
Much as I favor Google for blind Web searches, it really hasn't attained a level of perfection as a search engine alone and nothing else. I'm at a loss to grasp what insight you possess into the workings of Google and the nature of the Web as a whole to believe -- as you must -- that there is complete certainty/symmetry in what is 'out there' on the Web and what Google presents to users.
Google certainly doesn't function in a way analogous to DNS, and I don't think that you should claim that something "not functionally equivalent" could be seen fairly as a "great replacement."
Think about it.
Are international domain names even necessary? (Score:4, Insightful)
But are international domain names even necessary? Kuhn, who is German, doesn't think so: "Familiarity with the ASCII repertoire and basic proficiency in entering these ASCII characters on any keyboard are the very first steps in computer literacy worldwide."
That's like saying basic numeracy is the first step for computer literacy worldwide, so we should go back to using IP addresses!
Currently email addresses and URLs are the only reason a native Chinese speaker needs to use ASCII. For someone from Germany, ASCII is pretty easy to handle, but for a lot of languages, Unicode URLs & email addresses are very necessary
Re:Are international domain names even necessary? (Score:2, Interesting)
I think we should stick to ASCII
Re:Are international domain names even necessary? (Score:4, Insightful)
IDNC3 (Score:5, Informative)
Who needs a paper... this is irrelevant (Score:4, Informative)
2) There are more TLDs out now, and the same name at a
3) There's always the old trick of swapping the numeral "1" for the lowercase "L" or the uppercase "I", among other similar things that never involved Unicode, but rather human vision and high resolutions.
4) The "@" symbol in the URL trick, like http:\\microsoft.com\moneyfrombil@haxor.com?actio
So if you haven't figured out my point yet, a good percentage of people that use the internet are going to be fooled by far simpler feats of social engineering. Who needs Unicode to do it?
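The "@" trick in point 4 works because everything before the "@" in the authority part of a URL is user info, not the host. The example above is truncated and uses backslashes, so this Python sketch uses a simplified forward-slash version with the standard `urllib.parse.urlsplit`:

```python
from urllib.parse import urlsplit

url = "http://microsoft.com@haxor.com/pay?amount=10"
parts = urlsplit(url)
print(parts.username)  # microsoft.com  (just a "user name")
print(parts.hostname)  # haxor.com      (the server actually contacted)
```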
Isn't fraud illegal? (Score:2)
If you buy something online without using a credit card, you deserve to get scammed.
If you buy something with a credit card, not only will you get your money back (actually never lose it in the first place), but the scammers will likely go to jail.
Besides, why are you clicking on links in your spam anyway?
Reminds me of my friend (Score:2)
Think of the fun you could have with this! (Score:3, Funny)
From there on, it only gets better and better. Think of the countries you would be able to influence, the technology development you could steer, and the leaked memos you could fabricate...
Damn, I wish I had thought of it
Think of the enormous lawsuit! (Score:2)
If you pretend to be someone else, or if someone registered an alternate lookalike domain for microsoft.com and used it in any way whatsoever to benefit from the fact... they'd be in deep sheep.
Different behaviour on different TLDs (Score:3, Interesting)
So for example '.uk'/'.au'/'.us' etc. can ONLY have ASCII 2nd-level domains; '.de' can only have German characters, '.fr' only French, and so on.
Then for completely different character sets, you have new Unicode TLDs (Arabic, Greek, Chinese), which can only have their relevant characters.
I guess you leave
Of course, this adds complexity - but you can do all the testing for validity when the domain is registered (i.e. a web client can request any URL, but dodgy mixed character set domain names cannot be registered).
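Since the check happens at registration time, it is a simple set test. A Python sketch of the idea; the per-TLD character sets below are hypothetical and illustrative, not any registry's actual policy:

```python
import string

# Hypothetical policy table: characters accepted in a second-level label per TLD.
BASE = set(string.ascii_lowercase + string.digits + "-")
ALLOWED = {
    "uk": BASE,
    "us": BASE,
    "de": BASE | set("\u00e4\u00f6\u00fc\u00df"),  # a/o/u umlauts and sharp s
    "fr": BASE | set("\u00e0\u00e2\u00e7\u00e9\u00e8\u00ea\u00ee\u00f4\u00f9\u00fb"),
}


def may_register(name: str) -> bool:
    """Check a second-level name such as "mueller.de" against its TLD's policy."""
    label, _, tld = name.lower().rpartition(".")
    allowed = ALLOWED.get(tld)
    if allowed is None or not label:
        return False  # unknown TLD or empty label: refuse
    return set(label) <= allowed


print(may_register("m\u00fcller.de"))          # True
print(may_register("m\u00fcller.uk"))          # False: u-umlaut is not in the .uk set
print(may_register("mi\u0441r\u043esoft.de"))  # False: Cyrillic letters rejected
```

As the comment says, clients can still request any name; the policy only stops mixed-script names from being registered in the first place.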
Nothing new (Score:2)
You just must not take anything for granted which you see or read on the web.
You gotta love this quote... (Score:2)
Yeah, that's why a couple of Israeli college students were unable to register mirsoft.com (spelled "miсrоsoft")...oh wait a minute, what were they saying again?
Verisign -- the company you can trust! (Score:3, Interesting)
... so it seems safe to say that trust is the foundation of their business. Essentially, we trust Verisign to ensure that we're communicating with whom we think we're communicating, and to protect us from various forms of spoofing. They should therefore, IMHO, actively avoid even the appearance of impropriety.
However, we all remember [slashdot.org] the Microsoft certificates they mistakenly gave out to a third party.
Now we've got them registering another domain to someone that looks just like "microsoft.com." While it's tempting to absolve Verisign of guilt in this, I think they were asking for it. After all, even I thought of this possibility when I first heard about Unicode domain names, and I'm not the sharpest knife in the drawer. You've got to think someone at Verisign raised the possibility, but they chose not to deal with it.
Again, one might be tempted to say that this isn't their problem, if not for the fact that they are in the trust business. As the article says, "Certification agencies (which include VeriSign) ensure that encoded names are not misleading and that the registration corresponds with the correct real-world entity." It should not be technically difficult, for instance, to build a set of lists of visually similar Unicode characters and to refuse to register domains visually identical to existing ones. Maybe they should decide to forgo a relatively small amount of revenue and to refuse to sully their reputation with such inevitably deceptive domain registrations, especially considering that they interfere with Verisign's core business.
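The "lists of visually similar characters" idea can be sketched as a skeleton comparison: map each lookalike to a canonical letter and refuse a new name whose skeleton matches an existing one. The table below is a deliberately tiny, hypothetical sample of Cyrillic lookalikes; a real registry would need a much larger list (Unicode's UTS #39 publishes a confusables data file for this purpose).

```python
# Hypothetical, deliberately tiny table of Cyrillic letters that render like Latin ones.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
}


def skeleton(name: str) -> str:
    """Replace every known lookalike with the Latin letter it imitates."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name.lower())


def collides(candidate: str, registered: set) -> bool:
    """True if the candidate would display like an already-registered name."""
    key = skeleton(candidate)
    return any(skeleton(existing) == key for existing in registered)


print(collides("mi\u0441r\u043esoft.com", {"microsoft.com"}))  # True: refuse registration
print(collides("example.com", {"microsoft.com"}))              # False
```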
Of course, none of this compares to the letters they sent out [slashdot.org] trying to fool people into switching their domains over to Verisign. The other two were negligence and foolishness, but that was an active attempt to deceive from a company that's selling trust.
It all leaves me in a bit of shock. It's not that I'm shocked to see a company doing stupid and deceitful things; it's that trust is Verisign's primary asset. Hearing about these (colossally, in my mind) stupid decisions is like hearing that GM decided to torch all its manufacturing plants and assassinate all its employees. It leaves me with two questions: "what the hell are they thinking?" and "why does anyone continue to do business with Verisign?"
You are mixing things up. (Score:2)
They are not required to, nor do they claim to, verify domain registrants UNLESS those registrants apply for digital certificates.
Yes, Verisign are scum... but you are barking up the wrong tree here. They are not at all required or expected to verify domain registrants.
Hey. I wish they were. Imagine how many domains would have to be revoked? Literally millions.
Shush, the filter here doesn't know about cekc!!!! (Score:2)
Unfortunately, it doesn't protect against 'cekc' (I can't be bothered to type this in Cyrillic here).
Also discussed in "Secure Programming" HOWTO (Score:2)
Paper Online (Score:5, Informative)
That is, if you are interested in the dry, technical details... ;-)
sure they get the domain name, what about hosting? (Score:2)
Why not stick with English? (Score:2)
The internet has shrunk the barrier to exchange information, which has made diverse languages even more significant of a barrier. If we use UNICODE and just let accept that everyone wants to use their own language, then the internet will end up as a group of national islands of information. Each group will surf their set of native language web sites. When you search the web, the information on that Nokia phone might not be readable by you (Babel Fish isn't a solution).
Language has always been a barrier, and I hope the internet will be the tool by which that barrier is torn down; not the tool which escalates the problem.
Re:Why not stick with English? (Score:5, Insightful)
The 5 billion people in the world who don't have English as their native language might. Some would argue that language is a cornerstone of culture, and that when a society loses their language, they lose a significant part of their culture. I've read parts of Shakespeare in German, and was very unhappy about the destruction of the writing. I know several poets of my native tongue (Poe, in particular) would be lost completely in translation. I have no interest in condemning other people to reading the great literature of their cultures in translation.
In any case, ASCII isn't good enough for English writing. French accents are used in English writing, as well as the ae and oe ligatures. Even in modern writing, proper quotes and apostrophes are needed, and footnote daggers often show up in English writing. For specialized work, mathematics, linguistics (even of English), historical English writing and APL all have their own body of characters outside ASCII that need to be supported.
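A quick Python check of the examples just listed (accents, the ae ligature, curly quotes and apostrophes, a footnote dagger) shows that each one already falls outside ASCII; the sample words are illustrative.

```python
samples = ["caf\u00e9", "encyclop\u00e6dia", "\u201cquoted\u201d", "it\u2019s", "see note\u2020"]

for text in samples:
    extra = [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 127]
    print(text, text.isascii(), extra)
```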
Re:Why not stick with English? (Score:3, Informative)
Yes. It's ridiculous to ask people to learn (admittedly a small part of) a new language to use a computer. Just because English is taught in a lot (not all) of schools around the world, it doesn't mean that everyone is comfortable using it. A truly usable computer should be one which allows you to interact with it 100% in your own language.
The internet has shrunk the barrier to exchange information, which has made diverse languages even more significant of a barrier.
The main barrier to computer usage in a large part of the world is that it is still an elitist medium - only useable (and affordable) by the well-educated. If you are actually interested in making it easier for everyone to communicate, then the main technical issue to be solved is how to make the internet useable by anyone from any background.
If we use UNICODE and just let accept that everyone wants to use their own language, then the internet will end up as a group of national islands of information. Each group will surf their set of native language web sites.
This already happens. Of course people surf websites in their own language! Because you (and I) only surf the English-speaking fraction of the web, you don't see it. All that international domain names adds is that a Russian accessing a Russian website can do so via a Russian URL. What could be more sensible or obvious than that?
If no standard is agreed upon, proprietary standards will pop up all over the place, and it'll be a huge mess. In fact this is already happening - although he's the current anti-Christ of Slashdot [slashdot.org], the big selling point of RealNames was for non-English languages, and if you believe Keith Teare's [teare.com] account, he was shafted by Microsoft because they wanted to control (via their browser) the translation of non-ASCII names to ASCII URLs.
Re:I fail to see (Score:2)
But with this unicode spoof, then you could go to a site you think is legitimate, and you'd have no way of knowing it's not.
Re:I fail to see (Score:2, Insightful)
Re:I fail to see (Score:3, Interesting)
For the use-cases like this I think that multilingual URLs are a Bad Idea (TM).
Re:I fail to see (Score:2)
Since the surrounding characters are Latin, I think it safe to assume they are 'c', 'a' and 'o'. (BTW: encodings are things like ISO-8859-*, KOI8-R, and so on, and IDN will only use Unicode. The question should be what script they are in.)
Same goes for URLs, etc.
You've never been prohibited from using non-ASCII stuff in URLs.
Another option -- say a Swedish company registers a URL that perfectly represents the name of the company in Swedish. With all those umlauts and whatever-they-are-called-those-circles-over-A. And you are sitting there with a US_en keyboard -- how are you expected to type that URL into a location field in your browser?
Depending on your system, you can use ALT- or SHIFT-CTRL- combinations and the character numbers. Character Map or the equivalent will also let you enter the characters in.
OTOH, why is this a problem? If they have a large non-Swedish audience, they ought to register an all-ASCII name. If they chose not to, then that's their problem. Odds are any such site will be in Swedish for Swedes.
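The "character numbers" are code points. For these Swedish letters the Unicode code point happens to equal the Latin-1 / Windows-1252 byte value, which is why, on a Western-European Windows setup, holding Alt and typing 0229 on the numeric keypad usually yields å. A small Python sketch to look the numbers up (the exact key combination still depends on your OS and input method):

```python
def code_points(text: str):
    """List (character, decimal code point, U+ notation) for each character."""
    return [(ch, ord(ch), f"U+{ord(ch):04X}") for ch in text]


for ch, dec, hexcode in code_points("åäöÅÄÖ"):
    print(ch, dec, hexcode)  # e.g. "å 229 U+00E5"
```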
Re:I fail to see (Score:2)
Re:I fail to see (Score:2)
You can't have it in the domain name. You can have it in the part of the URL following the domain name.
å is actually another letter entirely, something lots of English speakers can't grasp.
I would be surprised to find many English speakers who couldn't learn that. I would expect that most of them just don't know that fact right now, and that many of them really don't care.
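Unicode does give å its own precomposed code point, but the same visible letter can also arrive as a plain a followed by a combining ring, and Å has a separate look-alike in the Angstrom sign. Normalization folds these together, and IDNA 2003's nameprep step applies NFKC normalization to domain labels. A short Python sketch of the difference:

```python
import unicodedata

precomposed = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "a\u030a"  # 'a' followed by COMBINING RING ABOVE
angstrom = "\u212b"     # ANGSTROM SIGN, looks like 'Å'


def nfc(s: str) -> str:
    """Return the canonical composed (NFC) form of s."""
    return unicodedata.normalize("NFC", s)


print(precomposed == decomposed)       # False: different code points
print(nfc(decomposed) == precomposed)  # True
print(nfc(angstrom) == "\u00c5")       # True: Angstrom sign becomes Å
```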
Re:I fail to see (Score:2)
Re:Terminology whine (Score:3, Insightful)
Re:cyrillic trivia Re:Terminology whine (Score:3, Offtopic)
Another English letter that faded is yogh [looks like a 3]; this is the z in Menzies = Men3ies "Menges".
Also of note is digamma. In the Greek number system it is 6, that is, the 6th letter of the alphabet; as a letter, it appears between epsilon and zeta. Since our alphabet is derived from the Greek, one notes that the letter in this position not only looks like digamma but preserves much of its original sound: F. Phi was an aspirated p.
Cyrillic bears a much closer resemblance to the classical Greek letters, and the theta indeed represents an f here.
Unicode reflects current realities. There is more than one Cyrillic Alphabet, just as there is more than one Latin alphabet.
Re:Terminology whine (Score:4, Insightful)
That is false. Russian people had an alphabet long before Cyrillic. Incidentally, that should really be proto-Russian, or Eastern Slavic, since the people diverged into Russian, Ukrainian, and Belorussian much later.
So it could be said that "Russian Cyrillic" is redundant.
It is not. There are several "dialects" of the Cyrillic alphabet. They are mostly the same but a few letters are different. I already mentioned three of them above. There's also Bulgarian, Serbian, and I'm not sure what else.
I seriously doubt the "c" and "o" characters mentioned in the article are unique to the K018R charset
The charset is called KOI8-R. Or are you using the l33t sp3lling?
Re:Terminology whine (Score:2)
Fair 'nough. The good bishop simply wanted a written language that he understood, so that he could teach his religion. So the creation of the Cyrillic alphabet is a matter of convenience for the religious powers-that-be of the time. Not a new story, unfortunately. And your point about proto-Russian is well-taken.
It is not. There are several "dialects" of the Cyrillic alphabet. They are mostly the same but a few letters are different. I already mentioned three of them above. There's also Bulgarian, Serbian, and I'm not sure what else.
While in the broadest sense you are right (I have a great story, outside the context of this article, about a miscommunication on my part with a Ukrainian who I mistakenly thought was speaking Russian), in the context of my point about those two specific characters, I disagree. Again, a Unicode geek could prove me wrong.
The charset is called KOI8-R. Or are you using the l33t sp3lling?
Lol, heh. You are right on there. I was just dashing off a reply to the article, and wasn't paying enough attention to the niceties. l33t sp311ing was farthest from my mind, b3 a55ur3d.
Re:Terminology whine (Score:2, Offtopic)
Today there are several variants of Cyrillic: Bulgarian, Serbian, Macedonian, Russian, Ukrainian, and it was even used in some of the former Soviet republics and in Mongolia, whose languages are very far from Slavic.
Also, KOI8 is not considered the Cyrillic code set by other Cyrillic-using nations; it is rather considered the Russian Cyrillic code set. Other code sets are Windows-1251 and ISO-8859-5. The latter would arguably be the standard Cyrillic code set.
Re:Terminology whine (Score:2)
I do thank you for the correction on the charsets. I kind of knew that would happen.
Re:Terminology whine (Score:2)
I am sorry, but you are wrong - please see my other post for some links. Here is another: http://education.yahoo.com/search/be?lb=t&p=url%3
IMO, the major contribution of St. Cyril and St. Methodius is not the creation of an alphabet, but their disputes with the Western church and the Pope regarding the right of different peoples to learn and practice Christianity in their own language. Up to that point only Latin, Greek and Hebrew were used in church services...
[OT] not quite correct (Score:3, Interesting)
This was only true in Western Christendom, and even then only to a limited extent. For example, in the West, the first Christian missionaries to the British Isles translated the service books of the early Church into Gaelic and other Celtic languages. In the East, the generally accepted practice was to use the vernacular. This is why some of the oldest extant copies of the Bible are in one of the Ethiopic languages, Coptic, Syriac, etc.
The Roman canon that the liturgy could only be practiced in one of the tongues spoken by the apostles was of relatively late invention and applied only to congregations under the sole apostolic see of the West, Rome. Congregations under the apostolic sees of the East always used the vernacular.
Hence it is somewhat ironic that many eastern Churches refuse to update the liturgy from being in liturgical Greek or old Slavonic into their modern equivalents.
Regards,
-l
Re:Terminology whine (Score:2, Informative)
These two also invented Cyrillic. The difference is that Glagolitic didn't survive very long, while Cyrillic is still in use today. The last country to use Glagolitic in any quantity was Croatia, up to the end of the 19th century.
Re:Terminology whine (Score:2)
Re:Terminology whine (Score:2)
Unicode Collation Algorithm. [unicode.org]
Would be really slow to use some funky custom sorting routine.
What are you running? There are massive databases that use binary compare, and bitty boxes that use binary compare, but even my 386 should be able to do decent sorting in a negligible amount of time.
I don't know of many character sets that put the characters in sort order. ASCII doesn't work for English, because capital letters and lower case letters don't sort together. Latin-1 puts all its characters after ASCII, when some of them should sort with the ASCII characters.
As for why, the fact is it's not an option in a multilingual environment. Lithuanian sorts y between i and j; Swedish, German and Danish use some of the same accented characters, but sort them differently. The whole concept of binary sorting fails for some languages; Maltese and traditional Spanish both sort two letters ("ch" and "ll" for Spanish) as if they were one, and German sorts one letter ("ß") as if it were two ("ss").
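To see the basic failure of binary ordering, compare a plain code-point sort with one keyed on `str.casefold`, which lower-cases and also folds "ß" to "ss". This is a crude Python illustration, not language-aware collation; real per-language rules need the Unicode Collation Algorithm (for example via ICU), which is not in Python's standard library:

```python
words = ["apple", "Banana", "cherry", "Straße", "strasse"]

# Plain sorted() compares code points: every capital A-Z sorts before a-z.
binary_order = sorted(words)

# casefold() is only a crude fix: it ignores case and maps 'ß' to 'ss',
# but knows nothing about Lithuanian, Swedish or Spanish rules.
folded_order = sorted(words, key=str.casefold)

print(binary_order)  # ['Banana', 'Straße', 'apple', 'cherry', 'strasse']
print(folded_order)  # ['apple', 'Banana', 'cherry', 'Straße', 'strasse']
```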
Re:Terminology whine (Score:3, Informative)
Because they were ordered as a transliteration for the Latin alphabet (sorry, can't put it in Cyrillic): ABCDEF instead of ABVGDE.
My guess is that this was done to easily transform Russian text written using the Latin alphabet into Cyrillic by simply flipping a bit...
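The property usually cited for KOI8-R's layout is the mirror image of that guess: clearing the high bit of a Cyrillic letter leaves an ASCII letter close to its Latin transliteration (with upper and lower case swapped), so Russian text stays roughly readable if a 7-bit channel strips bit 7. A minimal Python sketch using the standard library's `koi8_r` codec:

```python
def strip_high_bit(text: str) -> str:
    """Encode as KOI8-R, clear bit 7 of every byte, and read it as ASCII."""
    raw = text.encode("koi8_r")
    return bytes(b & 0x7F for b in raw).decode("ascii")


print(strip_high_bit("привет"))  # PRIWET
print(strip_high_bit("да нет"))  # DA NET
print(strip_high_bit("Москва"))  # mOSKWA (note the inverted case)
```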
Easy (Score:2)
If you're serious about typing in Russian, you don't type the control-meta-alt-whacky sequences.
You spend $15 and buy a plastic keyboard overlay, one of those little flexible jobs with the alternate characters printed on them. Change your keymapping -- they make keymap files to match the popular overlay's plastic sheets, I'm told -- and you're done.
Re:Still... (Score:3, Interesting)
Under Unix it is usually a bit more of a p*** in the a***, because most internationalisations rely on Xmodmap, and that no longer works nowadays. Once again, by default you will get stuck with something you cannot use unless you have a keyboard engraved with the alternate characters. Once again, you will need to spend half an hour in vi swearing at whoever broke Xmodmap in order to get a less obscene keymap.
Re:Still... (Score:2)
The Alt-keypad trick only works for 8-bit characters, AFAIK. You can copy characters out of Character Map (in Win2K/XP, not Win9x... Win9x's Character Map doesn't grok Unicode) and paste them into whatever you're typing, though: да, нет, etc. (I think the first is "da" and the second is "nyet"... saw something that looked like that in a banner ad on a Russian website recently.)
If all else fails and you're editing HTML, you can escape the characters as numeric entities, so that (for instance) да gets entered as &#1076;&#1072;.
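The numeric entities are just the decimal code points (д is U+0434 = 1076, а is U+0430 = 1072). Python's `xmlcharrefreplace` error handler produces them directly; a short sketch:

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace each non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = to_numeric_entities("да")
print(escaped)                         # &#1076;&#1072;
print(html.unescape(escaped) == "да")  # True: browsers render it as да
```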
Re:The site (Score:3, Funny)
Lousy cybersquatters...
Re:The site (Score:2, Funny)
Re:The site (Score:2)
Re:It shouldn't really be a problem. (Score:4, Informative)
A lot of small e-business sites want to use their hosting provider's cert, but don't want the user's browser to display the hosting company's domain rather than their own. (Yes, I know it's stupid; people are picky as fuck when you are making web pages.)
Anyway, that causes the browser to warn that the cert is not valid for the domain it is being used in.
It's kinda possible to get around this using frames, but then the browser might say something about mixed secure and unsecure items on a page. The only real way to do it right is to just let the users see the hosting provider's address, as far as I know, or have the site buy their own cert.
Re:Lunix saves the day! (Score:2)
Try Linux. It's had Unicode for years.
Re:Client-side fix? (Score:2)
Unicode defines character code points, but doesn't specify their appearance.
There's nothing preventing an application from using lame fonts for glyphs, and in fact many do.
On average, Unicode implementations vary from bad to utterly horrible.
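Since the font won't reliably tell the two letters apart, one client-side idea is to map known look-alikes to a Latin "skeleton" and warn when a name matches a trusted domain it isn't. A minimal Python sketch; the `CONFUSABLES` table, the `TRUSTED` list and both function names are hypothetical, and a real client would use a full confusables table such as Unicode's UTS #39 data rather than this hand-picked subset:

```python
from typing import Iterable, Optional

# Hypothetical, hand-picked subset of Cyrillic look-alikes for Latin letters.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
}


def skeleton(name: str) -> str:
    """Lower-case the name and map look-alike characters to Latin ones."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name.lower())


def impersonated_domain(name: str, trusted: Iterable[str]) -> Optional[str]:
    """Return the trusted domain that `name` imitates, or None."""
    target = skeleton(name)
    for domain in trusted:
        if name.lower() != domain.lower() and target == skeleton(domain):
            return domain
    return None


TRUSTED = ["microsoft.com", "slashdot.org"]
print(impersonated_domain("mi\u0441r\u043es\u043eft.com", TRUSTED))  # microsoft.com
print(impersonated_domain("microsoft.com", TRUSTED))                  # None
```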
still encrypted, but... (Score:2)
This is the reason for trusted signatures on certs.
Hit Google for "man in the middle attack" if you want to know more.
Re:Obligatory observation (Score:2)
-
The only way to get rid of a temptation is to yield to it. Resist it, and your soul grows sick with longing for the things it has forbidden to itself. - Oscar Wilde (1854 - 1900)
Re:Obligatory observation (Score:2)