International URLs Pass First Test

Posted by Zonk on Tue Mar 13, 2007 10:20 AM
from the nice-to-see-some-new-faces-online dept.
Off the Rails writes "The BBC reports on the results of a successful test of non-ASCII domain names on Internet-equivalent hardware (pdf) carried out last October. The next stage is to plug the system into the net, and if it still works, it could go live sometime next year. 'Early work on the technical feasibility of using non-English character sets suggested that the address system would cope with the introduction of international characters; tests were called for to ensure this was the case ... Also needed are policy decisions by Icann on how the internationalised domain names fit in and work with the existing rules governing the running of the address books. Icann is under pressure to get the international domain names working because some nations, in particular China, are working on their own technology to support their own character sets.'"
This discussion has been archived. No new comments can be posted.

  • Great (Score:5, Funny)

    by otacon (445694) on Tuesday March 13 2007, @10:22AM (#18332799) Homepage
    now I have to learn second languages to look at asian porn.
  • by L. VeGas (580015) on Tuesday March 13 2007, @10:25AM (#18332827) Homepage Journal
    Imagine all the new ways to spell bank0famerlca.com.
  • Dibs! (Score:4, Funny)

    by truthsearch (249536) on Tuesday March 13 2007, @10:26AM (#18332865) Homepage Journal
    I got dibs on sêx.com!
    • Re: (Score:3, Informative)

      I got dibs on sêx.com!


      Umm, you do realise this was registered in 2005? Such domains already exist and can be registered today.

      The technical test is about having Internationalised Domain Names at the top-level, or root, of the DNS. So then you can have
      • Re:Dibs! (Score:5, Funny)

        by VWJedi (972839) on Tuesday March 13 2007, @12:27PM (#18335137)

        The technical test is about having Internationalised Domain Names at the top-level, or root, of the DNS. So then you can have .sêx rather than .sex.

        So we could theoretically have sex at any level... but this is slashdot, so it's not likely to happen for anyone around here.

  • Phishing (Score:2, Redundant)

    In my skim through the various links, I didn't see what they are proposing to do for practical real-world problems such as phishing. What are they going to do to ensure that a phisher doesn't register a domain with characters that look almost indistinguish
    • Re: (Score:2)

      They'll do the same as is done right now: very little. If you're a company in this day-and-age, you have to register as many variants of your name as you can to ensure that phishers/domain squatters don't get undue traffic from your name. On the other hand

    • Call them, say, "character sets.

      Then only allow names and queries all from the same character set.

       
    • Re: (Score:3, Informative)

      This has actually been discussed to some extent for years. One method is to only allow domains to be registered or displayed in a single language character set, such that a domain name can use latin characters or greek characters, but not both. This can be

  • If your company/organisation/you have any international contacts then you will NOT be using these international URLs. So you still need the old-style URLs or you'll need to explain how to get those umlauts etc to type in the url. On their national keyboard
    • Re: (Score:3, Informative)

      umlaut is hardly a problem if you set the keyboard to üs-ïnternätional. But asian/hebrew/arabic characters are much more difficult to enter... in my experience.

      But you will still be able to click them. IDN support is available in
      • Re: (Score:2)

        IDN support is available in most popular browser (although disabled for security issues.)

        What browser are you referring to? IDN support is in Firefox, IE, Opera etc. and not disabled, so I am wondering what this most popular browser you are referring to is.
      • Re: (Score:2)

        umlaut is hardly a problem if you set the keyboard to üs-ïnternätional. But asian/hebrew/arabic characters are much more difficult to enter... in my experience.
        Those who will have these "international" URLs will almost all be usi
    • Re: (Score:2)

      So you still need the old-style URLs or you'll need to explain how to get those umlauts etc to type in the url.

      How often do you ever type in an URL in the first place? You get the link from another web site, from Google, in an email or wherever. And AFAIK,
  • Pardon my ignorance, but couldn't they have just thought of an encoding scheme? Similar to how certain characters are encoded in the path of an URL ("&"-style or "%20"-style). Possibly a more complicated scheme would have been necessary, but surely it
    • Re: (Score:2)

      Pardon my ignorance, but couldn't they have just thought of an encoding scheme?

      Already been done. See Punycode (RFC3492). The problem with encoding schemes, though, is that they aren't memorable, and hence are problematic to type into, say, the location
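For illustration, here is a minimal Python sketch of the Punycode (RFC 3492) round trip referred to above, using the standard library's `punycode` codec. The helper names `to_ace` and `from_ace` are hypothetical, and the sketch assumes each label is already lowercase and normalised; full IDNA processing also applies a Nameprep step that is skipped here.

```python
ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded DNS label


def to_ace(label: str) -> str:
    """Encode one DNS label to its ASCII-compatible form (no Nameprep applied)."""
    if label.isascii():
        return label  # plain ASCII labels are left untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(label: str) -> str:
    """Decode one ASCII-compatible label back to Unicode."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


print(to_ace("bücher"))         # xn--bcher-kva
print(from_ace("xn--caf-dma"))  # café
```

The encoded form is what makes the scheme workable over existing DNS, and it also shows why nobody would want to remember or type it by hand.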

  • English "X" vs. Cyrillic "khah" (Score:4, Insightful)

    by J.R. Random (801334) on Tuesday March 13 2007, @10:42AM (#18333243)
    This is just common sense -- there's no reason why Chinese, Greeks, and Russians should have to use a character set meant for the English language. But any given URL should have a language associated with it and any character in that URL not associated with its language should be color coded. So English language URLs would get "omicron" flagged while Greek URLs would get "O" flagged. The "default" language could be English so that existing URLs are unchanged, for other languages their ISO code could precede the URL. Now this particular scheme might have some fatal flaw but something similar ought to be workable.
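The flagging idea above could be sketched roughly as follows. This is only an illustration under a loud assumption: Python's standard library has no Unicode script property, so the hypothetical helper `script_of` guesses the script from the first word of each character's Unicode name. A real browser would use proper script data and confusable tables.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Rough script guess: first word of the Unicode character name (a heuristic)."""
    if not ch.isalpha():
        return "COMMON"  # digits, hyphens and dots belong to no single script
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def flag_foreign(label: str, expected: str) -> list:
    """Return (index, character, script) for every letter outside the expected script."""
    return [
        (i, ch, script_of(ch))
        for i, ch in enumerate(label)
        if script_of(ch) not in (expected, "COMMON")
    ]


# A Latin-looking name with a Cyrillic "o" (U+043E) in the fifth position.
print(flag_foreign("micr\u043esoft", "LATIN"))  # [(4, 'о', 'CYRILLIC')]
# A Greek omicron (U+03BF) inside an otherwise Latin name.
print(flag_foreign("g\u03bfogle", "LATIN"))     # [(1, 'ο', 'GREEK')]
```

A user interface could then colour the flagged positions, or refuse names where the returned list is not empty.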
    • Couldn't these linguistically-heterogeneous domain spaces still be universally linked through romanization? I see one possible solution: An intermediary DNS conversion server; i.e. type "[those were supposed to be Japanese kanji].co.jp" and your DNS reques
      • Re: (Score:3, Interesting)

        For some languages, like Arabic, there is no one standard for romanization. A trivial example is Qu'ran/Koran.
    • Re: (Score:3, Insightful)

      Agreed, although I think a dialog box should also be shown as an annoyance / deterrent. Otherwise just imagine what the Web 2.0 folks will do when they realize they can redirect their site to one with cool multi-colored URLs, thus conditioning people to ign
  • Also needed is automatic translation by, say, a Firefox extension, from the domain name's registered home language (if any) into the user's default language. How do you say "goatse" in Urdu?

    A good complement to the new system to preempt the huge coming pro
  • Security minded questions (Score:3, Interesting)

    by merc (115854) <slashdot@upt.org> on Tuesday March 13 2007, @10:50AM (#18333419) Homepage
    Will having non-ASCII data in FQDN's open us up to buffer-overflow attacks in various network-aware services?
  • by Ron Bennett (14590) on Tuesday March 13 2007, @10:51AM (#18333453) Homepage
    Below is a quick copy and paste from one of my posts on DNForum regarding IDNs ... I own some IDNs and believe they have much potential, but there are still many unanswered questions...

    Excerpt from a post of mine on DNForum regarding IDNs:
    http://www.dnforum.com/showthread.php?p=732080 [dnforum.com]

    I'm running into a lot of issues that many IDN folks aren't discussing - probably because they've not considered them ...

    Various issues / threats / questions:

    ?? The existence of numerous diverse dialects, even totally different languages, etc in the same country ... it's among the reasons that English dominates in some areas; some natives, even if they can understand a particular dialect, will sometimes speak a totally non-native language, such as English, instead to avoid risk of offending the other party. One can't assume one language dominates an entire region - languages can also overlap many areas ... it's one of the reasons some are pushing for language / culture based TLDs, such as .CAT (among the dumbest ideas ever, but that's another discussion for the .CAT thread running here on DNF).

    ?? An IDN that contains western european characters that very closely matches a non IDN ... ie. cafe.com versus café.com ... what happens? Will the IDN be highlighted / blocked by default? ... likely an easy UDRP target? ... introduction of a new IDN specific dispute procedure? -perhaps there already is one?

    ?? Trademark issues ... ie. an IDN that is similar / exact to a trademark in another country ... less obvious, what about an IDN that translates to that of a trademarked word / phrase? -I believe there's a thread discussing such an issue now on one of the other boards here.

    ?? language variants (more applicable to asian languages, etc) related issues ... how good / stable are the various language variant tables?

    ?? what happens when a language variant table changes? -how are conflicts handled?

    ?? what happens if a character variant (an IDN [IDL package] technically can comprise multiple character variants [code points]) is released? ... does the current registrant get first dibs? ... even if yes, it may not be quite that simple if a character variant occurs in numerous permutations.

    ?? What happens if a reserved character variant is changed to a preferred character variant? - while such a change would have little to no effect on affected IDNs (IDL packages), it could result in the appearance of some IDNs changing ... probably not a biggie compared to some other issues, but one to be aware of.

    ?? How reliable, especially for those in languages with numerous character variants, will IDN domain resolution be? ... IDN resolution depends on much client-side APIs.

    ?? How well will IDN resolution APIs be regulated ... I can easily envision scenarios in which a web browser and/or other applications (email, IM, etc) implement resolution differently ... ie. adding and/or ignoring one or more valid language associations for a particular IDN / converting similar-looking western european characters to standard A-Z characters, etc. A related concern is language table management - I'm a little hazy on if the tables will be internally stored by each app or remotely loaded for each session, etc.

    Rambling on, but there are a lot of things that one needs to be aware of with IDNs.
  • Balkanising the internet? (Score:4, Interesting)

    by hcdejong (561314) <h.c.de.jong@xmsn e t . nl> on Tuesday March 13 2007, @10:53AM (#18333515)
    Would this lead to segregation of the internet into zones defined by the language used for the domain name? At the moment, I can access e.g. Japanese websites easily, even if the content of that site is in a language I don't understand [1].
    If non-Roman domain names become popular, will I still be able to access them, or will they disappear behind untypeable URLs? A search engine may be able to mitigate this problem somewhat, but ATM I sometimes get search results for Japanese-language pages only because my search term is present in the URL.

    1: yes, a site can still be useful in this case and no, despite the stereotype it's not just for porn.
    • Re: (Score:2)

      You're looking at this from the perspective as a native English speaker.

      Imagine all the Japanese who don't know English, but have to learn/type English domain names. Very unintuitive for them.

      My concern would be for all the internet filtering and firewall
      • Re: (Score:3, Informative)

        My concern would be for all the internet filtering and firewalling software which explicitly only allows ASCII in HTTP headers.


        IDN encoding is pure ASCII, in a similar way that MIME email attachments are. The protocol layer never sees anything other than l
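To make the "pure ASCII" point concrete, here is a small Python sketch using the standard library's `idna` codec (which implements the older IDNA 2003 rules). The helper names `wire_labels` and `is_ldh` are hypothetical; the sketch only checks that what would go on the wire consists of letters, digits and hyphens in labels of at most 63 characters, which is also the bound that length-sensitive software has to handle.

```python
import string

# Letters, digits and hyphen: the only characters a classic hostname label may use.
LDH = set(string.ascii_letters + string.digits + "-")


def wire_labels(hostname: str) -> list:
    """Return the ASCII labels that would actually be sent for a hostname."""
    return hostname.encode("idna").decode("ascii").split(".")


def is_ldh(hostname: str) -> bool:
    """True if every encoded label is non-empty, at most 63 characters, and LDH-only."""
    return all(
        label and len(label) <= 63 and set(label) <= LDH
        for label in wire_labels(hostname)
    )


print(wire_labels("bücher.example"))  # ['xn--bcher-kva', 'example']
print(is_ldh("bücher.example"))       # True
```

Filtering software that only ever sees the encoded form therefore has nothing non-ASCII to reject, although anything that displays or compares the decoded name still needs care.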
      • Re: (Score:3, Interesting)

        Imagine all the Japanese who don't know English, but have to learn/type English domain names. Very unintuitive for them.

        Bad example.

        The Japanese are probably the *least* likely of any non-English speaking country to use non-roman url's. The fact is the st
  • As far as I know, Japanese URLs have been working and in use for quite some time. I've visited several myself. Mind you, I'm surprised anyone in the anglophone sphere takes notice.

  • And there are quite some solutions to it. One of them (I think this is the one we're talking about) is converting the characters to ASCII and serialize them. Quite simple, let the browser do it.
  • Already done (Score:3, Interesting)

    by kahei (466208) on Tuesday March 13 2007, @12:03PM (#18334749) Homepage

    Once again, committees lag behind actual problems and actual solutions.

    Now if you'll excuse me I'll go back to browsing .jp.

    (I seem to recall that /. has issues of its own, so the ascii encoding of that would be http://xn--cckev5k8eta5k.jp/ [xn--cckev5k8eta5k.jp]. Anyway, the point is that characters beyond ASCII have been used for ages. Mostly by people who don't mind it when users from other countries can't access their site.)
    • Re: (Score:2, Interesting)

      by Anonymous Coward
      I would bet the average German Internet user knows how to do that. It's pretty easy when the key is on your keyboard: http://carbon.cudenver.edu/~tphillip/GermanKeyboardLayout.html [cudenver.edu]
      • Re: (Score:2)

        Your average Mac user does too, since it's just option-u, followed by the letter. It was similarly easy on the Psion Series 3, but it seems harder on some other operating systems.
    • And this is mostly for countries that don't use the same characters as English (Latin alphabet?), like Japan and China.
    • Re: (Score:2)

      That should be "m&#2170;crosoft.com". Slashdot will probably need to be upgraded to support IDNs, it seems. :)
    • Re: (Score:3, Insightful)

      Like you already have with "l", "I" and "1"; or "O" and "0"; or "V" and "U", depending on the particular font you happen to use?

      Phishing attacks mostly work not because people can't see a minute difference between two lookalike letters; they work because
    • The concern I have with IDNs is that they will make it too easy to produce "lookalike" domains, like "mcrosoft.com".

      This really seems like a pretty minor issue to me. Browsers would just need to adopt a policy of flagging URIs with mixed language character sets, highlighting that character in red or something. More dangerous is the new domain land grab as companies gr

    • Re:Maybe not.. (Score:4, Insightful)

      by LighterShadeOfBlack (1011407) on Tuesday March 13 2007, @10:38AM (#18333167) Homepage

      While browsers can't even properly show non-english alphabet, this doesn't seem to be a good idea. My native language contains many special characters and I usually end up deciphering the emails sent by mom to me, because along the way, servers replace these characters with funny things.
      Well is it the browsers or the servers that are the issue? AFAIK any modern browser fully supports Unicode and any other encodings so there shouldn't be an issue there. If the servers are the problem then either it's the protocol that needs updating/replacing (I don't know nearly enough about SMTP, IMAP4, or POP3 protocols to comment) or the servers themselves are non-compliant. If there's a problem it should definitely be fixed, but you really need to know what the problem is first.
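One common way such "funny things" arise, offered here only as a possible cause and not a diagnosis of the case described, is that text sent as UTF-8 gets decoded somewhere along the way as a single-byte charset. The Python sketch below reproduces that effect and its reversal; `misdecode` and `repair` are hypothetical helper names, and Latin-1 is just an assumed wrong charset.

```python
def misdecode(text: str, wrong: str = "latin-1") -> str:
    """Simulate software that reads UTF-8 bytes as if they were a single-byte charset."""
    return text.encode("utf-8").decode(wrong)


def repair(garbled: str, wrong: str = "latin-1") -> str:
    """Undo the damage, provided the wrong charset is known and no bytes were lost."""
    return garbled.encode(wrong).decode("utf-8")


garbled = misdecode("café")
print(garbled)          # cafÃ©
print(repair(garbled))  # café
```

If the garbling is reversible like this, the bytes survived transport and only the charset label or the client's interpretation is wrong; if characters were replaced with question marks, information was lost on the way.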
      • Re: (Score:2)

        Considering a lot of email is text, the inability to handle a character set may make it impossible for some people to email you if you have non-ascii characters in your address. Even people in your own country may have trouble. Not everyone uses the Outloo
        • Re: (Score:3, Informative)

          Just about any e-mail service should enable the use of non-ascii characters. Any halfway decent e-mail client will; if you're using Thunderbird or Mail or Pegasus, just set the character set to UTF-8; I believe Pine allows UTF-8 too. (Personally I can't im
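As a rough illustration of what "set the character set to UTF-8" amounts to, the Python sketch below builds a message with the standard library's `email` package and declares the body charset explicitly. The function name `build_message` and the example.org addresses are placeholders, not anything from the thread.

```python
from email.message import EmailMessage


def build_message(subject: str, body: str) -> EmailMessage:
    """Build a plain-text message whose body is explicitly labelled as UTF-8."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"      # placeholder address
    msg["To"] = "recipient@example.org"     # placeholder address
    msg["Subject"] = subject                # non-ASCII is encoded when serialised
    msg.set_content(body, charset="utf-8")  # sets the Content-Type charset parameter
    return msg


msg = build_message("Grüße", "Žluťoučký kůň\n")
print(msg.get_content_charset())  # utf-8
print(msg.get_content())
```

With the charset declared in the headers, a receiving client does not have to guess how to interpret the body bytes.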