
Internet Corporation for Assigned Names and Numbers


Internationalized Domain Names - Glossary

In an attempt to ensure that discussions regarding IDNs take place in a consistent manner, ICANN has published an IDN Glossary. The glossary terms can be used freely, and the glossary is expected to be expanded over time. If you have suggestions for additions or changes to the glossary, please submit them to idn-glossary@icann.org. Comments will be posted publicly in the discussion forum at http://forum.icann.org/lists/idn-glossary/.

Historically, domain names on the Internet were restricted to using a limited set of ASCII characters (i.e. a-z, 0-9 and "-"). However, with the increasing use of the Internet in all regions and by diverse linguistic groups of the world, the demand for multilingual domain names has become more intense. Various acronyms are used widely in communications around internationalizing the domain name space. Explanations for many of these acronyms are provided below to help make this topic simpler to understand.



A


ACE (ASCII Compatible Encoding)

ACE is a system for encoding Unicode so each character can be transmitted using only a limited set of ASCII characters (i.e. a-z, 0-9 and "-"). This is used because applications that use the DNS protocol may not reliably handle other values.

ASCII (American Standard Code for Information Interchange)

ASCII is a common numerical code for computers and other devices that work with text. Computers can only understand numbers, so an ASCII code is the numerical representation of a character such as 'a' or '@'. When mentioned in relation to domain names or strings, ASCII refers to the fact that before internationalization only the letters a-z, digits 0-9, and the hyphen "-", were allowed in domain names.



C


Character

For the purposes of discussing IDNs, a "character" can best be seen as the basic graphic unit of a writing system, which is a script plus a set of rules determining how it is used for representing a specific language. However, domain labels do not convey any intrinsic information about the language with which they are intended to be associated, although they do reveal the script on which they are based. Unfortunately, this language dependency cannot be eliminated by restricting the definition to script, because in several cases (see examples below) languages that share the same script differ in the way they regard its individual elements. The term character therefore cannot be defined independently of the context in which it is used.

In phonetically based writing systems, a character is typically a letter or represents a syllable, and in ideographic systems (or alternatively, pictographic or logographic systems) a character may represent a concept or word.

The following examples are intended to illustrate that the definition of a character is at least two-fold: one aspect is the linguistic base unit and the other is the associated code point.

U-label 酒: Jiu, the Chinese word for 'alcoholic beverage'; the Unicode code point is U+9152 (also referred to as CJK UNIFIED IDEOGRAPH-9152); A-label is xn--jj4a

U-label 北京: the Chinese word for 'Beijing'; the Unicode code points are U+5317 U+4EAC; A-label is xn--1lq90i

U-label 東京: the Japanese word for 'Tokyo'; the Unicode code points are U+6771 U+4EAC; A-label is xn--1lqs71d

U-label ایكوم: Farsi acronym for ICOM; the Unicode code points are U+0627 U+06CC U+0643 U+0648 U+0645; A-label is xn--mgb0dgl27d
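The relationship between a U-label and its code points can be checked with a few lines of Python. This is only an illustrative sketch; the helper name code_points is hypothetical and not part of any standard.

```python
def code_points(label):
    """Return the U+XXXX notation for every code point in a label."""
    # ord() gives the numeric code point; format it as at least four hex digits.
    return [f"U+{ord(ch):04X}" for ch in label]


tokyo = code_points("東京")
print(tokyo)  # the two code points listed for the Tokyo example above
```

The same helper works for any script, because it looks only at code points and knows nothing about the language the label is meant to represent.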



D


DNS (Domain Name System)

The DNS makes using the Internet easier by allowing a familiar string of letters (the "domain name") to be used instead of the arcane IP address. So instead of typing 207.151.159.3, you can type www.internic.net.



I


IDNA (Internationalizing Domain Names in Applications)

IDNA is a protocol defined in RFC 3490 by the Internet Engineering Task Force (http://www.ietf.org) that makes it possible for applications to handle domain names with non-ASCII characters. IDNA converts domain name strings with non-ASCII characters to ASCII domain name labels that applications that use the DNS can accurately understand. Not all characters used in the world's languages will be available for use in domain names. Hence IDNA is not able to convert all such characters into ASCII labels.

IDN (Internationalized Domain Name)

IDNs are domain names represented by local language characters. Such domain names could contain characters with diacritical marks as required by many European languages, or characters from non-Latin scripts (for example, Arabic or Chinese).

With IDNs, the domain name label displayed to and viewed by the end user differs from the label transmitted in the DNS. To avoid confusion, the following terminology is used:

The A-label is what is transmitted in the DNS protocol; it is the ASCII-compatible encoding (ACE) form of an IDNA string, for example "xn--11b5bs1di". The U-label is what should be displayed to the user; it is the representation of the Internationalized Domain Name (IDN) in Unicode, for example "परीका" (the "test" version in Hindi, Devanagari script). Lastly, the LDH-label strictly refers to an all-ASCII label that obeys the "hostname" (LDH) conventions and that is not an IDN; for example, "icann" in the domain name "icann.org".

(The above label definitions are extracted from: http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-01.txt)
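As a rough illustration of the difference between the displayed and the transmitted form, Python's built-in "idna" codec (which follows RFC 3490) converts a whole domain name label by label. The name "bücher.example" is a hypothetical example chosen for this sketch and does not come from the glossary.

```python
# Hypothetical name: one non-ASCII label plus one LDH-label.
displayed_name = "bücher.example"

# Convert to the form that is transmitted in the DNS (A-labels where needed).
transmitted_name = displayed_name.encode("idna").decode("ascii")
print(transmitted_name)  # xn--bcher-kva.example

# Convert back to the form that should be shown to the user.
round_trip = transmitted_name.encode("ascii").decode("idna")
print(round_trip == displayed_name)

# An all-ASCII (LDH) name passes through unchanged.
ldh_name = "icann.org".encode("idna").decode("ascii")
```

Only the label containing a non-ASCII character is rewritten; the LDH-label "example" is left untouched.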

IDN SLDs or IDN 2LDs

Usually a reference to domain names with local characters at the second level, while the top level remains in ASCII-only characters. For example: [παράδειγμα.test] ("example.test" in Greek).

IDN TLDs

Usually the short reference for internationalized top-level domains, thus allowing the entire domain name to be represented by local characters. For example: [실례.테스트] ("example.test" in Hangul).



L


Label

A label is an individual part of a domain name. Labels are usually shown separated by dots; for example, the domain name "example.com" is composed of two labels: "example", and "com".

Languages | Scripts | Alphabets

Languages are used by speech communities. Scripts are used to write down information in the various languages and this is done by using the corresponding alphabets or alternative writing systems.

LDH (Letter, Digit, Hyphen)

The hostname convention defined in RFC 952 (later modified by RFC 1123) was used by top-level domain Registries before internationalization. This meant that domain names could only practically contain the letters a-z, digits 0-9 and the hyphen "-". The term "LDH code points" refers to this subset. With the introduction of IDNs this rule is no longer relevant for all domain names although with the use of IDNA, what appears in the DNS remains LDH.
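A simple LDH check can be sketched as follows. The function names are hypothetical. The rules that a label may not begin or end with a hyphen and may not exceed 63 characters are assumptions taken from the general hostname and DNS conventions; the glossary text itself names only the permitted characters.

```python
import re

# Letters, digits and hyphen only. The length limit and the ban on a
# leading or trailing hyphen are assumptions, as noted in the text above.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label):
    """Return True if a single label uses only LDH code points."""
    return LDH_LABEL.fullmatch(label) is not None


def all_labels_ldh(domain_name):
    """Split a domain name at the dots and check every label."""
    return all(is_ldh_label(label) for label in domain_name.split("."))


print(all_labels_ldh("icann.org"))       # all labels are LDH
print(is_ldh_label("xn--bcher-kva"))     # an A-label is still LDH
print(is_ldh_label("bücher"))            # a U-label is not
```

This also shows why an A-label can travel through the DNS unchanged: once encoded, it consists of LDH code points only.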



P


Punycode

Punycode is the LDH-compatible encoding algorithm described in the Internet standard RFC 3492, and it is in use today. It is the method used to encode IDNs into sequences of LDH ASCII characters so that applications using the Domain Name System (DNS) can understand and manage the names. The intention is that domain name registrants and users will never see this encoded form of a domain name. Its sole purpose is to allow the DNS to resolve, for example, a URL containing local characters. For examples, see A-label under "IDN".

The prefix in a Punycode A-label is always "xn--". It is therefore recommended that top-level domain Registries reserve this prefix in order to avoid confusion when, or if, registrations of IDNs are introduced under the respective top-level domain.
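The role of the "xn--" prefix can be illustrated with Python's built-in "punycode" codec. This is a minimal sketch: the helper names are hypothetical, and the string preparation steps that IDNA performs before encoding (such as case folding) are deliberately left out.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label):
    """Encode one label with Punycode and add the ACE prefix."""
    if u_label.isascii():
        return u_label  # already an LDH-label, nothing to encode
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def from_a_label(label):
    """Strip the ACE prefix and decode the remainder with Punycode."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label  # no prefix, so the label is not an A-label


print(to_a_label("bücher"))            # xn--bcher-kva
print(from_a_label("xn--bcher-kva"))   # bücher
```

Because the prefix is the only marker that distinguishes an A-label from an ordinary LDH-label, a label that merely happens to start with "xn--" would be misread as an IDN, which is the confusion that reserving the prefix is meant to prevent.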



U


The Unicode Consortium

A not-for-profit organization founded to develop, extend and promote use of the Unicode standard. For more information, please visit http://www.unicode.org.

Unicode

Unicode is a commonly used single encoding scheme that provides a unique number for each character across a wide variety of languages and scripts. The Unicode standard contains tables that list the "code points" (unique numbers) for each local character identified. These tables continue to expand as more and more characters are digitized.

In Unicode, characters are assigned codes that uniquely define every character in many of the scripts in the world. These "code points" are unique numbers for a character or some character aspect such as an accent mark or ligature. Unicode supports more than a million code points, which are written with a "U" followed by a plus sign and the unique number in hexadecimal notation; for example, the word "Hello" is written U+0048 U+0065 U+006C U+006C U+006F.
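The "U+" notation can be turned back into text with a short Python sketch. The helper name from_code_points is hypothetical.

```python
import sys


def from_code_points(notation):
    """Convert a string such as 'U+0048 U+0065' back into characters."""
    # Drop the leading 'U+' of each token and read the rest as hexadecimal.
    return "".join(chr(int(token[2:], 16)) for token in notation.split())


hello = from_code_points("U+0048 U+0065 U+006C U+006C U+006F")
print(hello)  # Hello

# The highest code point Python can represent, in the same notation.
highest = f"U+{sys.maxunicode:04X}"
print(highest)  # U+10FFFF
```

The upper limit U+10FFFF corresponds to the "more than a million code points" mentioned above.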

URL

An acronym for "Uniform Resource Locator", a string that describes the address of documents and other resources on the Internet. Defined by the IETF in RFC 2396, a URL is comprised of two parts separated by a colon (":"). The first part of the address indicates what protocol to use, e.g., http, ftp, etc., and the second part specifies the IP address or the domain name where the resource is located.
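The two parts of a URL can be separated with Python's standard urllib.parse module. The path "/index.html" in this sketch is a hypothetical addition used only to make the example complete.

```python
from urllib.parse import urlsplit

# Hypothetical URL built on the domain name used in the DNS entry above.
parts = urlsplit("http://www.internic.net/index.html")

protocol = parts.scheme       # the part before the colon
domain_name = parts.hostname  # where the resource is located
labels = domain_name.split(".")

print(protocol)     # http
print(domain_name)  # www.internic.net
print(labels)       # ['www', 'internic', 'net']
```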

UTF-8

UTF-8 (8-bit Unicode Transformation Format) is a system for encoding Unicode so that each character can be transmitted as a sequence of one or more 8-bit numerical values. It is commonly used because 8-bit data transmission is prevalent on the Internet.
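How many 8-bit values a character needs depends on its code point, as the following Python sketch suggests. The helper name utf8_octets is hypothetical.

```python
def utf8_octets(character):
    """Return the UTF-8 encoding of a character as a list of 8-bit values."""
    return list(character.encode("utf-8"))


for ch in ("a", "é", "酒"):
    octets = utf8_octets(ch)
    shown = " ".join(f"{value:02X}" for value in octets)
    print(f"U+{ord(ch):04X} -> {len(octets)} octet(s): {shown}")
```

An ASCII letter fits in a single 8-bit value, an accented Latin letter takes two, and the Chinese character from the examples above takes three.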


This file last modified 07-May-2008

© 2008 Internet Corporation For Assigned Names and Numbers