Internet Corporation for Assigned Names and Numbers

ICANN Meetings in Kuala Lumpur

Workshop: Internationalized Domain Name

Wednesday, 21 July 2004

Note: The following is the output of the real-time captioning taken during the Internationalized Domain Name Workshop held on 21 July, 2004 in Kuala Lumpur, Malaysia. Although the captioning output is largely accurate, in some cases it is incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid to understanding the proceedings at the session, but should not be treated as an authoritative record.

>>VINT CERF: GOOD MORNING, LADIES AND GENTLEMEN.
MY NAME IS VINT CERF. I AM CHAIRMAN OF ICANN. I'D LIKE TO ASK YOU TO TAKE YOUR SEATS SO WE CAN GET THIS WORKSHOP UNDERWAY. AS I THINK MANY OF YOU KNOW, IT'S FILLED WITH A CONSIDERABLE AMOUNT OF MATERIAL.

I WANT TO TAKE THIS OPPORTUNITY, FIRST OF ALL, TO THANK JOHN KLENSIN AND TINA DAM AT ICANN FOR ORGANIZING THIS WORKSHOP.

THE WHOLE SUBJECT OF INTERNATIONALIZED DOMAIN NAMES IS OF INTENSE INTEREST TO WHOEVER DEALS ON A REGULAR BASIS WITH NON-LATIN CHARACTER SETS. HOWEVER, IT'S A VERY COMPLEX AREA. THE ORIGINAL DESIGN OF THE INTERNET DID NOT CONTEMPLATE INCORPORATING THESE ALPHABETS IN THE DOMAIN NAME SYSTEM. AND SO I THINK THAT YOU'LL SEE IN THE COURSE OF THIS WORKSHOP HOW MUCH EFFORT HAS ALREADY GONE INTO ACCOMMODATING THIS EXTENSION AND ALSO HOW DIFFICULT AND COMPLICATED IT'S TURNING OUT TO BE. SO I HOPE AT THE END THAT YOU WILL COME AWAY WITH THE SAME APPRECIATION FOR THAT THAT MANY OF US HAVE.

I NEED TO MAKE AN APOLOGY TO YOU THAT SOME OF THE MEMBERS OF THE BOARD OF DIRECTORS ARE GOING TO HAVE TO DEPART FOR A SHORT PERIOD OF TIME AT 9:30. WE HAVE A MATTER THAT HAS TO BE TAKEN CARE OF. AND BECAUSE OF TIME ZONE PROBLEMS, SINCE WE'RE 12 HOURS AWAY FROM THE EAST COAST OF THE U.S. AND 15 HOURS AWAY FROM THE WEST COAST, WE WERE UNABLE TO SCHEDULE THIS PARTICULAR MATTER AT A TIME THAT WOULDN'T INTERFERE WITH OUR PARTICIPATION IN THE WORKSHOP. SO SOME OF US WILL DEPART FOR A LITTLE WHILE AND COME BACK. I DON'T WANT ANY OF YOU TO GET THE IMPRESSION THAT WE ARE LEAVING BECAUSE WE ARE NOT INTERESTED. WE ARE LEAVING SIMPLY BECAUSE OF THIS TIME ZONE ISSUE. BUT WE WILL BE BACK.

AND WITH THAT, I THINK I WILL TURN THIS OVER TO THE CHAIRMAN OF TODAY'S WORKSHOP, PAUL TWOMEY.

>>PAUL TWOMEY: THANK YOU, VINT.

THE CHAIR ARRANGEMENTS FOR TODAY WILL BE MYSELF AND SHARIL TARMIZI FROM THE (INAUDIBLE) AND SHARIL WILL MAKE SOME COMMENTS. UNFORTUNATELY, THE CHAIR IS CAUGHT IN THE SAME DIFFICULTY THAT VINT POINTED TO. AND, SECONDLY, I WOULD JUST LIKE TO SAY HOW IMPORTANT WE THINK TODAY'S SYMPOSIUM IS, AND, YOU KNOW, LOOKING AT THIS, WE WILL BE VERY INTERESTED TO SEE THE DISCUSSION AND THE INVOLVEMENT FROM THE MANY STAKEHOLDERS WHO HAVE INTEREST IN THIS ARENA. AND REALLY LOOKING FORWARD TO QUITE A LONG DISCUSSION.

THE PROGRAM TODAY, AND ESPECIALLY INCLUDING THIS EVENING, HAS BEEN PROGRAMMED TO ALLOW YOU TO ENSURE THAT YOU CAN INFORMALLY MEET AND DISCUSS AND CONTINUE TO HOLD DISCUSSIONS WITH PEOPLE ABOUT THE ISSUES THAT HAVE EMERGED DURING THE DAY. SO WE'VE ACTUALLY GOT THIS ROOM FREE THIS EVENING. WE'VE GOT -- I THINK WE'VE GOT SOME REFRESHMENT AVAILABLE AFTER THE SYMPOSIUM. SO, PLEASE, IF THERE ARE NOT OPPORTUNITIES TO ASK THE QUESTIONS DURING THE SYMPOSIUM, HAVE THEM WRITTEN DOWN, HAVE NOTES, KNOW WHO YOU ARE GOING TO TARGET. BECAUSE WHEN WE FINISH, YOU WILL HAVE A CHANCE TO CONTINUE THE DISCUSSION AT THE END, WHICH I THINK WILL BE VERY VALUABLE.

>>SHARIL TARMIZI: THANK YOU, PAUL.

SELAMAT DATANG.

GOOD MORNING, SELAMAT PAGI AS THEY SAY HERE IN MALAYSIA. I AM HONORED TO BE GIVEN THE OPPORTUNITY TO CO-CHAIR THIS SESSION, TOGETHER WITH PAUL TWOMEY.

IDN IS AN ISSUE OF GREAT INTEREST IN THIS REGION AND THIS PARTICULAR PART OF THE WORLD SIMPLY BECAUSE IT IS IN THIS PART OF THE WORLD THAT NON-ASCII SCRIPTS ARE PREVALENT AND WIDELY USED. IN MANY CASES, THE LACK OF IDN IN THE PAST HAS ACTUALLY BEEN ONE OF THE MAJOR CONTRIBUTORS TO THE ISSUE OF DIGITAL DIVIDE, BECAUSE PEOPLE HAVE TO KNOW THE ENGLISH ALPHABETS TO ACTUALLY BE ABLE TO ACCESS THE INTERNET.

I HAVE PERSONALLY BEEN INVOLVED IN THIS AREA SINCE THE VERY BEGINNING, AND I AM GLAD TO SAY THAT WE HAVE COME TO A STAGE WHERE IT'S SUFFICIENTLY MATURE TO INTRODUCE THIS INTO THE DNS SYSTEM. BUT CHALLENGES CONTINUE TO EXIST SIMPLY BECAUSE THIS IS JUST THE WAY IT WORKS.

SO THANK YOU, PAUL. BACK TO YOU.

>>PAUL TWOMEY: WELL, OUR FIRST PRESENTATION THIS MORNING IS LED OFF BY JOHN KLENSIN, AND IT IS A PRESENTATION WHICH WE'RE VERY GLAD THAT ISOC HAS HELPED PUT TOGETHER. SO I WILL ASK JOHN TO COME TO THE PODIUM, AND ALSO, I THINK, SOME RECOGNITION FOR THE REST OF THE PLANNING COMMITTEE, WHOM WE WOULD LIKE TO THANK FOR ALL THEIR EFFORTS.

THIS SYMPOSIUM IS VERY MUCH AN INITIATIVE OF THE CONSTITUENCIES OF ICANN, AND HAS IN ITS ORIGINS REQUESTS FROM VARIOUS PARTS OF ICANN'S CONSTITUENCIES THAT IN KL THERE COULD BE DISCUSSION ABOUT IDNS.

SEEING THAT VARIOUS CONSTITUENCIES ALL ASK FOR THE SAME THING, THE CONSTITUENCIES DECIDED IT WAS BEST TO HAVE ONE SINGLE DAY DEDICATED TO THE TASK.

AND THE PLANNING COMMITTEE WHICH COMES FROM THE VARIOUS CONSTITUENCIES HAS HELPED COORDINATE THAT. AND I WISH TO EXPRESS MY THANKS FOR IT.

JOHN, ARE YOU READY?

>>JOHN KLENSIN: SORRY. WE WERE JUST VERIFYING THE LOCATION OF ONE OF THE SPEAKERS.

GOOD MORNING, EVERYONE. I'D LIKE TO START OUT BY THANKING EVERYONE WHO HAS PARTICIPATED IN PUTTING THIS TOGETHER, INCLUDING THE PLANNING COMMITTEE FOR THE OVERALL EFFORT, FROM CCNSO, HIRO HOTTA AND NAI-WEN HSU, BRUCE TONKIN FROM GNSO, HANG WEE FROM ALAC, CHRISTOPHER WILKINSON FROM THE GAC, AND ESPECIALLY TINA DAM, WHO HAS PUT UP WITH ALL OF THE REST OF US IN TRYING TO DRAW THIS TOGETHER.

AS PAUL MENTIONED, THE MORNING EFFORT IS A TUTORIAL PUT TOGETHER BY ISOC AND FOLLOWING UP ON A SIMILAR TUTORIAL THAT WE DID IN BARCELONA A FEW WEEKS AGO AT INET. WE LEARNED A LOT AT BARCELONA FROM WHAT WE COVERED AND WHAT WE DIDN'T COVER. AND I HOPE THIS EFFECTIVELY REFLECTS THAT AS WELL AS A LOT OF INPUT FROM THE ICANN COMMUNITY.

YOU'LL NOTICE THESE FIRST TWO SLIDES DON'T APPEAR TO BE ABOUT IDNS. BUT THEY ARE ABOUT IDNS. AND THEY'RE ABOUT A THEME WHICH I'LL BE COMING BACK TO SEVERAL TIMES DURING THE MORNING. AND THAT IS THAT WE HAVE A TREMENDOUS OPPORTUNITY HERE WITH THE IDN SITUATION AND WITH INTERNET INTERNATIONALIZATION IN GENERAL, BUT IT'S ALSO AN OPPORTUNITY TO CAUSE OURSELVES A GREAT MANY PROBLEMS.

WE'LL BE EXPLORING SOME OF THEM. AND THIS SLIDE MAY BEGIN THE EXPLORATION.

THIS SLIDE SHOWS THE NAME OF THIS TALK IN TWO LANGUAGES, ONE SCRIPT, MORE OR LESS, AND IT NAMES THE SPEAKERS. IT IS MY SUPPOSITION THAT NO ONE IN THIS ROOM CAN READ ALL OF THIS. IT IS MY SUPPOSITION THAT VERY FEW PEOPLE IN THIS ROOM WILL BE ABLE TO IDENTIFY ALL OF THE LANGUAGES AND SCRIPTS PRESENT ON THE SLIDE.

OUR OPPORTUNITY HERE, AND THE IMPORTANT ONE, IS TO MAKE THE INTERNET ACCESSIBLE FOR THOSE WHO USE THESE LANGUAGES AND SCRIPTS WHICH ARE NOT ENGLISH AND ASCII. OUR OTHER OPPORTUNITY IS TO CREATE AN INTERNET IN WHICH NONE OF US CAN COMMUNICATE EXCEPT WITHIN OUR OWN COMMUNITIES. AND WE NEED TO NOT TAKE ADVANTAGE OF THAT SECOND OPPORTUNITY.

THAT'S THE ENGLISH TRANSLATION OF THAT SLIDE. ITS IMPORTANT ASPECT IS THAT ALL OF THE NAMES ARE SPELLED WRONG EXCEPT MINE. BUT MOST OF YOU CAN SOMEHOW MANAGE TO READ MOST OF IT.

TO ANSWER THE QUESTIONS ABOUT SCRIPTS, I CANNOT PRONOUNCE THE THINGS ON THE LEFT-HAND SIDE OF THE SCREEN. SOME OF YOU CAN. THE FIRST LINE IS IN MALAY IN THE JAWI SCRIPT, WHICH IS ARABIC BUT NOT USED FOR WRITING ARABIC PRECISELY. THE SECOND, IF IT'S SURVIVED MY THIRD BOUT WITH POWERPOINT, IS ARABIC. I'M NOT CERTAIN WHAT THE THIRD LINE IS. THE FOURTH LINE IS A TRANSLITERATION OF TIN WEE TAN INTO TAMIL. THE FOURTH ONE IS JAMES SENG'S REAL NAME WRITTEN IN SIMPLIFIED CHINESE. AND I CAN'T PRONOUNCE IT. THE THIRD LINE IS IN CYRILLIC SCRIPT BUT NOT IN RUSSIAN, IT'S IN BULGARIAN. AND THEY DON'T USE PRECISELY THE SAME CHARACTERS THE RUSSIANS USE. THE FOURTH LINE IS PATRIK FALTSTROM'S NAME. AND I CAN ALMOST PRONOUNCE THAT. AND THE FINAL LINE IS KENNY HUANG'S NAME, AND NOT ONLY CAN I NOT PRONOUNCE IT BUT I HAVEN'T YET BEEN ABLE TO FIGURE OUT WHAT IT IS.

WHAT WE'RE GOING TO TRY TO TALK ABOUT THIS MORNING IS INTERNATIONALIZATION. NOT IDN SPECIFICALLY, BUT THE QUESTION ABOUT HOW WE MAKE THE NETWORK USABLE TO PEOPLE IN DIFFERENT AREAS SPEAKING DIFFERENT LANGUAGES. IDNS ARE POSSIBLY A PART OF THAT SOLUTION, PROBABLY ARE. THEY MAY OR MAY NOT BE KEY TO THAT SOLUTION. BUT IF WE HAVE IDNS AND NOTHING ELSE, WE HAVE NOTHING.

I PROMISE TO START BY ATTACKING A MYTH. THE MYTH IS THIS NETWORK WAS DESIGNED ENTIRELY BY A BUNCH OF GEEK ENGINEERS WHO SPOKE ONLY ENGLISH, WEREN'T WORRIED ABOUT THE REST OF THE WORLD, AND HAD NO INTEREST AT ALL IN WHETHER OR NOT THE NETWORK WAS ACTUALLY USABLE FOR ANYTHING, ONLY TO FIGURE OUT IF THEY COULD MAKE THE TECHNOLOGY WORK.

IT'S NOT TRUE. THERE WERE DISCUSSIONS ABOUT THE USABILITY AND IMPLICATIONS OF A NETWORK LIKE THIS BEFORE THE FIRST PACKET WENT DOWN THE WIRES. YOU'VE GOT PEOPLE IN THE ROOM WHO WERE ON THE PACKET SIDE, PEOPLE IN THE ROOM WHO WERE ON THAT DISCUSSION SIDE. ISSUES ABOUT INTERNATIONALIZATION AND NETWORK CAME UP IN 1970. NOT IN THE PACKETS, BUT IN DISCUSSIONS ABOUT HOW TO DESIGN THE APPLICATIONS. WE, OF COURSE, FAILED MISERABLY.

THIS TUTORIAL IS NOT ABOUT ANSWERS. THIS AFTERNOON, YOU START WORKING ON ANSWERS. THE TUTORIAL IS ABOUT THE QUESTIONS AND THE ISSUES AND THE DECISIONS.

AS I SAID EARLIER, PART OF WHAT I'M GOING TO KEEP COMING BACK TO IS THAT MANY OF THE EASY ANSWERS FOR INTERNATIONALIZATION ARE REALLY GOOD IF YOU'VE GOT AN ISOLATED, HOMOGENEOUS POPULATION THAT KNOWS BY VIRTUE OF THE FACT THAT THEY'RE TALKING WITH EACH OTHER THAT THEY'RE ALL SPEAKING THE SAME LANGUAGE, USING THE SAME SCRIPTS, AND USING THE SAME CODINGS. THAT'S A VERY EASY PROBLEM.

THE ABILITY TO MAKE THAT WORK DOES NOT IMPLY A SOLUTION TO THE INTERNATIONALIZATION PROBLEM, BECAUSE WHAT THE EASY WAYS OF MAKING THAT WORK DO IS TO LET THOSE PEOPLE COMMUNICATE WITH EACH OTHER. THEY DON'T COMMUNICATE WITH ANYONE ELSE AND NO ONE ELSE COMMUNICATES WITH THEM.

ALL THE GLOBAL SOLUTIONS INVOLVE POLICY TRADEOFFS IN WHICH THOSE TWO SETS OF ISSUES ARE BALANCED AGAINST EACH OTHER IN AN INTELLIGENT WAY. I DON'T HAVE THE ANSWERS. WE MAY START WORKING ON THEM THIS AFTERNOON.

SO WE'RE GOING TO TALK ABOUT EXAMINING THE IDNS, BUT TALK ABOUT THE IDNS IN THE GENERAL CONTEXT OF INTERNATIONALIZATION AND LOCALIZATION. AND THE GENERAL CONTEXT OF NAVIGATION ON THE INTERNET. HOW DO WE ACTUALLY FIND SOMETHING AND ACCESS IT. WE'RE GOING TO TALK A LITTLE BIT ABOUT THE PHYSICS OF THE ENVIRONMENT, THINGS THAT YOU CAN'T DO AND STILL HAVE THE INTERNET WORK PROPERLY IN A GLOBAL ENVIRONMENT.

AND WE'RE GOING TO START IDENTIFYING THE POLICY ISSUES. MY GUESS IS THAT WE'RE GOING TO HAVE POLICY ISSUES IDENTIFIED THIS AFTERNOON THAT WON'T COME UP THIS MORNING. AND POLICY ISSUES IDENTIFIED OVER THE NEXT THREE YEARS THAT WE DON'T EVEN UNDERSTAND TODAY.

INTERNATIONALIZED DOMAIN NAMES THEMSELVES AREN'T THE PROBLEM. AS I SAID EARLIER, THEY MAY BE PART OF THE SOLUTION. THE PROBLEM, AS I SEE IT, IS HOW WE MAKE THE INTERNET FULLY INTERNATIONAL AND FULLY INTERNATIONAL WITH AS LITTLE ENGLISH BIAS AS POSSIBLE.

OUR GOAL SHOULD BE TO GET TO THE POINT WHERE ENGLISH IS JUST ANOTHER LANGUAGE ON THE INTERNET RATHER THAN THE ONE TO WHICH EVERYBODY TURNS IF NOTHING ELSE WORKS, OR RATHER THAN THE ONE WHICH DOMINATES WITH EVERYTHING ELSE BEING SOME STRANGE, FUNNY ADD-ON.

THE TERM INTERNATIONALIZED DOMAIN NAME OR IDN IS USED IN MANY WAYS. STRICTLY SPEAKING, WE'RE TALKING ABOUT DOMAIN NAME LABELS THAT REPRESENT NAMES THAT CONTAIN CHARACTERS WHICH AREN'T IN THE VERY LIMITED ASCII SUBSET USED FOR HOST NAMES. THE CURRENT STANDARD ONLY ENTERS HOST NAME TYPE THINGS INTO THE DNS. IT RELIES ON APPLICATIONS AND ON THE CLIENT MACHINE TO GET THE IDNS INTO A CODING WHICH CAN BE USED AND TRANSLATED IN AND OUT.

THAT CODING HAS A NUMBER OF STRENGTHS. ITS MOST IMPORTANT ONE IS IT DOESN'T WRECK THE INTERNET. IT'S GOT SOME WEAKNESSES, SOME OF WHICH HAVE TO DO WITH SOME CHARACTERS GETTING MAPPED ONTO OTHER ONES IN WAYS THAT DON'T REVERSE. WE'LL BE TALKING ABOUT THAT LATER.
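
Note: The following sketch is not part of the captioned remarks. It assumes that the coding referred to above is the IDNA/Punycode scheme as implemented by Python's built-in `idna` codec (which follows the 2003 version of the standard). It shows how an application might convert a non-ASCII label into the ASCII form that is entered into the DNS, and how some mappings do not reverse. The function names and sample labels are illustrative only.

```python
def to_ascii_label(label: str) -> str:
    """Convert one label to the ASCII form that is actually stored in the DNS."""
    return label.encode("idna").decode("ascii")


def to_unicode_label(ascii_label: str) -> str:
    """Convert the stored ASCII form back to what a user would be shown."""
    return ascii_label.encode("ascii").decode("idna")


if __name__ == "__main__":
    for label in ["b\u00fccher", "B\u00fccher", "stra\u00dfe"]:
        stored = to_ascii_label(label)
        shown = to_unicode_label(stored)
        # The round trip is lossy whenever the preparation step maps one
        # character onto another (case folding, or sharp s becoming "ss").
        print(f"{label!r} -> {stored!r} -> {shown!r}  reversible={shown == label}")
```

In this sketch the conversion happens entirely on the client side; the DNS itself only ever sees the ASCII string.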

BUT WE SOMETIMES HAVE IDN TALKING ABOUT A FULLY QUALIFIED DOMAIN NAME OR OTHER KINDS OF THINGS WHICH ARE NOT THEMSELVES IN THE DOMAIN NAME SYSTEM. IF YOU ASK THE TYPICAL USER OF THE NETWORK WHETHER THEY WANT INTERNATIONALIZATION AND YOU EXPLAIN TO THEM THAT INTERNATIONALIZATION HAS TO DO WITH THE ABILITY TO WRITE IN ANY SCRIPT AND COMMUNICATE IN ANY LANGUAGE, IF THEY FULLY UNDERSTAND THE QUESTION, THE ANSWER WILL CERTAINLY BE "NO." IF I DON'T READ OR SPEAK ARABIC, MY ABILITY TO TRANSMIT ARABIC MAKES VERY LITTLE DIFFERENCE TO ME PERSONALLY. IT'S VERY IMPORTANT TO HAVE THE CAPABILITY ON THE NETWORK, BUT I CAN'T USE IT. AND THE REASON WHY I CAN'T USE IT DOESN'T HAVE TO DO WITH THE TECHNOLOGY; IT HAS TO DO WITH THE FACT THAT I HAVE NEVER SUCCEEDED IN LEARNING ARABIC.

SO WHAT USERS WANT IS SYSTEMS THAT ARE LOCALIZED, THAT ARE ADAPTED TO THEIR PARTICULAR LANGUAGE, TO THEIR WRITING SYSTEM, TO THE CHARACTER CODES THEY USE, IN SOME CASES, THEIR LOCATION AND THEIR INTERESTS. AND FROM THAT STANDPOINT, INTERNATIONALIZATION IS A MEANS TO GETTING GOOD LOCALIZATION, WHILE PRESERVING INTERNET INTEROPERABILITY. SO WE END UP WITH GLOBAL INTEROPERABILITY, BUT VERY GOOD LOCALIZATION.

AS I SAID EARLIER, THERE ARE LOTS AND LOTS OF EASY SOLUTIONS TO THIS PROBLEM. WE CAN PUT UP TRICK DNS SERVERS WHICH DON'T QUITE CONFORM TO THE STANDARDS. WE CAN TRY TO ROUTE DNS QUERIES UNDER SOME CIRCUMSTANCES TO THINGS WHICH ARE NOT DNS SYSTEMS, AND DO OTHER KINDS OF LOOKUPS. WE CAN JUST PUT STRINGS IN THE DNS WHICH WE KNOW REPRESENT OUR CODES SO WE HOPE THAT NOBODY WHO IS LOOKING FOR SOMETHING IN ANOTHER LANGUAGE EVER ENCOUNTERS THOSE THINGS BECAUSE THEY WILL GET VERY CONFUSED. WE CAN DISCOVER THAT WE HAVE INTEROPERABILITY AMONG TWO PEOPLE WHO SPEAK THE SAME LANGUAGE, BOTH OF WHOM KNOW THAT THAT LANGUAGE IS WHAT'S COMING DOWN THE WIRE AND ANNOUNCE THAT WE HAVE SOLVED THE PROBLEM.

WELL, WE'VE SOLVED THE PROBLEM WITH THOSE TWO PEOPLE COMMUNICATING WITH EACH OTHER AND NO ONE ELSE. SO WE NEED TO LOOK FOR LOCAL SOLUTIONS AND GLOBAL INTEROPERABILITY, NOT THESE TRICKS.

WE NEED TO LOOK FOR FLEXIBILITY AND SAFETY.

AN INTERNATIONALIZATION SOLUTION WHICH STOPS THE INTERNET FROM WORKING IN A GLOBAL WAY IS NOT, I HOPE, WHAT ANY OF US ARE LOOKING FOR.

WE'RE STUCK WITH UNICODE, AND UNICODE HAS SOME PROBLEMS, WHICH WE'LL TALK ABOUT LATER IN THE MORNING. BUT WE NEED TO WORK AROUND THOSE PROBLEMS.

AND BE GLAD THAT WE FINALLY GOT A SINGLE CHARACTER SET WHICH REPRESENTS ENOUGH OF THE CHARACTERS IN USE IN THE WORLD THAT WE CAN ACTUALLY DO THIS INTERNATIONALIZATION JOB. THAT'S WHERE THE WORK IN THE EARLY 1970S GOT HUNG UP.

AND THE FIRST TWO OF THESE ISSUES, THE LOCAL SOLUTIONS AND GLOBAL INTEROPERABILITY IN THE BALANCE, AND THE FLEXIBILITY AND SAFETY AND NOT DOING SOMETHING WHICH CAUSES THE INTERNET TO WORK LESS WELL IMPACT ALMOST EVERY DECISION ABOUT INTERNATIONALIZATION. IT'S A TENSION BETWEEN THE NOTION THAT EVERY CULTURE OR COUNTRY OR COMPANY OR PERSON MAKES ITS OWN DECISIONS OR THEIR OWN DECISIONS INDEPENDENTLY AND DOES THINGS THEIR WAY, VERSUS THE MAJOR STRENGTH OF THE NETWORK AND THE ABILITY TO SMOOTHLY INTEROPERATE GLOBALLY, PERMITTING NEXT GENERATION OF INNOVATIONS AND PERMITTING GLOBAL COMMUNICATION. WE CAN PROBABLY GET BOTH TOGETHER, BUT IT IS HARD WORK.

THE END-TO-END PRINCIPLE OF THE INTERNET PERMITS MORE INDEPENDENT DECISION-MAKING THAN OTHER NETWORK TECHNOLOGIES WHICH REQUIRE THINGS TO BE DECIDED AND MANAGED MORE CENTRALLY.

IF SOMEBODY COMES ALONG AND TELLS YOU THE NEXT-GENERATION NETWORK WILL WORK MUCH, MUCH BETTER THAN THE INTERNET BECAUSE IT CENTRALIZES THE CONTROL AND MAKES EVERYTHING WORK SMOOTHLY BECAUSE IT ALL GOES THROUGH SOME APPROVED PROVIDERS, DON'T EXPECT THAT TO DELIVER INTERNATIONALIZATION TO YOU IF YOU ALSO CARE ABOUT YOUR OPERATION.

WE HAVE HAD PROTOCOLS RUNNING ON OTHER NETWORKS AND INDEED ON THE INTERNET IN WHICH THE STANDING JOKE WAS THAT IF YOU RECEIVED A MESSAGE, AN E-MAIL MESSAGE, FOR EXAMPLE, THE WAY IN WHICH YOU WOULD RESPOND TO THAT MESSAGE WAS TO PICK UP THE PHONE AND CALL THE CALLER, BECAUSE THE ODDS THAT YOU COULD GET THE ANSWER BACK USING THE SAME COMMUNICATIONS CHANNEL WERE VERY LOW BECAUSE OF THESE INTEROPERABILITY AND CENTRALITY AND LOCAL OPTIONS AND LOCAL PROFILE PROBLEMS.

IN THIS AREA, A LOT OF THE QUESTIONS HAVE THE ANSWER, BE CAREFUL WHAT YOU WISH FOR. BUT AS I SAID, IT'S HARD, AND THE SIMPLE AND OBVIOUS SOLUTIONS COULD BE A GLOBAL DISASTER. WE REALLY HAVE TO ASK THESE QUESTIONS DEEPLY.

GETTING BOTH GLOBAL INTEROPERABILITY AND GOOD LOCALIZATION REQUIRES THAT WE WORK TOGETHER AND DO SO IN GOOD FAITH, AND WITH DUE RESPECT FOR EACH OTHER AND FOR THE MANY LINGUISTIC AND CULTURAL DIFFERENCES THESE PROBLEMS INVOLVE. THAT'S WHY THOSE TWO SLIDES WERE UP THERE AT THE BEGINNING. IF WE GET INTO A MENTALITY IN WHICH EACH OF US SAYS, "MY LANGUAGE IS IMPORTANT AND I DON'T CARE IF ANYTHING ELSE WORKS," THEN WE WILL END UP WITH AN INTERNET THAT DOESN'T WORK.

THE FLEXIBILITY AND SAFETY ISSUE IS A TRADEOFF BETWEEN THE MAXIMUM FREEDOM TO IMPLEMENT ANY PROTOCOL ONE WANTS IN ANY WAY ONE WANTS, AND STABILITY AND SECURITY.

THIS NETWORK WORKS BECAUSE WE ARE ALL WORKING WITH EACH OTHER. UNICODE IS A CHARACTER SET WHICH IS DESIGNED TO INCLUDE ALL OF THE CHARACTERS IN USE IN LANGUAGES IN THE -- IN WRITING LANGUAGES IN THE WORLD TODAY. THEY WERE A COMMITTEE. LIKE A GOOD COMMITTEE, THEY MADE DECISIONS ABOUT TRADEOFFS.

SEVERAL OF THOSE TRADEOFFS MADE UNICODE FAIRLY POOR FOR THE DNS APPLICATIONS FOR WHICH WE'RE TRYING TO USE IT, FOR UNIFORM RESOURCE IDENTIFIER APPLICATIONS, AND FOR OTHER THINGS, INCLUDING CONTENT.

EVEN WHERE UNICODE IS POSSIBLY THE OPTIMAL SOLUTION, IT MAY BE INCONSISTENT WITH CODING METHODS USED IN CERTAIN COUNTRIES AND INCONSISTENT -- AND IS INCONSISTENT INTERNALLY. DIFFERENT KINDS OF DECISIONS WERE MADE APPLYING DIFFERENT RULES IN DIFFERENT PARTS OF THE CODE SPACE.

THE MOST COMMONLY KNOWN EXAMPLE IS THEY CHOSE TO TAKE THE LANGUAGES AND SCRIPTS WHICH USE CHINESE CHARACTERS AND PUT THEM TOGETHER, AND THEN THEY CHOSE TO TAKE THE LANGUAGES AND SCRIPTS THAT USE LATIN CHARACTERS OR GREEK CHARACTERS OR CYRILLIC CHARACTERS, WHICH OVERLAP A GREAT DEAL, AND TAKE THEM APART. SO IF YOU HAVE A RULE WHICH WORKS TOGETHER, IT DOESN'T WORK WELL IN THE APART PART OF THE CODE SPACE. AND VICE VERSA.
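
Note: The following sketch is not part of the captioned remarks. Using Python's standard `unicodedata` module, it shows the two design choices side by side: Latin, Greek and Cyrillic letters that look alike have separate code points, while a Chinese character shared by several languages has a single unified one. The list of sample characters is chosen purely for illustration.

```python
import unicodedata

# Letters that are usually drawn identically but belong to different scripts.
LOOKALIKES = ["a", "\u0430", "o", "\u03bf", "\u043e"]


def describe(ch: str) -> str:
    """Return the code point and the formal Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in LOOKALIKES:
        print(ch, describe(ch))
    # A single unified code point serves every language that uses this character.
    print("\u4e2d", describe("\u4e2d"))
```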

WE'VE GOT SOLUTIONS WHICH WORK AROUND THOSE PROBLEMS, BUT WE ALL NEED TO UNDERSTAND THEY ARE COMPROMISES. UNFORTUNATELY, IF WE START LOOKING FOR ALTERNATIVES TO UNICODE, THERE ARE TWO OF THEM. ONE OF THEM DOESN'T EXIST, WHICH IS A VERY BAD ALTERNATIVE. THERE ARE NO OTHER COMPLETE UNIFORM UNIVERSAL CHARACTER SETS. AND THE OTHER SET OF ALTERNATIVES IS MUCH WORSE.

WHEN WE TALK ABOUT A CHARACTER SET WHICH IS CODED FOR INFORMATION PROCESSING USE, WHAT WE'RE TALKING ABOUT IS TAKING FAIRLY ABSTRACT CHARACTERS, MAYBE NOT THE SAME CHARACTERS WE'RE THINKING ABOUT WHEN YOU LOOK AT A PRINTED PAGE OR TALK ABOUT IN OUR NORMAL LANGUAGES, AND WE TAKE THOSE ABSTRACT CHARACTERS AND WE ASSIGN THEM CODE POINTS.

WHEN WE GET THROUGH ASSIGNING CODE POINTS, AS FAR AS THE COMPUTER IS CONCERNED, A CHARACTER IS NOTHING BUT A STRING OF BITS. AND THAT'S IMPORTANT, BECAUSE IF WE DON'T KNOW HOW TO INTERPRET THE STRING OF BITS BACK INTO CHARACTERS, WE DON'T HAVE ANY INFORMATION.
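
Note: The following sketch is not part of the captioned remarks. It illustrates the point about code points and bit strings: one abstract character has one number, that number can be written as different bit strings depending on the encoding, and the same bits read back under a different assumption yield different characters. The helper names are illustrative only.

```python
def code_point(ch: str) -> str:
    """The number assigned to an abstract character."""
    return f"U+{ord(ch):04X}"


def bits(data: bytes) -> str:
    """The string of bits the computer actually stores or transmits."""
    return " ".join(f"{byte:08b}" for byte in data)


if __name__ == "__main__":
    ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
    print(code_point(ch))
    print("utf-8  :", bits(ch.encode("utf-8")))
    print("latin-1:", bits(ch.encode("latin-1")))
    # Same bits, two interpretations: only one of them is what was meant.
    raw = ch.encode("utf-8")
    print(repr(raw.decode("utf-8")), repr(raw.decode("latin-1")))
```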

ESSENTIALLY, WHAT THESE PEOPLE IN THE CHARACTER CODING BUSINESS ARE DOING IS TAKING THE CHARACTERS THEY'RE INTERESTED IN, PUTTING THEM IN SOME ORDER, AND THEN NUMBERING THEM. IF THEY'RE IN THE KIND OF UNIVERSAL CHARACTER SET BUSINESS THE UNICODE FOLKS WERE IN, THEY'RE PUTTING THEM INTO GROUPS WHICH THEY CALL SCRIPTS OR SOMETHING ELSE. AND THEN THEY'RE ORDERING THEM AND NUMBERING THEM.

THE ACTUAL FORMS OF THE CHARACTERS ARE RARELY STANDARDIZED. IF YOU WERE TO LOOK AT THE UNICODE STANDARD, YOU WOULD SEE A LOT OF WORDS DESCRIBING WHAT A CHARACTER IS. AND A LOT OF DISCLAIMERS THAT THE GLYPHS WHICH THEY PUT IN THE TABLES ARE JUST REPRESENTATIVE. WE DON'T TRY TO -- WE DON'T STANDARDIZE GLYPHS AND WE CERTAINLY DON'T STANDARDIZE FONTS. THAT'S A PROBLEM BECAUSE IF THERE'S A CHARACTER SET THAT I'M USING AND READING EVERY DAY, I CAN UNDERSTAND WHAT IS FONT VARIATION AND WHAT IS A DIFFERENT CHARACTER. IF THERE'S A CHARACTER SET THAT I DON'T UNDERSTAND AND DON'T KNOW HOW TO READ OR HOW TO USE, I CAN'T TELL THE DIFFERENCE BETWEEN DIFFERENT CHARACTERS AND DIFFERENT FONTS, DIFFERENT CHARACTERS AND DIFFERENT TYPING LAYOUTS, AND IN SOME CHARACTER SETS, I CAN'T TELL WHERE ONE CHARACTER ENDS AND ANOTHER CHARACTER BEGINS. YOU NEED TO UNDERSTAND THE CHARACTERS IN ORDER TO DO THAT. THE COMPUTER MAY OR MAY NOT BE COMPLETELY CONVENIENT FOR IT.

WE TALK A LOT ABOUT SCRIPTS AND LANGUAGES. IN THE PECULIAR VOCABULARY OF THESE CODING ENVIRONMENTS, A SCRIPT IS A COLLECTION OF RELATED CHARACTERS. IT'S VERY COMMON FOR SEVERAL LANGUAGES TO SHARE MOST BUT POSSIBLY NOT ALL CHARACTERS OF THE SAME SCRIPT. THERE WERE SOME EXAMPLES IN THAT OPENING SLIDE.

AND WE'VE CREATED A LOT OF CONFUSION FOR OURSELVES BY VERY OFTEN USING THE SAME NAME FOR A SCRIPT AS WE USE FOR ONE OF THE LANGUAGES WHICH USES IT. SO WE'RE LUCKY THAT WE HAVE A NAME LIKE CYRILLIC, WHICH IS THE NAME OF A SCRIPT.

THERE IS NO SUCH LANGUAGE. AND IT'S USED FOR RUSSIAN AND UKRAINIAN AND BULGARIAN AND A NUMBER OF OTHER THINGS. BUT IN THE MORE COMMON CASES, WE HAVE A SCRIPT WE CALL ARABIC, WHICH IS USED FOR A NUMBER OF LANGUAGES, INCLUDING ARABIC. AND MOST OF THOSE LANGUAGES WHICH USE THE ARABIC SCRIPT DO NOT USE EXACTLY THE SAME CHARACTERS FROM IT THAT THE ARABIC LANGUAGE USES.

WE HAVE TO BE VERY, VERY CAREFUL ABOUT WHETHER WE ARE TALKING ABOUT LANGUAGES OR SCRIPTS. IF WE DEFINE A SCRIPT ONLY IN TERMS OF ITS USE IN A PARTICULAR LANGUAGE, WE LOCK THOSE OTHER LANGUAGES OUT. UNICODE CONSORTIUM GIVES NAMES FOR SCRIPTS AND LANGUAGE BINDINGS, BUT THEIR SCRIPT NAMES ARE -- WELL, AS I SAID IN THE SLIDE, THE PRECISION IS VERY LOW. IF YOU RELY ON THOSE SCRIPT NAMES TO START TALKING ABOUT LANGUAGES OR POSSIBLY EVEN TO START TALKING ABOUT SCRIPTS, YOU MAY GET YOURSELF VERY CONFUSED, OR EVERYBODY ELSE VERY CONFUSED.

MOST OF THE LANGUAGES IN THE WORLD ARE SPOKEN BY AT LEAST SOMEONE IN MANY COUNTRIES, SOMETIMES TRAVELERS, SOMETIMES EMIGRES. PEOPLE MIGRATE AND TAKE LANGUAGES WITH THEM. LANGUAGES EVOLVE DIFFERENTLY OVER TIME IN DIFFERENT PLACES. WRITING SYSTEMS EVOLVE EVEN MORE QUICKLY THAN LANGUAGES DO. AND OVER ENOUGH TIME, A GIVEN LANGUAGE IN A GIVEN COUNTRY MAY BE DIFFERENT THAN THE SAME LANGUAGE OR SUPPOSEDLY SAME LANGUAGE IN SOME OTHER COUNTRY. WHETHER IT'S THE SAME LANGUAGE OR A DIFFERENT LANGUAGE IS A MATTER OF CONVENTION, NOT SCIENCE. AND CERTAINLY NOT OF UNICODE.

THE DNS ITSELF DOESN'T KNOW ANYTHING ABOUT LANGUAGES, IT DOESN'T KNOW ANYTHING ABOUT SCRIPTS, IT KNOWS ABOUT CHARACTERS ONE AT A TIME. BUT AS I SAID EARLIER, IF WE SUCCEED IN SOLVING THE DOMAIN NAME INTERNATIONALIZATION PROBLEM AND WE DON'T SOLVE THE CONTENT PROBLEM, WE DON'T HAVE MUCH TO DO. FINDING SOMETHING WHICH I CAN'T READ IS ALMOST AS USELESS AS NOT BEING ABLE TO FIND IT.

BUT THE WAY IN WHICH WE SOLVED THE CONTENT PROBLEM OVER THE YEARS IS WITH WHAT WE CALL TAGGING. WE'VE INVENTED SYSTEMS OVER THE YEARS IN WHICH EITHER I CALL SOMEBODY UP ON THE TELEPHONE AND SAY I'M ABOUT TO SEND YOU A FILE AND WHEN YOU GET THAT FILE THE BITS ARE IN THE KOI8 ENCODING OF RUSSIAN CYRILLIC AND YOU NEED TO FIGURE OUT HOW TO DECODE THEM. AND THIS, AS WITH THE MAIL SYSTEMS I REFERRED TO BEFORE, INVOLVES SPENDING A LOT OF TIME ON THE PHONE TO COMMUNICATE HOW TO MAKE THE COMPUTER SYSTEMS WORK. OR WE FIGURE OUT A WAY TO TRANSMIT THE INFORMATION TO THE RECIPIENT AS PART OF THE MESSAGE, OR ATTACHED TO THE MESSAGE.

WHEN WE START MAKING THAT PROCESS SYSTEMATIC, IT'S WHAT WE TALK ABOUT AS TAGGING: WE ATTACH A TAG TO THE MESSAGE WHICH TELLS THE RECIPIENT WHAT IT IS. WHAT WE DISCOVERED LAST WEEK, IN TRYING TO PUT THE FINAL ISSUES OF THE SEMINAR TOGETHER, IS THAT "PLEASE TYPE YOUR NAME IN CHINESE CHARACTERS SO I CAN GET IT OUT OF THE MESSAGE AND PASTE IT INTO THE TITLE SLIDE" TURNS OUT TO BE A HARD QUESTION.

IT'S A HARD QUESTION BECAUSE WE HAVE SENDERS WHO HAVE CHARACTER SETS AND MAIL (INAUDIBLE) USER AGENTS ON THEIR MACHINES WHICH ASSUME THAT IF THE MESSAGE IS ANY MESSAGE AT ALL OR THE MESSAGE CONTAINS NON-ASCII CHARACTERS, THEN IT HAS TO BE IN A PARTICULAR CHARACTER SET. AND THEY WILL TAG AND LABEL THAT MESSAGE THAT WAY, AND IF THE RECIPIENT ISN'T PREPARED TO DEAL WITH THAT, THE RECIPIENT'S MAIL USER AGENT MAY TURN IT TO TRASH.

WE HAVE MAIL USER AGENTS ON THE RECEIVING SIDE, OR MAIL SYSTEMS ON THE RECEIVING SIDE THAT IF A MESSAGE COMES IN THAT CONTAINS NON-ASCII CHARACTERS, NOT NON-ASCII IN THE SENSE THAT WE KNOW BUT NON-ASCII IN THE SENSE THAT THEY DON'T LOOK THAT WAY AT THE BIT LEVEL, THEY WILL ASSUME BASED ON THEIR GENERAL EXPERIENCE WHAT THOSE THINGS ARE CODED IN. AND IF I RECEIVE SOMETHING IN CHINESE AND MY SYSTEM DECIDES IT'S ACTUALLY IN EUROPEAN LATIN SCRIPT, WHAT I WILL SEE ON MY SCREEN, IF I'M LUCKY, IS A SERIES OF QUESTION MARKS. AND IF I'M NOT LUCKY I'LL SEE NONSENSE. SO THIS TURNS OUT NOT TO BE AN EASY QUESTION.
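
Note: The following sketch is not part of the captioned remarks. It models tagging in the simplest possible way, as a pair of a charset label and the raw bytes, and contrasts a recipient who honors the tag with one who guesses. The `TaggedText` type and the function names are hypothetical; real mail systems carry the tag in message headers rather than in a structure like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    charset: str    # the tag: tells the recipient how to read the bits
    payload: bytes  # the bits themselves


def send(text: str, charset: str) -> TaggedText:
    """Encode the text and attach a tag naming the encoding used."""
    return TaggedText(charset=charset, payload=text.encode(charset))


def read_with_tag(message: TaggedText) -> str:
    """A recipient that trusts the tag recovers the original characters."""
    return message.payload.decode(message.charset)


def read_with_guess(message: TaggedText, assumed_charset: str) -> str:
    """A recipient that ignores the tag and assumes a charset of its own.

    Bytes the assumed charset cannot decode are shown as U+FFFD.
    """
    return message.payload.decode(assumed_charset, errors="replace")


if __name__ == "__main__":
    message = send("\u4e2d\u6587", "gb2312")
    print(read_with_tag(message))               # the intended characters
    print(read_with_guess(message, "latin-1"))  # plausible-looking nonsense
    print(read_with_guess(message, "ascii"))    # replacement marks only
```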

AND IT'S NOT AN EASY QUESTION NOT BECAUSE WE DON'T HAVE THE CODING SYSTEMS, DON'T UNDERSTAND HOW TO SEND THESE CHARACTERS. IT'S NOT AN EASY QUESTION BECAUSE THE APPLICATIONS HAVE TO WORK AND INTEROPERATE.

WITH UNICODE WE TOOK A MAJOR STEP FORWARD, AND IT WAS A MAJOR STEP FORWARD IN MANY OF THESE CONTENT INTERNATIONALIZATION ISSUES AND ALL OF THE DNS ONES.

WE DON'T HAVE A PLACE IN DNS TO PUT TAGS. SO IF I'M TRANSMITTING SOMETHING TO YOU THAT'S NOT ASCII IN THE DNS AND THE DNS NAME, I EITHER HAVE TO CALL YOU UP AND TELL YOU HOW TO READ IT, WHICH IS EXTREMELY INCONVENIENT FOR THE DNS, OR WE'VE GOT TO HAVE SOME SORT OF UNIVERSAL CHARACTER SET WHICH REPRESENTS EVERYTHING. UNICODE PROVIDES THE LATTER. AND AS I SAID, THERE ARE NO OTHER CHOICES.

THE CHOICE WHICH WAS MOST PREVALENT BEFORE UNICODE INVOLVED INCLUDING IN THE CHARACTER STRINGS WHICH WERE BEING SENT CODING FOR THE CHARACTER CODING WHICH ONE WANTED TO INTERPRET IT WITH. AND OUR EXPERIENCE WITH IT WAS VERY BAD, AND THE REASONS WHY ARE A WHOLE OTHER TOPIC. BUT IT DOESN'T WORK WELL.

UNLESS YOU KNOW IN ADVANCE THAT THE NUMBER OF SCRIPTS YOU'RE GOING TO BE SWITCHING BETWEEN IS EXACTLY TWO AND YOU KNOW WHAT THEY ARE. SO WHAT WE'RE TRYING TO DO HERE IS LET PEOPLE ACCESS INFORMATION AND THE INTERNET IN THE LANGUAGES AND SCRIPTS WHICH COME NATURALLY TO THEM. AND I WANT TO STRESS, AS I'VE SAID SEVERAL TIMES BEFORE IN TALKING ABOUT THIS ISSUE, THAT WE NEED TO BE CAREFUL TO UNDERSTAND WHAT THE PROBLEM IS WE'RE TRYING TO SOLVE. BECAUSE IF THE PROBLEM WE'RE TRYING TO SOLVE IS TO REGISTER AS MANY NAMES AS POSSIBLE NO MATTER HOW MUCH CONFUSION IT CREATES, AND IF WE CONSIDER THE CONFUSION AN ADVANTAGE BECAUSE WE MAKE MONEY RESOLVING THE CONFUSION, THEN WE HAVE AN ENTIRELY DIFFERENT SET OF GOALS THAN THOSE OF US WHO ARE TRYING TO MAKE INTERNATIONALIZATION WORK.

SO WE HAVE TO FIGURE OUT WHAT'S BROKEN AND NEEDS FIXING.
SO WHAT ISN'T WORKING ADEQUATELY TODAY?

WELL, WE HAVE PROBLEMS WITH INDIVIDUAL DOMAIN NAME LABELS. DNS MOSTLY WORKS IN TERMS OF FULLY QUALIFIED DOMAINS. IF WE HAVE ONE LABEL IN ONE SCRIPT, A SECOND LABEL IN A SECOND SCRIPT AND A THIRD LABEL IN A THIRD SCRIPT, IT'S GOING TO MAKE PERFECTLY GOOD SENSE TO COMPUTERS BUT PEOPLE MAY NOT LIKE IT AT ALL.

IT'S REALLY HARD BECAUSE OF THE UNDERLYING -- THE WAY THE UNDERLYING PROTOCOLS WORK, AND BECAUSE OF SOME DECISIONS MADE VERY EARLY IN THE DEVELOPMENT OF THE NETWORK BY LAZY APPLICATIONS DESIGNERS. I TAKE SOME OF THE BLAME. WE'VE TENDED TO EXPOSE, OVER THE YEARS, A LOT OF THE UNDERLYING WORKING OF PROTOCOL IN THE WIRES TO END USERS. WHEN THE WEB WAS BEING DESIGNED, THE ASSUMPTION WAS MADE THAT NO ONE WOULD EVER LOOK AT A URL, MUCH LESS PUT IT ON THE SIDE OF A BUS.

IF YOU NEVER LOOK AT A URL OR PUT A URL ON THE SIDE OF A BUS, AS AN END USER YOU MAY NOT CARE WHAT A URL LOOKS LIKE AND THE INTERNATIONALIZATION PROBLEM CHANGES COMPLETELY. BUT IF YOU DECIDE YOU HAVE TO LOOK AT THESE DOMAIN NAMES AND END USERS HAVE TO LOOK AT THESE DOMAIN NAMES AND URLS, AND YOU DECIDE THAT THE DOCTRINE IS NO ASCII CHARACTERS BECAUSE MY POPULATION DOESN'T READ THEM OR UNDERSTAND THEM, THEN WE END UP WITH ALL SORTS OF INTERESTING PROBLEMS.

THE DNS REQUIRES THAT LABELS IN A DOMAIN NAME BE SEPARATED BY SOMETHING, AND THAT SOMETHING IS, BY CONVENTION, A PERIOD. SO IF WE HAVE THE STRING ON THE SECOND LINE HERE, NONE OF THOSE CHARACTERS ARE IN ASCII. IT IS A CLASSIC EXAMPLE OF THIS DIFFERENT LABEL, DIFFERENT CHARACTER PROBLEM. BUT THE SEPARATORS WHICH TELL US WHICH LABELS ARE WHICH ARE ASCII PERIODS.
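
Note: The following sketch is not part of the captioned remarks. It shows that a domain name is handled one label at a time and that the separators remain ASCII periods even when labels come from different scripts. It assumes the IDNA 2003 behavior of Python's built-in `idna` codec; the function names and the sample name are illustrative only.

```python
def split_labels(name: str) -> list[str]:
    """Split a domain name into labels at the ASCII period."""
    return name.split(".")


def to_wire_form(name: str) -> str:
    """Encode each label independently; labels never influence one another."""
    encoded = [label.encode("idna").decode("ascii") for label in split_labels(name)]
    return ".".join(encoded)


if __name__ == "__main__":
    name = "www.\u043f\u0440\u0438\u043c\u0435\u0440.example"
    for label in split_labels(name):
        print(f"{label!r:12} -> {to_wire_form(label)!r}")
    print(to_wire_form(name))
```

Because each label is converted on its own, nothing in this process notices or prevents a name whose labels mix several scripts.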

IF YOU THINK YOU HAVE TO WRITE DOWN THE STRING HTTP, THE PROTOCOL WHICH USES THAT STRING RECOGNIZES IT ONLY AS HTTP. YOU CAN TRANSLATE IT INTO SOME OTHER LANGUAGE OR CHARACTER SET, BUT AT THAT POINT YOU'RE NOT USING THE URL. YOU'RE USING A TRANSLATION. IF YOU DECIDE TO TAKE ADVANTAGE OF THAT TRANSLATION CAPABILITY, YOU COULD DO ALL KINDS OF OTHER THINGS, WHICH WE MOSTLY WILL NOT TALK ABOUT TODAY. BUT THEY MAY BE MORE INTERESTING THAN WHAT WE ARE TALKING ABOUT.

AND URIS CONTAIN ALL SORTS OF SPECIAL CHARACTERS WHICH MEAN THINGS AND THEY ARE ALL IN ASCII. E-MAILS CONTAIN ALL SORTS OF CHARACTERS WHICH MEAN THINGS AND THEY ARE ALL IN ASCII.
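
Note: The following sketch is not part of the captioned remarks. Using Python's standard `urllib.parse`, it takes a URL whose host contains a non-ASCII label and shows that the scheme and the delimiters that give the URL its structure are still ASCII. The function names, the sample URL, and the particular set of delimiters listed are illustrative only.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Break a URL into the pieces that the ASCII delimiters mark out."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        "query": parts.query,
    }


def syntax_characters(url: str) -> str:
    """Return the structural delimiters found in the URL, in order."""
    return "".join(ch for ch in url if ch in ":/?#@&=")


if __name__ == "__main__":
    url = "http://b\u00fccher.example/katalog?seite=2"
    print(url_parts(url))
    # Every character returned here is ASCII, whatever script the host uses.
    print(syntax_characters(url))
```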

AND IN SOME LANGUAGES AND SOME CULTURES WE TEND TO WRITE THINGS LEFT TO RIGHT AND IN OTHER LANGUAGES AND CULTURES, WE TEND TO WRITE THINGS RIGHT TO LEFT. IN ENGLISH, TO TALK ABOUT MY HAVING AN E-MAIL ADDRESS WHICH IS WRITTEN IN TERMS OF A USER NAME OR MAILBOX AT A MACHINE, OR AT A DOMAIN NAME, MAKES PERFECTLY GOOD SENSE.

IN AN ENVIRONMENT IN WHICH THE CONVENTIONS ARE SUCH THAT WE WOULD ALWAYS TALK ABOUT THE SYSTEM FIRST AND THEN THE FAMILY NAME AND THEN THE FIRST NAME, IT MAKES NO SENSE AT ALL, AND IN ADDITION, THAT "@" SIGN IS NOT ONLY AN ASCII CHARACTER, BUT IT'S AN ASCII CHARACTER WHICH WE DON'T LEARN IN SCHOOL BECAUSE IT'S A VERY ODD CHARACTER.

WE HAVE A DEPLOYMENT PROBLEM. THE INTERNET, FOR BETTER OR WORSE, IS NOT JUST THE WORLD WIDE WEB AND THE HTTP AND HTTPS PROTOCOLS WHICH ARE TYPICALLY USED WITH IT.

THE GOOD NEWS IS THAT, FROM A CONTENT EXCHANGE AND CONTENT USABILITY STANDPOINT, THE TWO SHARE A GREAT DEAL OF DESCRIPTIVE STRUCTURE AND A GREAT DEAL OF MECHANISM. THE BAD NEWS IS YOU CAN'T CHANGE ONE WITHOUT CHANGING THE OTHER.

WHEN WE FIXED THE CONTENT INTERNATIONALIZATION PROBLEM FOR E-MAIL A DOZEN YEARS AGO, WE FIXED THE CONTENT INTERNATIONALIZATION PROBLEM FOR THE WEB. LOTS OF LITTLE DETAILS HERE AND THERE BUT THE GENERAL OVERALL PROBLEM GOT FIXED.

BUT ONE OF THE EXPERIENCES WE'VE HAD ON THE INTERNET IS IF A NEW APPLICATION COMES ALONG THAT IS COMPLETELY NEW AND DIFFERENT AND FILLS A GAP THAT NOBODY REALIZES THEY HAD BEFORE, BRIGHT NEW IDEA, VERY EXCITING, IT DEPLOYS VERY, VERY QUICKLY. IN THE LAST DECADE AND A HALF WE'VE SEEN IT WITH THE WEB, WITH FILE SHARING APPLICATIONS, WITH A NUMBER OF OTHER THINGS. BUT WHEN SOMETHING COMES ALONG WHICH IS INTENDED TO REPLACE A FACILITY OR A PROTOCOL WHICH IS REASONABLY WIDELY DEPLOYED AND WORKS REASONABLY WELL, OR ALMOST WORKS REASONABLY WELL, OR IS BARELY TOLERABLE, WE FIND IT ALMOST IMPOSSIBLE TO GET RID OF.

WE'VE HAD A NUMBER OF EFFORTS OVER THE YEARS WHICH HAVE COME FORWARD AND SAID, YOU KNOW, IF WE REDESIGNED E-MAIL FROM THE BOTTOM UP WE COULD MAKE THIS A BETTER WORLD. NONE OF THEM HAVE GONE ANYWHERE, AND IF YOU WANT A PREDICTION, THE LATEST ROUND ISN'T GOING ANYWHERE EITHER. AND THE REASON IS IN ORDER TO MAKE THOSE KINDS OF SWITCHOVERS, YOU END UP, IF YOU WANT TO COMMUNICATE, HAVING TO MAINTAIN BOTH ENVIRONMENTS FOREVER, AND YOU END UP HAVING TO TRANSLATE BETWEEN THEM.

IF THE NEW ONE OFFERS A GREAT DEAL OF NEW FUNCTIONALITY AND NEW CAPABILITY, THE TRANSLATIONS BETWEEN THEM WON'T WORK PROPERLY. THEY WILL LOSE INFORMATION PROBABLY IN BOTH DIRECTIONS. IF IT DOESN'T OFFER A GREAT DEAL OF NEW CAPABILITY AND FUNCTIONALITY, PROBABLY NO ONE WILL BOTHER WITH IT, BECAUSE OF THE PAIN AND SUFFERING.

WE DEPLOYED THESE CONTENT CHANGES AND THE CAPABILITIES OF HANDLING INTERNATIONALIZED CONTENT AND MULTIMEDIA CONTENT WITHIN THE E-MAIL ENVIRONMENT, BY SPENDING A GREAT DEAL OF TIME FIGURING OUT HOW TO INSTALL WITHOUT MESSING UP ANYBODY'S EXISTING E-MAIL ENVIRONMENT. IT DIDN'T CAUSE THINGS TO BREAK. MAY HAVE CAUSED THINGS TO LOOK VERY UGLY BUT NOT TO BREAK.

I'M GOING TO TALK ABOUT CONFUSION AND FRAUD AS WE LOOK AT MULTIPLE CHARACTER SETS BUT IT'S IMPORTANT TO REMEMBER THAT THESE ARE NOT CAUSED BY INTERNATIONALIZATION. WE HAVE MOST OF THE PROBLEMS WITH ASCII OR WITH THE COMBINATION OF WEAK SOFTWARE AND BAD USER HABITS.

I DON'T KNOW HOW TO CURE BAD USER HABITS AND WE SEEM INCAPABLE OF CURING THE BAD SOFTWARE PROBLEM. BUT IT MEANS IN LOOKING AT INTERNATIONALIZATION THAT DO NO HARM MAY BECOME ANOTHER IMPORTANT PRINCIPLE.

SOME OF MY SECURITY COLLEAGUES SAY THAT RUNNING CERTAIN SOFTWARE WHICH IS VERY PREVALENT IN THE NETWORK TODAY IS LIKE SUPPLYING GUNS AND BULLETS TO CRIMINALS AND THEN EXPECTING THEM TO NOT SHOOT YOU.

THIS IS UNFORTUNATELY HARD TO READ, THERE'S TOO MUCH INFORMATION ON THE SCREEN BUT MANY OF YOU HAVE SEEN THIS OR VARIATIONS OF IT.

YOU GET AN E-MAIL MESSAGE. IT SEEMS TO CONTAIN A URL WHICH POINTS TO SOMETHING THAT YOU NEED TO COMMUNICATE WITH, AND IT SAYS YOU NEED TO CLICK ON THIS URL AND UPDATE YOUR DATA. IN THIS PARTICULAR CASE, WHAT THE PHISHER HAS DONE IS TO WRITE A LINK INTO THE MESSAGE, OR WHAT LOOKS TO THE USER LIKE A LINK, AND PUT SOMETHING UNDERNEATH IT WHICH IS COMPLETELY DIFFERENT. IF THE USER CAN'T ACCESS THE UNDERLYING LINK AND IS A LITTLE BIT STUPID OR CARELESS, THIS TURNS INTO A BIG IDENTITY THEFT PROBLEM.

WE KNOW HOW TO DESIGN SOFTWARE WHICH CAN CHECK THE LINK THAT'S UNDERNEATH WITH THE LINK ON TOP AND HAVE WARNINGS BUT THAT SOFTWARE IS NOT WIDELY DEPLOYED. GUNS AND BULLETS IN THE HANDS OF CRIMINALS.
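The check described here, comparing the link the user sees with the link underneath it, can be sketched in a few lines. This is only an illustration under stated assumptions: the class and function names (`LinkAuditor`, `find_mismatched_links`) and the sample hosts are hypothetical, and real mail clients would need far more careful parsing than this.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def _host(text):
    """Return the host part of text if it parses as a URL, else None."""
    try:
        return urlsplit(text.strip()).hostname
    except ValueError:
        return None


class LinkAuditor(HTMLParser):
    """Collects anchors whose visible text names a different host than the href."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the anchor currently open, if any
        self._text = []        # visible text collected inside that anchor
        self.suspicious = []   # (visible text, underlying href) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            shown = "".join(self._text).strip()
            shown_host, real_host = _host(shown), _host(self._href)
            # Only flag anchors whose visible text itself looks like a URL.
            if shown_host and real_host and shown_host != real_host:
                self.suspicious.append((shown, self._href))
            self._href = None


def find_mismatched_links(html):
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.suspicious
```

A caller would pass the HTML body of a message to `find_mismatched_links` and warn the user about every pair it returns. Anchors whose visible text is ordinary wording rather than a URL are ignored by this sketch.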

WHAT DOES THIS HAVE TO DO WITH IDNS? THAT PARTICULAR PROBLEM IS EASY TO DETECT BY PEOPLE WHO ARE CAREFUL OR CAREFUL SOFTWARE, OR A COMBINATION OF THEM BUT NOW LET'S LOOK AT THIS PARTICULAR STRING. WE LOOK AT SOMETHING WHICH, IF WE ASSUME THIS IS ALL ASCII, LOOKS LIKE HTTP COLON SLASH SLASH ABH.COM. AND IF WE WERE TO LOOK AT THE UNDERLYING LINK, THE UNDERLYING LINK WOULD LOOK PRESUMABLY JUST LIKE THAT.

BUT THIS MAY NOT BE THAT AT ALL. IT MAY BE IN GREEK. AND IF WE HAD WRITTEN IT IN LOWER CASE RATHER THAN UPPER CASE WE WOULD HAVE UNDERSTOOD IMMEDIATELY THAT IT WASN'T WHAT WE THOUGHT IT WAS. THIS BECOMES VERY HARD TO CHECK MECHANICALLY. SO AS WE DEPLOY MORE INTERNATIONALIZATION, PEOPLE ARE GOING TO HAVE TO GET MORE CAREFUL OR WE'RE GOING TO UNCOVER MORE RISKS.
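The Greek example can be reproduced with the Python standard library. This is a rough sketch, not a real confusable-character detector: `scripts_used` is a hypothetical helper that guesses a script from the first word of each character's Unicode name, which is only a heuristic.

```python
import unicodedata


def scripts_used(text):
    """Heuristic: take the first word of each letter's Unicode name as its script."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts


latin = "ABH"                  # LATIN CAPITAL LETTER A, B, H
greek = "\u0391\u0392\u0397"   # GREEK CAPITAL LETTER ALPHA, BETA, ETA

# The two strings usually render identically but are different code points.
looks_alike_but_differs = latin != greek

# Lower-casing exposes the difference, as noted in the talk.
greek_lower = greek.lower()    # alpha, beta, eta in lower case
latin_lower = latin.lower()    # "abh"
```

A label that mixes scripts, such as a Greek alpha followed by Latin letters, makes `scripts_used` return more than one entry. Software could use that as a warning signal, although a label written entirely in one look-alike script passes such a check untouched.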

AS I SAY, WE HAVE THIS WITH ASCII. THIS IS NOT A NEW PROBLEM. WITH ASCII, THE COMPUTER HAS NO PROBLEMS. WITH MOST FONTS, SHOWING THINGS ON SCREENS, WE CAN'T TELL THE DIFFERENCE BETWEEN A LOWER CASE L AND A ONE. AND IN SOME CASES WE CAN'T TELL THE DIFFERENCE BETWEEN A ZERO AND AN UPPER CASE O.

WHEN WE START MIXING INTO THIS PUZZLE DIFFERENT SCRIPTS, AS WE SAW WITH THE GREEK EXAMPLE, ALMOST EVERY CONTEMPORARY ALPHABETIC SCRIPT IN USE IN THE WORLD TODAY HAS A COMMON ORIGIN. AND BECAUSE THOSE COMMON ORIGINS EXIST, CHARACTER SIMILARITIES ARE INEVITABLE.

WHEN I WAS IN BANGKOK SEVERAL MONTHS AGO I SAW A BIG SIGN ON A BILLBOARD. IT HAD THREE CHARACTERS WHICH LOOKED JUST ABOUT LIKE THE ONES ON THE SCREEN. AND RED WHITE AND BLUE STRIPES BEHIND IT. AND THERE ARE CERTAINLY FONTS FOR WRITING ASCII AND ROMAN CHARACTERS WHICH LOOK VERY MUCH LIKE THOSE TO THE POINT THAT IF I DIDN'T KNOW BETTER, I WOULD HAVE LOOKED AT THAT AND SAID, "AH, IT SAYS U.S.A." WELL, I DON'T HAVE ANY IDEA WHAT IT SAID BUT U.S.A. IS NOT WHAT IT IS.

AND WE RUN INTO THESE PROBLEMS IN EVERY SINGLE LANGUAGE.

(INAUDIBLE) POINTED OUT TO ME SOME WEEKS AGO THAT THERE'S THIS WONDERFUL STRING WHICH IF WE READ IT IN ENGLISH LOOKS LIKE PECTOPAN, BUT IF THEY'RE IN CYRILLIC WITH MINOR FONT VARIATIONS IT'S SOMETHING COMPLETELY DIFFERENT. IT'S NOT PRONOUNCED THE SAME WAY, DOESN'T HAVE THE SAME MEANING, WHATEVER THE MEANING OF PECTOPAN MIGHT BE.

AND THEN WE HAVE SOME PROBLEMS WITH CHINESE CHARACTERS. THEY'RE NOT UNIQUE TO CHINESE CHARACTERS. WE'VE GOT MOST OF THE PROBLEMS I'M GOING TO TALK ABOUT WITH CHINESE CHARACTERS WITH ALMOST EVERY OTHER CHARACTER SET IN THE WORLD. ALMOST EVERY OTHER SCRIPT. OVER TIME, SCRIPTS EVOLVE. SOMETIMES CHANGES GET MADE IN AN EVOLUTIONARY FASHION BECAUSE PEOPLE GET LAZY. SOMETIMES CHANGES GET MADE BY GOVERNMENTS AND COMMITTEES, BUT THEY EVOLVE.

THE THING WHICH IS DIFFERENT ABOUT CHINESE IS THAT WITHIN THE LIFETIMES OF MANY OF US, WE HAVE SEEN A MAJOR LANGUAGE REFORM, OF WRITING SYSTEMS, BUT WE'VE SEEN MAJOR -- NOT ONLY -- NOT QUITE SO MAJOR WRITING REFORMS IN AN EVEN SHORTER PERIOD OF TIME WITH GERMAN. THIS IS NOT AS UNUSUAL A SITUATION AS IT SEEMS WITH CHINESE, BUT THE PARTICULAR SITUATION WITH CHINESE IS WE HAVE SIMPLIFIED CHARACTERS IN USE IN SOME PLACES AND TRADITIONAL CHARACTERS IN OTHERS, AND WE HAVE LANGUAGES IN USE THAT DON'T USE THE SIMPLIFIED FORM. AND I WILL COME BACK TO THAT. AND JAMES WILL COME BACK TO THAT IN EVEN MORE DETAIL.

TO CONSIDERABLE EXTENT WHEN THE DNS WAS DESIGNED, IT WAS DESIGNED AS A NETWORK FACING IDENTIFIER FOR COMPUTER SYSTEMS AND THE PEOPLE WHO USED THEM AT THE LOW LEVELS TALK ABOUT HOSTS. TALK ABOUT THE NETWORK.

SO WE HAVE TENSION BETWEEN A NETWORK FACING IDENTIFIER AND THE USER FACING NAME OF A PRODUCT, COMPANY OR ORGANIZATION. CONSTRAINTS ARE DIFFERENT. IF SOMEBODY SPENT A LOT OF TIME INVENTING A TRADEMARK NAME THEY WANT TO SPELL IT AND LOOK AT IT THE WAY THEY WRITE IT OR THE WAY THEY SEE IT. COMPUTERS DON'T CARE VERY MUCH.

AND THERE ARE SOME CONSTRAINTS IN THE SOLUTIONS BECAUSE OF THE WAY THE DNS WORKS. WE CAN'T TAG STUFF. IT'S HARD TO REPRESENT PICTURES. IT'S HARD TO DECIDE THAT SOME CHARACTER STRING IS GOING TO GET PRESENTED EXACTLY THE WAY YOU WANT IT TO GET PRESENTED. THERE ARE SOME LIMITATIONS ON LENGTH. THERE ARE SOME VERY NASTY LIMITATIONS ON UNIQUENESS.

DNS DOES NOT SUPPORT A SEARCHING OR MATCHING MECHANISM IN WHICH YOU CAN SAY WELL, IT'S ALMOST THE SAME AS THAT ONE SO RETURN IT. OR THESE TWO THINGS ARE SIMILAR SO LET'S RETURN BOTH OF THEM. PEOPLE TRY THAT; THEY MESS UP OTHER PEOPLE'S APPLICATIONS.

AS A CONSEQUENCE OF THOSE THINGS AND THE FACT THAT CYBERSQUATTERS AND OTHER CRIMINALS ARE OFTEN SMARTER THAN WE ARE OR STAY AHEAD OF US, THERE IS A LOT OF POTENTIAL FOR CONFUSION OR FRAUD. SOME OF IT ACCIDENTAL, SOME OF IT DELIBERATE.

BUT DNS LABELS THEMSELVES ARE TRADITIONALLY JUST ARBITRARY STRINGS OF WHATEVER CHARACTERS ARE PERMITTED. AND FROM THAT STANDPOINT, WE STARTED OUT WITH THE HOST NAME RULES WITH A VERY SMALL SET OF PERMITTED CHARACTERS, AND THE ONLY THING THAT IDNS DO IS TO EXPAND THE LIST OF PERMITTED CHARACTERS. ANYTHING ELSE IS APPLICATION SOFTWARE AND CONVENTIONS.

SO WHILE THE REQUIREMENT FOR NON-ASCII STRINGS IS VERY CLEAR, AND I HOPE NO ONE QUESTIONS THAT ANYMORE, CAUTION IS IN ORDER BECAUSE OF THE TRAPS AND THE RISKS, AND CAUTION IS ESPECIALLY IMPORTANT BECAUSE THIS IS ONE OF THOSE AREAS IN WHICH, IF WE ARE TOO PERMISSIVE AND TOO FLEXIBLE AND IT GETS US INTO TROUBLE, IT'S VERY, VERY HARD TO CHANGE OUR MINDS AND GO BACK. WE'LL BE LIVING WITH THE TROUBLE FOREVER.

AND THERE ARE PLACES WHERE THE DNS CAN'T HELP.

INTERNATIONALIZATION IS A PROBLEM ABOUT LANGUAGES AND USERS, AND USAGE. IT IS NOT A PROBLEM ABOUT SCRIPTS. AND IT IS ESPECIALLY NOT A PROBLEM ABOUT INDIVIDUAL CHARACTERS.

ONE NEEDS LOCAL MATCHING RULES WHICH MAKE SENSE.

THE STANDARD FOR IDNS DOES A CONSIDERABLE AMOUNT OF MAPPING BETWEEN CHARACTERS WHICH ARE SIMILAR OR RELATED. PROBABLY THE MAPPING DECISIONS WHICH WERE MADE WERE THE VERY BEST DECISIONS WHICH COULD BE MADE GIVEN THE TECHNOLOGY AND THE LIMITATIONS AT THE TIME. ONE COULD HAVE MADE OTHER DECISIONS. THERE'S NO EVIDENCE THEY WOULD HAVE BEEN BETTER. THERE'S SOME EVIDENCE THAT MANY OF THEM WOULD HAVE BEEN WORSE.

IF YOU SAY THAT AROUND THE IETF, THEY WON'T BELIEVE YOU'RE QUOTING ME. SO WE NEED TO A CONSIDERABLE EXTENT LOCAL MATCHING RULES AND THE DNS DOESN'T DO WELL WITH LOCAL. AS SOON AS YOU DECIDE SOMETHING IS GOING TO BE RESOLVED ONLY BY YOUR SERVERS AND YOUR CACHES IN YOUR WAY, YOU'RE CREATING A GLOBAL INTEROPERABILITY PROBLEM. SO WE NEED, IN ORDER FOR PEOPLE TO GET ALONG WITH THIS, WHAT WE HAVE NEEDED FOR CENTURIES. THE ABILITY TO SEARCH AND THE ABILITY TO RESOLVE AMBIGUITY. SEARCHING AND RESOLVING AMBIGUITY SAYS THAT NEAR MATCHES COME BACK TO THE USER OR SOME OTHER SOFTWARE AND SOMETHING GETS DONE WITH THEM. DNS IS ONLY GOOD AT RETURNING TWO ANSWERS. THE ONE YOU ASKED FOR AND NO.

AND THE MORE WE GO DOWN THIS PATH, IF WE WANT GOOD LOCALIZATION AT THE END AS WELL AS GLOBAL INTEROPERABILITY, WE START NEEDING TO WORRY ABOUT ATTRIBUTE STRUCTURE LIKE LANGUAGE, LOCATION, ENTRY, AND BUSINESS TYPES AND MAYBE PURPOSE AND SCRIPT AND INTENT AND CONTEXT.

AND WE CAN'T DO THOSE VERY WELL IN DNS. OUR ONLY TOOL FOR DOING THOSE THINGS IN DNS IS STRUCTURING. BUT WE CAN'T STRUCTURE WELL WITHIN A LABEL. THEY'RE NOT LONG ENOUGH AND THEY DON'T HAVE CLEAR STRUCTURE. SO WE DO IT WITH DNS HIERARCHY, BUT THAT INTRODUCES SOME OTHER ISSUES.

SO AS I SAID, IT'S YES OR NO. NO HINTS, NO ALTERNATIVES.
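The contrast between an exact-match lookup and a search that returns near matches can be shown with a toy model. Everything here is hypothetical: `ZONE` is a made-up table using documentation addresses, `dns_style_lookup` only imitates the "the one you asked for, or no" behavior, and `directory_style_search` uses the standard library's `difflib` purely as a stand-in for a real search service layered above the DNS.

```python
import difflib

# Hypothetical zone contents; 192.0.2.0/24 is reserved for documentation.
ZONE = {
    "example.com": "192.0.2.10",
    "example.org": "192.0.2.20",
}


def dns_style_lookup(name):
    """Exact match only (ASCII case is ignored): an answer, or None for 'no'."""
    return ZONE.get(name.lower())


def directory_style_search(name, limit=3):
    """Near matches come back to the caller, who must resolve the ambiguity."""
    return difflib.get_close_matches(name.lower(), sorted(ZONE), n=limit, cutoff=0.8)
```

A mistyped name such as `exampel.com` gets no answer from the first function and a list of candidates from the second. The talk's point is that the second behavior belongs in applications or directory services, not in the DNS itself.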

IF WE COMPLETELY LOCALIZE AND WE IGNORE THE GLOBAL ISSUES, WE TEND TO FRAGMENT THE NETWORK. IF I DEVELOP A REALLY GOOD SYSTEM FOR ME TO COMMUNICATE WITH THE FOUR OTHER PEOPLE WHO SPEAK KLINGON OR SEVERAL HUNDRED OTHER PEOPLE WHO SPEAK KLINGON AND THAT'S ONLY GOOD FOR COMMUNICATION AMONG THOSE SPEAKERS, THEN DON'T EXPECT TO COMMUNICATE WITH THEM OR EXPECT THEM TO COMMUNICATE WITH YOU BUT THEY MAY BE VERY HAPPY BECAUSE THEY MAY NOT CARE UNTIL ISSUES LIKE COMMERCE AND TRADE AND NEWS AND LETTERS INTRUDE.

THE ABILITY TO TRANSLATE OR TRANSLITERATE CHARACTERS BETWEEN SCRIPTS OR BETWEEN CODINGS, OR WITHIN AN ENVIRONMENT, MAY BE VERY IMPORTANT OR THEY MAY NOT BE IMPORTANT AT ALL DEPENDING UPON THE LANGUAGE AND SCRIPT AND USER AND THE APPLICATION. I MENTION SIMPLIFIED AND TRADITIONAL CHINESE. VERY IMPORTANT CASE.

JAPANESE AS MOST OF YOU KNOW COULD BE WRITTEN IN TWO WAYS. EITHER PHONETICALLY OR IN CHARACTERS DERIVED FROM CHINESE. SOME OF US HAVE HAD A VERY INTERESTING EXPERIENCE ABOUT JAPANESE. WE ASKED A JAPANESE COMPUTER SCIENTIST WHETHER THERE'S A NEED TO MATCH A PARTICULAR WORD WRITTEN PHONETICALLY OR IN KANJI, AND THE COMPUTER SCIENTISTS ALMOST ALWAYS SAY NO, NO ONE WOULD NEED TO MAKE THAT MATCH. AND THEN WE ASKED SOMEBODY ON THE STREET IN JAPAN WHO IS NOT A HEAVY USER OF THE INTERNET AND HASN'T BEEN IMMERSED IN THESE ISSUES, AND HE SAYS OF COURSE, ANY TEN-YEAR-OLD CAN DO THAT. THE DIFFERENCE IS IN CULTURE, CULTURE INTERFACE AND AGENTS, BUT THE DNS CANNOT SOLVE THAT PROBLEM, AND ANY SOLUTION WE COME UP WITH CAN'T (INAUDIBLE) THE DNS AND THE INTERNET IN THE PROCESS. AT LEAST I HOPE THEY CAN'T.

THERE ARE SEVERAL LANGUAGES, THE MOST IMPORTANT ONES BEING THOSE WHO USE HEBREW AND ARABIC SCRIPT, IN WHICH VOWELS AND ACCENT MARKS AND OTHER SORTS OF THINGS LIKE THAT ARE OPTIONAL. THE LANGUAGE CAN BE WRITTEN EITHER WITH OR WITHOUT THEM. IS IT IMPORTANT TO BE ABLE TO INCLUDE THEM AT ALL?

I DON'T KNOW. LOCAL ISSUE. BETTER SOMETHING CAN BE RESOLVED LOCALLY OR WITHIN THE CONTENT OF THAT SCRIPT. IF YOU PERMIT THEM AND PERMIT HAVING THEM NOT THERE, THEN THEY HAVE TO MATCH.

IT'S AN INTERESTING PROBLEM. THE ANSWER IS SOMETIMES, OR BETTER DECIDE LOCALLY. AND IF THERE ARE TWO DIFFERENT WORDS WHICH ARE THE SAME WITHOUT THE VOWELS BUT DIFFERENT WITH THE VOWELS, THEN THERE'S AN INTERESTING QUESTION AS TO WHETHER THOSE TWO WORDS MATCH UNDER THE RULE LET'S JUST THROW THE VOWELS AWAY.
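The "throw the vowels away" rule can be sketched by stripping combining marks. This is an assumption-laden illustration, not a rule from any IDN standard: `strip_points` is a hypothetical helper, and the two Hebrew words (commonly read as "book" and "barber") are chosen only because they share the same three consonants.

```python
import unicodedata


def strip_points(word):
    """Drop every combining mark (vowel points, dagesh, accents) from word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


# samekh + tsere, pe + segol, resh: pointed spelling of "book"
book = "\u05e1\u05b5\u05e4\u05b6\u05e8"
# samekh + patah, pe + dagesh + qamats, resh: pointed spelling of "barber"
barber = "\u05e1\u05b7\u05e4\u05bc\u05b8\u05e8"

bare = "\u05e1\u05e4\u05e8"  # samekh, pe, resh with no points at all

same_without_points = strip_points(book) == strip_points(barber) == bare
different_with_points = book != barber
```

Under this rule the two distinct words collapse to a single label. Whether that is acceptable is exactly the decision the speaker argues has to be made locally.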

THERE HAS TO BE THE CAPABILITY TO LOCALIZE PROPERLY BY MAKING THOSE DECISIONS LOCALLY. IT'S NOT A DNS PROBLEM. IF WE TRY TO GET INVOLVED, WE WILL MESS UP THE DNS AND WE WILL GET IT WRONG.

AND THERE'S SOME INTERESTING TYPOGRAPHIC CONVENTIONS.

THERE'S A FUNNY SITUATION IN THAT WE STARTED THIS STUFF WITH LATIN SCRIPTS AND IT LOOKS FROM THE OUTSIDE AS IF THE LATIN SCRIPTS OUGHT TO BE THE EASY ONE BECAUSE ASCII IS PART OF THEM. THOSE SCRIPTS, BASED ON ROMAN WRITING SYSTEMS ARE USED TO WRITE MORE DIFFERENT LANGUAGES IN THE WORLD THAN ANYTHING ELSE, AND THEY'RE USED IN DIFFERENT WAYS.

AND BECAUSE PEOPLE DISCOVERED THEY NEEDED TO PUT THINGS ON COMPUTERS LONG BEFORE WE HAD THESE COMPLICATED MULTI-LINGUAL -- MULTI-SCRIPT CHARACTER SETS, THEY DEVELOPED TYPING CONVENTIONS. INDEED, THEY DEVELOPED THESE TYPING CONVENTIONS WITH TYPEWRITERS. CALLED TYPE.

SO THERE'S A CONVENTION IN GERMAN THAT YOU CAN ALMOST ALWAYS WRITE O-UMLAUT AS OE, BUT SOME THINGS WRITTEN AS OE DON'T TRANSLATE BACK TO O-UMLAUT, AND IT'S VERY HARD TO GET IT RIGHT. IN FACT, YOU NEED DICTIONARIES. AS SOON AS YOU NEED DICTIONARIES YOU'VE IMPOSED AN ENTIRELY NEW RULE ON DNS THAT WE HAVEN'T SEEN BEFORE AND THAT'S THAT WHAT'S GOT TO BE IN THE DNS HAS GOT TO BE WORDS. A B C3 FIVE 55 IS A PERFECTLY FINE DNS LABEL BUT IT'S A LOUSY WORD.
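The asymmetry in the German typing convention is easy to demonstrate. In this sketch the forward direction is a fixed table, while the reverse direction needs a word list. `KNOWN_WORDS` is a tiny hypothetical stand-in for the dictionary the speaker mentions, and the function names are invented for illustration.

```python
# Forward direction: purely mechanical.
UMLAUT_TO_ASCII = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

# Hypothetical miniature dictionary of correctly spelled words.
KNOWN_WORDS = {"schön", "poet", "goethe"}


def to_typewriter(word):
    """Rewrite umlauts with the ASCII typing convention."""
    return word.translate(UMLAUT_TO_ASCII)


def naive_reverse(word):
    """Blindly turn every ae/oe/ue back into an umlaut (often wrong)."""
    return word.replace("ae", "ä").replace("oe", "ö").replace("ue", "ü")


def dictionary_reverse(word):
    """Accept the umlaut form only if the dictionary knows it."""
    candidate = naive_reverse(word)
    return candidate if candidate in KNOWN_WORDS else word
```

With these definitions, `naive_reverse("poet")` produces a non-word, while `dictionary_reverse("poet")` leaves the input alone. The correction only works because a dictionary was consulted, which is the new requirement the talk warns about.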

AND A VARIATION OF THE SLIDE, I PUT ON A NOTE BOARD A FEW YEARS AGO, BY NOW YOU SHOULD BE AT LEAST A LITTLE BIT FRIGHTENED IF I'VE BEEN AT ALL SUCCESSFUL. THE QUESTION IS HOW DID WE GET HERE, WHAT DO WE DO ABOUT IT?

ANCIENT HISTORY LESSON.

WHEN THE ARGUMENT STARTED ON THE ARPANET ABOUT HOW HOST NAMES SHOULD BE REPRESENTED, THERE WAS ONLY ONE INTERNATIONAL STANDARD FOR CHARACTER SETS ON COMPUTERS AND IT WASN'T FINISHED YET. IT WAS CALLED ISO 646 AND IT'S STILL WITH US. AND ISO 646 THROUGH MOST OF ITS LIFE WAS DEFINED AS HAVING TWO REPRESENTATIONS, ONE OF WHICH WAS CALLED THE BASIC VERSION; THE OTHER ONE OF WHICH WAS CALLED THE INTERNATIONAL REFERENCE VERSION. THE INTERNATIONAL REFERENCE VERSION IS ALSO CALLED ASCII. SAME CHARACTERS, SAME CODING, SAME RULES. BUT THE BASIC VERSION RESERVES A DOZEN, HALF A DOZEN CHARACTER POSITIONS FOR NATIONAL USE.

SO IF YOU'RE TRYING TO DESIGN SOMETHING FOR AN INTERNATIONAL ENVIRONMENT YOU DON'T DARE USE THOSE NATIONAL USE POSITIONS BECAUSE IT WILL SHOW UP AS DIFFERENT CHARACTERS DEPENDING ON WHICH COUNTRY YOU'RE IN OR WHICH LANGUAGE YOU THINK YOU'RE SPEAKING. SO THERE'S A LONG SET OF DISCUSSIONS, SOME OF WHICH PREDATES THE ARPANET ABOUT HOW YOU SHOULD REPRESENT THESE NAMES TO BE SAFE. AND ONE OF THE ANSWERS IS CLEARLY NO CHARACTERS IN THE INTERNATIONAL, NATIONAL USE POSITIONS.
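The effect of the national use positions can be illustrated with the German variant of ISO 646, which reassigns several positions to umlauts and other German characters. This is a sketch under assumptions: the `GERMAN_646` table lists only the German assignments, `render` is a hypothetical helper, and the sample byte strings are invented.

```python
# National-use positions as assigned in the German variant of ISO 646.
GERMAN_646 = {
    0x40: "§",  # "@" in ASCII
    0x5B: "Ä",  # "["
    0x5C: "Ö",  # "\"
    0x5D: "Ü",  # "]"
    0x7B: "ä",  # "{"
    0x7C: "ö",  # "|"
    0x7D: "ü",  # "}"
    0x7E: "ß",  # "~"
}


def render(data, national=None):
    """Show 7-bit bytes as a terminal using the given national variant would."""
    national = national or {}
    return "".join(national.get(byte, chr(byte)) for byte in data)


risky = b"user@host{1}"   # uses national-use positions
safe = b"host-1"          # letters, digits and hyphen only
```

The same bytes in `risky` display one way on an ASCII terminal and another way on a German one, while `safe` looks identical on both. That is the reasoning behind keeping names out of those positions.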

ANOTHER RULE IS, BECAUSE PEOPLE ARE GOING TO WRITE THINGS DOWN ON PAPER AND HAND THE PAPER TO OTHER PEOPLE AND WE ALL WRITE FUNNY, THAT YOU DON'T WANT TO PERMIT BOTH HYPHENS AND UNDERSCORES IN DOMAIN NAMES BECAUSE YOU WON'T BE ABLE TO TELL THEM APART.
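The resulting host name rule, letters, digits and hyphens only, can be written as a short check. This is a minimal sketch: `is_hostname_label` is a hypothetical helper, the 63-character limit is the usual DNS label limit, and allowing a leading digit reflects the later relaxation of the original rule rather than anything stated in the talk.

```python
import re

# One label: starts and ends with a letter or digit, hyphens allowed inside,
# at most 63 characters in total. No underscores.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_hostname_label(label):
    """Return True if label follows the letter-digit-hyphen convention."""
    return LDH_LABEL.fullmatch(label) is not None
```

Under this check `my-host` is accepted and `my_host` is rejected, so the handwriting ambiguity between the two characters never arises in host names.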

SO WE CAME BACK 20 YEARS LATER, DECIDED WE NEEDED TO LEARN ABOUT OTHER KINDS OF INTERNATIONALIZATION PROBLEMS, CAME UP WITH THE WEB AND CAME UP WITH MIME. MIME WAS DEVELOPED FOR E-MAIL. THE WEB USES IT, SO THE WEB PEOPLE WOULD TELL YOU IT WAS DEVELOPED FOR THE WEB AND E-MAIL USES IT. THEY'RE BOTH RIGHT. AND MIME ITSELF IS A SYSTEM FOR IDENTIFYING AND STRUCTURING CONTENT OTHER THAN SIMPLE ASCII TEXT.

WE STARTED THE PROJECT TO WORRY ABOUT MULTIPLE LANGUAGES AND SCRIPTS; IT GOT HIJACKED BY THE MULTIMEDIA PEOPLE. WHAT WE CAME OUT WITH IS REASONABLY SATISFACTORY FOR BOTH. AND IT HAD BETTER BE REASONABLY SATISFACTORY BECAUSE THE ODDS OF GETTING RID OF IT ARE PRETTY LOW.

BUT AS I SAY, WE STARTED WORRYING ABOUT INTERNATIONAL CHARACTERS IN THE ARPANET AND SOME THINGS WHICH FED INTO IT IN THE 1970S. INDEED, SOME OF THOSE DISCUSSIONS I WAS AWARE OF STARTED IN '68. THAT WAS BEFORE THERE WERE ANY PACKETS ON WIRES.

THE CHARACTER SET STANDARDS WEREN'T READY. WE COULDN'T DO ANYTHING OTHER THAN KEEP OURSELVES OUT OF SERIOUS TROUBLE. THERE WERE NO INTERNATIONALLY ACCEPTED CHARACTER STANDARDS FOR ANYTHING BUT ROMAN CHARACTERS AND THERE WASN'T MUCH FOR ROMAN.

WEB FOLKS RECOGNIZED THE REQUIREMENT FOR INTERNATIONALIZATION EARLY. BEING IN AN ENVIRONMENT THAT SAT ON A FRENCH/SWISS BORDER PROBABLY HELPED. BUT THE DETAILS WEREN'T WORKED OUT UNTIL THE MID '90S AND EVERYTHING WAS DONE BY TAGGING. MIME IS A TAGGING SYSTEM. I HAVE A CYNICAL COLLEAGUE WHO CLAIMS MIME IS ABOUT DOCUMENTING INTEROPERABILITY. SOMETHING IS LABELED, IT ARRIVES ON MY DESKTOP, AND I KNOW EXACTLY WHY I CAN'T READ IT, WHICH IS BETTER THAN NOT KNOWING EXACTLY WHY I CAN'T READ IT, ALTHOUGH BEING ABLE TO READ IT IS BETTER.
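The tagging idea can be seen in any MIME message: the content travels with a label that says what it is and which character set it uses. The sketch below uses Python's standard `email` package; the function name `build_tagged_message`, the subject line and the sample text are assumptions made for illustration.

```python
from email.message import EmailMessage


def build_tagged_message(text):
    """Wrap text in a message whose headers label its type and charset."""
    msg = EmailMessage()
    msg["Subject"] = "Greeting"
    # set_content adds the Content-Type tag, including the charset parameter,
    # and picks a transfer encoding that keeps the body safe in transit.
    msg.set_content(text, charset="utf-8")
    return msg


message = build_tagged_message("Gr\u00fc\u00dfe aus Z\u00fcrich\n")
content_type = message.get_content_type()    # the media type tag
charset = message.get_content_charset()      # the character set tag
```

A receiver that does not support the labeled charset at least knows why it cannot display the text, which is the point of the cynical remark quoted above.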

WHAT WE KEEP SAYING ABOUT MIME AND OTHER THINGS IS THE FEWER OF THOSE CONTENT TYPES WE HAVE AROUND, THE MORE INTEROPERABILITY WE HAVE. WELL, WE'RE LOSING THAT RACE.

THE APPLICATIONS PROTOCOLS THEMSELVES ON THE INTERNET ARE BY AND LARGE DEFINED IN TERMS OF ASCII OR AT LEAST SEVEN-BIT CHARACTERS. IT'S NOT AN ACCIDENT. IT'S NOT IGNORANCE. IT'S NOT AMERICAN ARROGANCE. ALTHOUGH THERE IS SOME OF THAT IN ALL OF THIS, NO DOUBT.

WE NEED TO REMEMBER THAT THE ITU RECOMMENDATIONS WHERE THEY'RE TALKING ABOUT PROTOCOL ELEMENTS, EITHER SEND THOSE PROTOCOL ELEMENTS AROUND IN OBSCURE NUMBERED CODES OR THEY SEND THEM AROUND IN ALPHABETS WHICH ARE EVEN MORE RESTRICTED THAN ASCII HOST NAMES.

IF WE DECIDE OUR ONLY SOLUTION IS TO WAIT FOR ALL THE APPLICATIONS TO BE UPGRADED, IT'S GOING TO BE A LONG WAIT, AND IT'S GOING TO INVOLVE SOME UNPREDICTABILITY BECAUSE AT NO TIME WILL THE SENDER KNOW WHAT CAPABILITIES THE RECEIVER HAS. WE KNOW HOW TO FIX THAT PROBLEM. THE WAY WE FIX THAT PROBLEM IS TO ANNOUNCE THAT ON AUGUST 1ST WE WILL SHUT DOWN THE INTERNET AND WHEN WE BRING IT BACK UP EVERYONE ELSE WILL BE USING THE NEW SYSTEM.

IF ANYBODY BELIEVES WE COULD DO THAT TODAY, I HAVE A BRIDGE I'D LIKE TO SELL YOU, TO USE AN OLD AMERICAN SAYING.

SO WE HAVE TO TALK ABOUT SOME ALTERNATIVES TO UPGRADING AND ESPECIALLY REPLACING APPLICATIONS. IN THE INTERNATIONALIZATION SPACE WE HEAR A LOT ABOUT PLUG-INS AND PATCHES. WE WILL FIX YOUR BROWSER SO THAT IT WORKS DIFFERENTLY AND DOES INTERNATIONALIZATION WELL.

OUR EXPERIENCE SO FAR IS IT DOESN'T WORK VERY WELL. YOU DON'T GET A CONSISTENT USER EXPERIENCE FROM ONE USER TO THE NEXT.

I DO SOMETHING ON MY SCREEN, I CALL UP SOMEBODY AND TELL THEM WHAT TO TYPE AND IT DOESN'T WORK IN THEIR ENVIRONMENT. WE GET DIFFERENCES FROM ONE USER TO THE NEXT, BETWEEN VERSION 3 OF MY BROWSER AND VERSION 4 OF MY BROWSER OR BETWEEN SERVICE PACK ONE AND SERVICE PACK TWO.

THE IDNA STANDARD, AS JAMES WILL TELL YOU AT LENGTH, ENDS UP WITH SOMETHING CALLED PUNYCODE IN THE DNS. PUNYCODE IS UGLY. NO USER WAS REALLY EXPECTED TO LOOK AT IT. AND WE WILL SUCCEED WHEN USERS DON'T HAVE TO.

BUT THERE ARE SITUATIONS, ESPECIALLY TODAY, WHEN DEALING WITH AND LOOKING AT AND WRITING PUNYCODE IS INEVITABLE AND PROBABLY SOME OF THOSE CONDITIONS WILL BE WITH US FOR QUITE A WHILE.
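What Punycode looks like in practice can be shown with Python's built-in `idna` and `punycode` codecs, which implement the 2003 version of the IDNA standard. The helper names `to_wire` and `from_wire` and the sample domain are assumptions for illustration; the `xn--` prefix is the marker that identifies an encoded label.

```python
def to_wire(name):
    """Convert a Unicode domain name to the ASCII form stored in the DNS."""
    return name.encode("idna").decode("ascii")


def from_wire(name):
    """Convert the ASCII form back to the Unicode form shown to users."""
    return name.encode("ascii").decode("idna")


readable = "b\u00fccher.example"           # what the user wants to see
wire = to_wire(readable)                   # what actually goes into the DNS
raw_label = "b\u00fccher".encode("punycode")  # Punycode without the prefix
```

Here `wire` is `xn--bcher-kva.example`. The round trip through `from_wire` restores the readable form, and upper-case input is mapped to the same wire form, which is one of the character mappings the standard performs.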

IF I GO TO A USER WHO HAS BEEN SURVIVING IN THE NETWORK FOR THE LAST TEN YEARS OR 20, AND THAT USER HAS BEEN DEALING WITH SOME UGLY TRANSLITERATION OF HER NAME INTO A DIFFERENT CHARACTER SET, AND HATING IT EVERY TIME IT'S WRITTEN, FOR US TO COME BACK TO THAT USER TODAY AND SAY, "BOY, ARE WE GOING TO IMPROVE YOUR LIFE, WE'RE GOING TO TAKE THAT UGLY TRANSLITERATION WHICH YOU HATE BUT WHICH YOU CAN MANAGE TO REMEMBER, AND SO CAN YOUR FRIENDS AND COLLEAGUES, AND REPLACE IT WITH A STRING THAT STARTS WITH XN AND TWO HYPHENS AND IS COMPLETELY UNINTELLIGIBLE, AND I WOULD LIKE YOU TO BELIEVE THAT'S AN IMPROVEMENT."

MOST USERS ARE GOING TO LOOK AT US AS IF WE ARE CRAZY.

IMPORTANT TRANSITION PROCESS, BUT WE NEED TO UNDERSTAND WHAT WE'RE GETTING INTO.

AT THAT POINT, I AM GOING TO STOP AND CATCH MY BREATH AND TURN THIS OVER TO PROFESSOR TIN WEE TAN, IF HE'S HERE.

>>SHARIL TARMIZI: YES.

>>JOHN KLENSIN: WHEW. I DIDN'T SEE YOU COME IN.

WOULD YOU LIKE A MICROPHONE?

BY MEANS OF VERY LIMITED INTRODUCTION AS HE'S WALKING IN AND GETTING HIS COMPUTER PLUGGED IN, PROFESSOR TAN IS ARGUABLY THE REASON WHY WE'RE HERE TODAY.

WHILE MANY OF THESE INTERNATIONALIZATION ISSUES GO BACK A VERY LONG TIME AND WE HAVE BEEN WORRIED ABOUT THEM FOR A VERY LONG TIME, HE WAS THE PERSON WHO STOOD UP AND SAID, INTERNATIONALIZING DOMAIN NAMES IS IMPORTANT AND IT'S TIME WE STARTED DOING IT RIGHT NOW.

IS THAT OKAY? AND I WILL LET YOU TAKE OVER.

>>SHARIL TARMIZI: SORRY, JUST AS A REMINDER TO PEOPLE SITTING IN THE HALL. IF YOU NEED POWER, IT'S ON THIS SIDE OF THE ROOM. AND THERE'S CABLE CONNECTIVITY ALSO, WIRED CONNECTIVITY, ON THIS SIDE. ON THAT FAR SIDE, IT'S JUST WIRELESS AND YOU'RE ON BATTERIES. SO IT'S WIRELESS, NO POWER, WIRELESS, NO POWER, WIRELESS, SOME POWER, WIRED AND POWER. THANK YOU. JUST TO LET YOU KNOW SO YOU DON'T HAVE TO GO OUT AND RECHARGE. YOU CAN CROSS OVER.

>>JOHN KLENSIN: FOR THOSE OF YOU WHO WERE WORRIED ABOUT THE OTHER PART OF THE PROBLEM, WE WILL TAKE A BREAK, VERY BRIEF BREAK, AFTER PROFESSOR TAN IS FINISHED. THEN WE'LL COME BACK AND FINISH THE REST OF IT.

>>TIN WEE TAN: THANK YOU VERY MUCH, JOHN. GIVES ME GREAT PLEASURE TO ADDRESS THIS AUDIENCE.

AND MY NAME IS TIN WEE TAN OR TAN TIN WEE, DEPENDING ON WHICH CONVENTION YOU ARE USING. BUT I LIKE TO USE THE CHINESE CONVENTION JUST TO CONFUSE PEOPLE A LITTLE BIT. SO MY NAME HAS BEEN FOUND IN ALL KINDS OF PERMUTATIONS. BUT, ANYWAY, IT WAS MY FAULT.

THE HISTORY OF INTERNATIONALIZED DOMAIN NAMES STARTED WHEN WE SAT AROUND AND DECIDED THAT WE HAD TO DO SOMETHING ABOUT THE MULTILINGUAL ISSUE. IT STARTED IN A BIG WAY IN 1998 WITH THE FIRST WORKING IMPLEMENTATION OF WHAT YOU COULD POSSIBLY SAY WAS A PRIMITIVE ASCII-COMPATIBLE ENCODING SYSTEM WHICH WAS THEN CALLED UTF 5, WHICH WE PLAYED AROUND WITH AT THE INTERNET R & D UNIT OF THE NATIONAL UNIVERSITY OF SINGAPORE AROUND ABOUT MARCH OR SO OF 1998.

AND THE FIRST VERSIONS ACTUALLY WORKED ON OUR SYSTEMS, AND WE WERE REALLY EXCITED ABOUT IT. AND BECAUSE AT THAT TIME I WAS THE CHAIRMAN OF THE ASIA-PACIFIC NETWORKING GROUP, APNG, THE OLDEST ASIA-PACIFIC INTERNET ORGANIZATION, WE DECIDED TO EXPAND THAT INTO A COMMISSION TO EXPLORE THE WIDER RAMIFICATIONS OF THE IDN ISSUE. THIS LED TO A FORMATION OF AN ASIA-PACIFIC TEST BED IN WHICH WE TESTED IN DIFFERENT CONTEXTS, IN DIFFERENT COUNTRIES THROUGHOUT THE ASIAN REGION, BECAUSE THAT WAS WHERE MOST OF THE INTEREST OCCURRED.

SO TOWARDS THE SECOND HALF OF 1998, WE WENT AROUND ALL THE DIFFERENT COUNTRIES AND INTRODUCED THE CONCEPT OF IDN TO THEM. AND, OF COURSE, EVERYBODY WAS SAYING IT JUST IS IMPOSSIBLE, IT'S TECHNICALLY NOT FEASIBLE, ET CETERA, ET CETERA. AND WE ROLLED OUT OUR MACHINES AND WE SHOWED THAT IT WORKED. AND PEOPLE WERE REALLY EXCITED ABOUT THIS, BECAUSE IT WAS TALKED ABOUT BEFORE BY PEOPLE LIKE MARTIN DÜRST ABOUT HOW ONE COULD IMPLEMENT MULTILINGUALIZED DOMAIN NAMES OR INTERNATIONALIZED DOMAIN NAMES WHICHEVER YOU LIKE TO CALL IT. AND PEOPLE SAID IT WAS NOT REALLY POSSIBLE. BUT HERE WE WERE SHOWING PEOPLE A DEMONSTRATION OF A WORKING MODEL.

AROUND ABOUT THAT TIME, IF YOU CAN RECALL FAR BACK ENOUGH, IN 1998, AUGUST, IN SINGAPORE, IFWP, THE FORUM FOR -- ON THE WHITE PAPER, WAS GOING AROUND THE WORLD. AND WE DEMONSTRATED THAT IN AUGUST 1998, SINGAPORE.

BUT, OF COURSE, THE EARLY ENTITY -- THE PRECURSOR ENTITY OF MEETINGS -- OF THE ICANN MEETINGS AS WE KNOW OF TODAY WERE FOCUSED ON TRYING TO GET EVERYBODY TOGETHER, AND FOR SURE, MULTILINGUAL DOMAIN NAMES WAS DEFINITELY NOT ON THEIR RADAR SCREEN.

BUT QUIETLY, PEOPLE IN THE ASIAN REGION WERE EXTREMELY INTERESTED, AND THERE WAS AROUND ABOUT THE END OF 1998 AN EXPLOSION OF INTEREST AMONGST PEOPLE WHO ACTUALLY SAW, MOSTLY FOR THE FIRST TIME, HOW MULTILINGUAL DOMAIN NAMES COULD BE IMPLEMENTED.

THE MOTIVATIONS FOR IDN BEFORE 1998 WERE VERY WELL ARTICULATED BY MARTIN DÜRST, WHO WAS WORKING WITH THE W3C, THE WORLDWIDE WEB CONSORTIUM, ABOUT THAT TIME. AND HE WAS TOYING AROUND WITH THIS IDEA. AND HE WROTE A PAPER JUST TO PROVE FRIENDS WRONG, THAT IT COULD ACTUALLY -- HE COULD ACTUALLY PRODUCE A PAPER THAT POINTED TO A WORKABLE SOLUTION FOR SUPPORTING MULTILINGUAL CHARACTERS IN THE DOMAIN NAME. AND THIS WAS THE EARLIEST ACADEMIC PROOF THAT IDN WAS POSSIBLE IN THE DNS, AROUND ABOUT '96, '97.

BUT UNDERLYING THAT IDN MOVEMENT, THAT IDN PHENOMENON IN 1998, WAS A PHENOMENOLOGICAL ONE, AND THAT WAS OF THE PENT-UP DEMAND IN A LOT OF COUNTRIES BEING INTRODUCED TO THE WORLDWIDE WEB AND EXCITED BY THE CONCEPT THAT MY LANGUAGE COULD APPEAR ON THE INTERNET, MY WEB PAGES COULD BE RENDERED IN MY LANGUAGE, AND I CAN START PUBLISHING THINGS ON THE WEB IN MY OWN LANGUAGE.

SO THE WIDER PRE-1998 MOTIVATION THAT PROVIDED THAT UNDERGROUND -- UNDERLYING GROUNDSWELL WAS MULTILINGUALISM AT ITS CORE. AND HERE TODAY, WE ARE SITTING IN THE WSIS MEETINGS AND SO ON, TALKING ABOUT MULTILINGUALISM AND THE NEED TO SUPPORT THE GLOBAL COMMUNITY. WELL, THE ROOTS, THE UNDERLYING CURRENTS, WERE ALREADY STIRRING BEFORE 1998. AND THAT TRANSLATED INTO THIS DESIRE TO PUSH IDN FORWARD.

SO I HOPE IT GIVES YOU A FLAVOR AND UNDERSTANDING SOMEWHAT OF WHY WE HAVE THAT -- THIS IDN MOVEMENT AND TODAY THIS IDN WORKSHOP WHERE SO MANY OF YOU HAVE TURNED OUT TO LISTEN TO A TUTORIAL, COORDINATED BRILLIANTLY BY JOHN KLENSIN.

SO THAT MULTILINGUAL INTERNET WAS ALREADY STARTING AS FAR BACK AS 1995, WHEN, IN END OF '93, WHEN THE MOSAIC WEB BROWSER STARTED, TOWARDS 1994 TO 1995, WHEN, WITHIN OUR RESEARCH GROUP, WE STARTED INTRODUCING TAMIL INTERNET MULTIPLE LANGUAGES ON ONE PAGE, CHINESE SCRIPT E-MAIL, WEB ANTHOLOGY -- MULTILINGUAL CHARACTERS ON THE WORLDWIDE WEB, INCLUDING KEYBOARD INPUT SYSTEMS.

THESE KINDS OF MULTILINGUAL ACTIVITY STRETCHED ACROSS MANY DIFFERENT LANGUAGES.

I THINK I MAY HAVE LOADED THE WRONG VERSION OF THE SLIDE. EXCUSE ME FOR A MOMENT.

OKAY. I WILL CARRY ON WITH THE OLD VERSION.

SO WHAT'S THE PROBLEM? THE MULTILINGUAL CONTENT HAD ALREADY TAKEN PLACE AND SOME OF US MAY REMEMBER IN 1996 ISOC, WHICH IS THE COORGANIZER OF THIS WORKSHOP, TIED UP WITH ALIS BY INTRODUCING MULTILINGUAL CHARACTERS. WHAT ABOUT LABELS? YOU CAN ADDRESS THE CONTENT, BUT HOW ABOUT THE LABELING? BECAUSE IN ORDER TO GET TO CONTENT, YOU HAVE TO FIGURE OUT HOW TO KEY IN THE LABEL, YOU KNOW, TO GET TO WHAT YOU WANT.

SO -- LET ME PUT THIS UP.
SO THESE ARE THE LABELS.

AND PEOPLE STARTED ASKING, CAN THEY BE MULTILINGUAL, TOO. AND WE HAVE SHOWN THAT MULTILINGUAL INTERNET LABELS ARE POSSIBLE. OUR SOLUTION WAS TO IMPLEMENT IMMEDIATELY WHATEVER UNICODE WE HAD INTO THE ASCII-COMPATIBLE ENCODING UTF 5 AS THE EARLIEST FORM OF WHAT WE NOW KNOW OF AS THE ACE, WHICH JOHN HAS MENTIONED AS PUNYCODE. AND, OF COURSE, THERE ARE LOTS OF PROBLEMS IN IMPLEMENTING ANY TECHNOLOGY. THERE WILL ALWAYS BE PROBLEMS. IT CANNOT BE PERFECT.

AND WE COULD EITHER TAKE THE PURIST APPROACH AND SAY EVERYTHING MUST BE PERFECT BEFORE WE CAN MOVE TO TAKE THE FIRST STEP FORWARD, OR WE CAN SAY, WELL, LET'S REACH A CERTAIN LEVEL OF THRESHOLD, AND IF WE ARE FAIRLY SATISFIED THAT ABOUT 80% OF THINGS CAN GO FORWARD, THE ADVANTAGES OF HAVING THAT TECHNOLOGY IN IMMEDIATE USE FAR OUTWEIGH THE 20% OF PROBLEMS THAT WE HAVE TO OVERCOME ALONG THE WAY.

WHY NOT JUST SHOOT FOR THE MOON, YOU KNOW?

AND FIGURE OUT THE PROBLEMS, BECAUSE WE HAVE BRILLIANT ENGINEERS LIKE JOHN KLENSIN OUT THERE WHO WILL FIX THE PLANE AS WE FLY IT, YOU KNOW.

(LAUGHTER.)

>>TIN WEE TAN: SO IDN OF THE DNS MOVED FORWARD ON THAT BASIS. SOLVING THE FINAL BARRIER TO WIDESPREAD ADOPTION OF THE INTERNET IN NON-ENGLISH-SPEAKING COMMUNITIES WAS A SERIOUS CONCERN OF OURS. AND THE TEST BED WAS SPONSORED BY APNG. PEOPLE STARTED CRITICIZING US, SAYING THAT, WELL, YOU KNOW, THERE'S NOT GOING TO BE ANY COMMERCIAL INTEREST HERE. NOBODY WILL BE INTERESTED. AND IT'S -- EVEN IF YOU COULD ARGUE THAT IT WAS POSSIBLY 90% TECHNICALLY VIABLE, WELL, THERE WON'T BE AN ECONOMIC FEASIBILITY HERE. IT WOULDN'T TAKE OFF AT ALL.

SO WE DECIDED TO TACKLE NOT JUST THE TECHNICAL PROBLEMS. CONCURRENTLY, WE WANTED TO PROVE TO THE WORLD THAT WE COULD DO IT COMMERCIALLY AS WELL. AND AT THAT TIME OUR UNIVERSITY WAS EXTREMELY SUPPORTIVE OF SPINNING OFF COMPANIES, ENTREPRENEURSHIP, AND SO FORTH.

SO WE GOT ONE OF OUR UNIVERSITY WHOLLY-OWNED COMPANIES TO HELP OUT HERE AND GIVE US SOME FUNDING TO PUSH THIS FORWARD. SO CONCURRENTLY, WHILE THE RESEARCH WAS GOING ON, WHILE THE TEST BED DEPLOYMENT WAS GOING ON, HERE WE ARE TRYING TO MAKE A BUSINESS OUT OF THAT.

THE RESEARCH CONTINUED IN THE FORM OF RESEARCH GRANTS FROM THE INTERNATIONAL DEVELOPMENT RESEARCH COUNCIL OF CANADA, WHO FUNDED US IDN. BY THE TIME WE GOT THE MONEY, WE HAD ALREADY DONE IT COMPLETELY FOR IPV4, SO WE DECIDED TO LOOK INTO IPV6.

SO THE WEB SITE THERE SHOWS YOU SOME OF THE OLD INFORMATION ABOUT WHAT HAPPENED IN THOSE EARLY HEYDAYS.

SO THIS PROJECT NOW -- THOSE THAT COULD BE COMMERCIALLY VIABLE SPUN OFF TO FORM THE COMPANY I-DNS.NET. AND FUNDED BY GENERAL ATLANTIC PARTNERS.

AND SEATED THERE IS JAMES SENG, MY FORMER STUDENT, WHO WAS APPOINTED AS CTO.

AND AROUND ABOUT THAT TIME, WE GOT INTERESTED, BECAUSE PEOPLE TOLD US, WELL, YOU CAN HAVE ALL THESE COMPANIES FORMING UP AND EVERYONE HAD THEIR OWN STANDARDS, YOU'RE GOING TO HAVE PROBLEMS. WE HAD BETTER GET A WORKING GROUP GOING ON IDN BY THE IETF, IN ORDER TO ENSURE THAT PROPER STANDARDS WERE ADHERED TO.

AND JAMES, VERY ENTHUSIASTICALLY, LEAPT FORWARD, AND YOU WILL -- THE REST WAS HISTORY. IT TOOK US A LONG TIME, THREE YEARS OR SO, IN ORDER TO GET THERE. BUT I THINK JAMES, BEING THE NEXT SPEAKER, WILL ELABORATE ON THAT.

AND I THINK JOHN HAS ALREADY COVERED VERY MUCH THOSE ISSUES.

THERE WERE DOZENS OF COMPANIES HOPPING ON THE BANDWAGON. AND IN 2000, YOU KNOW, THERE WAS THE PEAK OF THE DOT-COM FEVER. SO EVERYBODY WAS JUMPING IN. SO RESEARCH WAS GOING ON, STANDARDIZATION WAS GOING ON, COMMERCIALIZATION WAS GOING ON, LAUNCHES LEFT, RIGHT, AND CENTER. PEOPLE ALL REALLY HYPED UP AND VERY EXCITED ABOUT THIS. BUT OVER THE YEARS, WE HAVE SEEN THAT THROUGH ALL THAT WHOLE LIST OF PARTIES THAT WERE INVOLVED, INCLUDING A LOT OF NICS, A LOT OF COMPANIES WERE OFFERING HALF SOLUTIONS BY 2000.

THE INTEROPERABILITY ISSUE BECAME EXTREMELY URGENT, BECAUSE PEOPLE WERE INTRODUCING A LOT OF DIFFERENT SOLUTIONS THAT WERE INCOMPATIBLE WITH EACH OTHER. SOME OF THE POINTS WERE RIGHTLY POINTED OUT BY JOHN A FEW MINUTES AGO.

SO WE SAID, WELL PEOPLE WERE SAYING, WELL, WE BETTER GET THINGS ORGANIZED BECAUSE, YOU KNOW, PEOPLE WERE GOING RATHER THAN CONVERGENTLY, ON DIVERGENT PATHS. WE DECIDED TO GET AROUND TO THE CONFERENCE IN SEOUL AND FORMED WHAT WE CALLED THE MULTILINGUAL INTERNET NAMES CONSORTIUM, OR MINC, AS AN INTERNATIONAL ORGANIZATION GOING FORWARD.

YOU MIGHT BE ASKING IN RETROSPECT WHY COULDN'T WE GET, WHY COULDN'T WE HAVE GOTTEN ICANN TO DO THIS AND SO ON. BUT YOU MUST UNDERSTAND, BACK IN 2000, THAT THE RADAR SCREEN OF ICANN WAS CHOCKFULL WITH A LOT OF HEAVY-DUTY POLITICAL PROBLEMS. AND DEFINITELY IDN WAS NOT ANYWHERE NEAR THE RADAR SCREEN OF ICANN. IN FACT, I CONSIDER TODAY'S WORKSHOP AN EXTREMELY DRAMATIC, GREAT LEAP FORWARD, IF YOU COULD CALL IT, BECAUSE THIS WAS -- IS WHEN ICANN HAS NOW FINALLY BROUGHT IDN DIRECTLY INTO ITS RADAR SCREEN. IT HAS TAKEN US FROM 1998 UNTIL TODAY TO REACH THIS POSITION.

AND THERE WE WERE, TRYING TO GET EVERYBODY ORGANIZED BY FORMING A CONSORTIUM OF MORE THAN 20 FOUNDING MEMBERS. AND THE FOUNDING PERIOD WAS IN JULY 2000.

WE WANTED TO BE INCLUSIVE IN MINC, BECAUSE WE UNDERSTOOD THAT WE WERE TACKLING ALL THOSE PROBLEMS THAT JOHN MENTIONED PERTAINING TO LANGUAGES, PERTAINING TO SCRIPTS. AND WE HAD TO ADOPT PRINCIPLES THAT WILL HELP US MARRY THE TWO DIVERGENT CONCERNS, ON THE ONE HAND, THE ENGINEERS, AND ON THE OTHER HAND, THE ASPIRATION OF THE MULTILINGUAL MASSES. AND WE MUST NOT BREAK THE EXISTING STRUCTURE AND THE HIERARCHY. AND YET, AT THE SAME TIME, WE HAVE TO SUPPORT AS MANY LANGUAGES AS POSSIBLE, IF NOT ALL, AND SUPPORT AS MANY SCRIPT ENCODINGS AS DESIRED, AS REQUIRED, TO AVOID AMBIGUITY, TO PROVIDE THE DEGREE OF UNIQUENESS AND THE CERTAINTY THAT THE DNS ALREADY EXISTED. AND YET, AT THE SAME TIME, WORK EVERYWHERE FOR EVERYONE.

SO THE ADVICE GIVEN TO US WAS, PLEASE, PLEASE, PLEASE, FOLLOW IETF PROCESS. MINIMIZE DISRUPTION AS FAR AS POSSIBLE. FOLLOW THAT RED FLAG THAT JOHN HAS ALREADY INDICATED THAT 1ST AUGUST MIGHT BE THE DATE. HARMONIZE SOLUTION, ADOPT THE SIMPLEST SOLUTION GOING FORWARD. AND TAKE THE PATH OF LEAST CONSTERNATION OR LEAST ASTONISHMENT, WHICH IS ONE OF THE FAVORITE PHRASES OF JOHN.

SO MINC WAS FORMED IN ORDER TO COORDINATE THE RESEARCH ACTIVITY, TO COORDINATE THE INDUSTRY PLAYERS, TO COORDINATE THE POLITICS, TO COORDINATE THE INTERNATIONAL GROUPS, THE LANGUAGE GROUPS.

ONE OF THE KEY THINGS THAT WE DISCOVERED AS WE WENT ALONG, FIXING THE PLANE AS IT TOOK OFF, WAS, ESSENTIALLY, LEARNING HOW TO HANDLE ALL THE DIFFERENT LANGUAGE GROUPS. AND TOGETHER WITH THE LANGUAGE GROUPS CAME A HUGE BAGGAGE OF CULTURE AND EMOTIONAL FEELING ATTACHED TO THAT CULTURE AND TO THE LANGUAGE.

SO WE TACKLED ONE OF THE BIGGEST -- ONE OF THE BIGGER PROBLEMS WAS THAT SOMEBODY TOLD US PEOPLE WILL USUALLY FIGHT FOR THEIR LANGUAGES, BUT THERE'S ONLY ONE LANGUAGE IN THE WORLD WHERE PEOPLE WILL DIE FOR THEIR LANGUAGE, AND THAT WAS THE TAMIL LANGUAGE. AND THERE WAS THIS TAMIL IN THE WORLD AND THIS SRI LANKA SITUATION. AND SO, OF COURSE, IN THE CONTEXT OF SINGAPORE, WE HAVE A HUGE POPULATION OF -- A LARGE POPULATION OF TAMIL-SPEAKING INDIANS. SO WE SAID WHY NOT START FROM THERE. AND, OF COURSE, I HAD BEEN INVOLVED IN TAMIL INTERNET BACK IN '95.

SO WE SAID, OKAY, WHY NOT HELP AS A TEST CASE THIS GROUP, A DIASPORIC GROUP OF INDIANS WHO WOULD DIE FOR THEIR LANGUAGE, RIGHT, AND FIGURE OUT HOW WE SOLVE THIS PROBLEM. AND OVER THE YEARS, WE HAVE SEEN THE GROWTH AND FORMATION OF THE INTERNATIONAL FORUM FOR IT IN TAMIL, INFITT.

AND THROUGH THE TEST CASE OF THIS ORGANIZATION, WE HAVE UNDERSTOOD HOW TO FORM LANGUAGE GROUPS THAT INVOLVE DIVERSE COMMUNITIES LOCATED THROUGHOUT THE WORLD, SPEAKING AND USING THE SAME LANGUAGE AND THE SAME SCRIPT WITH ALL THEIR DIFFERENT ASPIRATIONS BEING DIASPORIC, AND HOW WE UNIFY THESE DIASPORIC ASPIRATIONS WITH THOSE OF THEIR MOTHERLAND IN TAMIL NADU STATE IN INDIA, HOW TO PROMOTE A LEVEL OF AWARENESS AMONGST THESE PEOPLE AND HOW TO TECHNICALLY COORDINATE THEM. AND HOW TO PROMOTE UNDERSTANDING AND COORDINATION AMONGST THESE DIFFERENT GROUPS WITHIN THE CONTEXT OF ONE SINGLE LANGUAGE, ONE SINGLE SCRIPT, RIGHT, AND WITH MANY GROUPS SCATTERED THROUGHOUT THE WORLD.

WE HAVE DONE THAT ALSO WITH THE ARABIC COMMUNITY, WHICH IS A LITTLE BIT MORE COMPLICATED, BECAUSE NOT ONLY IN THE CASE OF TAMIL, ONE LANGUAGE USES -- ONE LANGUAGE GROUP USES ONE SCRIPT.

BUT IN THE CASE OF ARABIC SCRIPT, THERE ARE MANY DIFFERENT GROUPS THAT ACTUALLY ALSO USE THE ARABIC SCRIPT. SO HOW DO WE SOLVE THAT PROBLEM?

THE OTHER PROBLEM WHICH WE ALSO TRIED TO TACKLE WAS THAT OF THE HAN CHARACTER SETS, WHICH IS BEING USED BY THE CHINESE-SPEAKING PEOPLES, THE JAPANESE-SPEAKING PEOPLE IN THE FORM OF ONE OF THE THREE MAJOR SCRIPTS THAT THEY USE, THAT'S KANJI.

AND, OF COURSE, THE KOREAN PEOPLE, HANJA, IN ADDITION TO THEIR HANGEUL SCRIPT.

SO THREE DIFFERENT LANGUAGES IN THE CONTEXT OF HAN CHARACTERS USING THAT SAME SCRIPT OR DIFFERENT SUBSETS OF THOSE SCRIPTS. AND WITH THE COOPERATION OF THE CHINESE, JAPANESE, AND KOREAN PEOPLE, WE HAVE MANAGED TO FORM THE JET, JOINT ENGINEERING TASK FORCE, WHICH I THINK JAMES WILL BE SPEAKING ABOUT.

AND THEY HAVE UNDERGONE TREMENDOUS DEBATES AND DISCUSSION AND REACHED SOME KIND OF AGREEMENT TO MOVE FORWARD.

SO COMING BACK TO MINC, HERE WE ARE, SPINNING OFF GROUPS THAT COULD UNDERSTAND THE LANGUAGE THAT HAD -- THE LEGITIMACY TO HANDLE THE LANGUAGES THAT BELONGED TO THEM, THAT THEY GREW UP WITH, THEY WERE TAUGHT, HANDED DOWN FROM GENERATION TO GENERATION.

LANGUAGE IS SOMETHING VERY CLOSE TO EACH INDIVIDUAL ETHNIC GROUP, AND IT IS IMPORTANT FOR US TO PROVIDE THAT KIND OF RESPECT FOR THESE LANGUAGE GROUPS, EVEN THOUGH WE MAY KNOW THE BEST ANSWER, WELL, EVERYBODY THINKS THEY HAVE THE BEST ANSWER, BUT WE STILL HAVE TO INCLUDE AN ELEMENT OF RESPECT FOR THESE PEOPLES, BECAUSE THAT'S THE LANGUAGE THEY GREW UP WITH.

AND EVEN IF IT DOESN'T -- IT HALF WORKS, THE TECHNOLOGY HALF WORKS, WELL, IT'S THEIRS; RIGHT?

SO THAT'S WHY I THINK THAT THROUGH THIS EXPERIENCE THAT WE HAD OF GOING THROUGH THE PROCESS OF LOOKING AT THE TECHNOLOGY, LOOKING AT THE COMMERCIAL ASPECTS, LOOKING AT THE INTERNATIONAL COLLABORATION AND COOPERATION AMONGST DIFFERENT LANGUAGE GROUPS, THE PROCESS OF HELPING THESE LANGUAGE GROUPS SPIN OFF, CREATE THEIR OWN ORGANIZATION, HELPING THEM MOVE FORWARD, AND TRYING TO HELP THEM TO TALK TO EACH OTHER AND COOPERATE WITH EACH OTHER, HAS CERTAINLY HELPED US OPEN OUR EYES TO THE CHALLENGES WHICH HAVE BEEN MENTIONED BY JOHN EARLIER ON.

THERE WERE, OF COURSE, MANY FOUNDING MEMBERS. AND WE ARE INDEED GRATEFUL TO THESE FOLKS, BECAUSE THESE WERE THE BRAVE FOLKS THAT LEAPT FORWARD INTO IDN, THE IDN MOVEMENT WITH US IN THOSE EARLY DAYS.

SO COMING BACK TO THE BASIC QUESTION. WHY DO WE REALLY NEED MULTILINGUAL NAMES? LABELS?

WELL, BECAUSE IT'S NATURAL TO HUMAN BEINGS. IT'S PART OF OUR BUILT-IN CULTURAL IDENTITY. AND WE MUST USE LOCAL LANGUAGES FOR LOCAL MESSAGES.

THE OTHER PROBLEM IS THAT ROMANIZED CHARACTERS ARE DIFFICULT TO MANY, MANY PEOPLE, ESPECIALLY IN ELEMENTARY SCHOOLS OR PEOPLE WITH LESS EDUCATION.

I THOUGHT AT ONE STAGE THAT THE DIGITAL DIVIDE WILL EXPAND TREMENDOUSLY IN A RUNAWAY FASHION BECAUSE E-COMMERCE WILL TAKE OFF IN A BIG WAY. AND TO A CERTAIN WAY IT DID DURING THE DOT-COM FEVER. AND THAT THE MULTILINGUAL MASSES URGENTLY NEEDED THIS TO MOVE FORWARD. BUT AS WE SEE, EVEN IN CHINA, PEOPLE ARE STARTING TO LEARN ENGLISH BECAUSE THEY RECOGNIZE THAT THIS WAS ECONOMICALLY THE BEST BET THEY HAVE IN ORDER TO SUCCEED IN LIFE, BECAUSE AS JOHN MENTIONED EARLIER, LOTS OF PEOPLE USE THE ROMANIZED CHARACTERS. BUT THEN THERE WILL ALWAYS BE THOSE WHO ARE LAGGING BEHIND.

AND IT'S IMPORTANT TO US, ESPECIALLY IN THE CONTEXT OF WSIS, THAT WE ARE TALKING ABOUT AN INFORMATION SOCIETY THAT WE NEED TO BRIDGE THAT DIGITAL DIVIDE, HOWEVER YOU LIKE TO LOOK AT IT. WE NEED TO LOOK AFTER THOSE PEOPLE WHO ARE A LITTLE BIT BEHIND, LAGGING ALONG THE WAY, AND HELP THEM ALONG.

AND THIS HAS BEEN THE GREATEST LIBERATING FORCE OF THE INTERNET, REACHING OUT TO PEOPLES WHO ARE DISENFRANCHISED, REACHING OUT TO PEOPLE WITHOUT THE POWER, WITHOUT THE VOICE, AND EMPOWERING THEM.

SO WHAT BETTER WAY THAN TO COME UP WITH A LABELING SYSTEM THAT WILL ALLOW THESE PEOPLE AT LEAST HAVE A CHANCE TO MOVE FORWARD IN LIFE.

SO THE VISION OF THAT IDN MOVEMENT IS ESSENTIALLY SOLVING THAT FINAL BARRIER, SO TO SPEAK, FOR WIDESPREAD ADOPTION OF THE INTERNET IN NON-ENGLISH-SPEAKING COMMUNITIES.

AND IN THE WAKE OF 9/11, THE IRAQ WAR, SECOND ONE, PROBLEMS IN THE GULF, AND SO ON, ALL THE MORE REASON WE NEED TO BE SENSITIVE TO THE NEEDS AS WELL AS TO THE CULTURAL VALUES OF THOSE PEOPLES WHOM WE ARE TRYING TO HELP IN ORDER TO REDUCE THE DIGITAL DIVIDE, IN ORDER TO GIVE ALL PEOPLES OF THE WORLD THE BEST CHANCE TO SUCCEED IN THE INTERNET WORLD, IN E-COMMERCE, IN THE FUTURE OF THE DIGITAL KNOWLEDGE AGE.

THERE WERE MANY IDN NAYSAYERS. THEY TOLD US IN 1998, "IT IS NOT TECHNICALLY POSSIBLE." BUT WE HAVE PROVEN, MAYBE NOT 100%, MAYBE NOT EVEN 90%, NOT EVEN 80%, SOME SAY WORSE THAN 50%, BUT AT LEAST IT DOES WORK TO A CERTAIN DEGREE. THE NAYSAYERS HAVE TOLD US THERE IS NO DEMAND AND THERE IS NO INTEREST. WE HAVE FOUND THAT THERE WAS PLENTY. ON THE FIRST DAY OF LAUNCH OF CHINESE CHARACTER IDNS, THERE WERE 20,000 PEOPLE SIGNED UP, OR SOMETHING LIKE THAT.

THEN SOME PEOPLE SAID TO US THAT THERE WON'T BE ANY SERVICE PROVIDERS. BUT PLENTY CAME FORWARD. AND AFTER THE DOT-COM CRASH, A LOT OF THEM CRASHED ALONG, TOO. WHILE WAITING FOR IDN TO BE IMPLEMENTED, MIND YOU.

SOME PEOPLE SAID THAT NO ORGANIZATION WILL DO IT. LOOK AT ICANN. IT'S DEALING WITH A LOT OF PROBLEMS TODAY.

IT HAS NO TIME TO DO IDN TODAY. IT HAS TO FOCUS ON GETTING ALL THE CCTLDS TOGETHER. I SEE SOME CCTLDS HERE. RIGHT?

BUT WE HAVE SHOWN THROUGH THE MULTILINGUAL INTERNET NAMES CONSORTIUM THAT IT CAN BE DONE, WITH ALL THE GROUPS THAT WE HAVE SPUN OUT, THAT WE HAVE FOR THE TAMIL GROUPS, THE ARABIC GROUPS, THE CHINESE GROUPS, THE HAN CHARACTER GROUPS.

AND NOW, TODAY, ICANN. IT'S HAPPENING.

THEN SOME PEOPLE SAID, NO, NO, NO, WE CAN'T GO FORWARD WITHOUT STANDARDS. AND WE HAVE DONE SO, AND JAMES IS THE LIVING EXAMPLE OF HAVING STRUGGLED WITH THIS, ALONG WITH JOHN KLENSIN, WITH THE RFCS THAT YOU WILL HEAR OF LATER THIS AFTERNOON.

THEN PEOPLE CAME ALONG WITH ALL THESE PEOPLE IMPLEMENTING THESE RFCS, NO DOUBT THERE IS STANDARD RFCS, BUT THERE WERE MANY WAYS OF IMPLEMENTING AN RFC, YOU KNOW, AND IT MIGHT NOT BE INTEROPERABLE. BUT LAST YEAR WE HAD INTEROPERABILITY TESTING.

THEN THERE WAS THE PROBLEM OF NO LANGUAGE TABLES. THERE ARE PLENTY OF VARIANTS. EVERY LANGUAGE IS DIFFERENT. WELL, WE ARE STARTING TO WORK ON IT. MINC TABLES IS ONE OF THE EXAMPLES. ICANN HAS ALREADY STARTED IT.

BUT MORE IMPORTANTLY HERE, WE ARE BEGINNING TO SEE THAT THE RIGHTS OF THE COMMUNITY SURFACING.

THE ISSUE OF LEGITIMACY HAS BEEN DOGGING THE PAST. EVERY SINGLE STEP ICANN HAS BEEN TAKING HAS BEEN DOGGED BY ISSUES OF LEGITIMACY. WHAT RIGHT DOES SOMEBODY ELSE SOMEWHERE IN ANOTHER PART OF THE WORLD THAT DOES NOT HAVE -- DOES NOT USE MY LANGUAGE BUT MAY BE AN EXPERT IN MY LANGUAGE IN AN ACADEMIC SENSE, WHAT RIGHT DOES HE HAVE TO TELL ME HOW I SHOULD SPEAK MY LANGUAGE AND HOW I SHOULD WRITE MY LANGUAGE AND HOW I SHOULD WRITE MY SCRIPT?

WHO GETS TO DECIDE ON MY LANGUAGE? WELL, IT HAS TO BE ME AND US. THOSE OF US WHO SPEAK THE LANGUAGE. AND THIS NECESSARILY LEADS TO A CERTAIN DEGREE OF SENSITIVITY THAT WE MUST GIVE TO LANGUAGE EMPOWERMENT GROUPS THAT WILL SELF-ORGANIZE, THAT WE WILL WANT TO ENCOURAGE THEM TO COME FORWARD VOLUNTARILY TO SOLVE THOSE PROBLEMS THAT ARE CLOSE TO THEIR HEARTS. AND EMPOWER THEM. AND THAT IS WHERE THE LEGITIMACY OF AN INTERNATIONAL ORGANIZATION DRAWS ITS RIGHT TO MAKE PRONOUNCEMENTS THAT AFFECT THESE LANGUAGE GROUPS.

FAILURE TO UNDERSTAND THE NEED FOR THAT KIND OF LEGITIMACY MAY LEAD TO FAR MORE DOOM AND DISASTER THAN THE IMPLEMENTATION OF IDN.

SO THEY SAID THERE'S NO VEHICLE TO CARRY THIS FORWARD, BUT TODAY, THERE IS MINC, THERE'S ICANN, THERE'S ITU, THERE'S THE WSIS PROCESS, U.N, ICT TASK FORCE, ET CETERA, ET CETERA, ANY OF WHICH COULD TAKE ON THIS BALL AND RUN WITH IT FOR THE NEXT MILE.

WE LIT THE TORCH IN 1998. WE HAVE RUN WITH IT IN MULTIPLE SECTORS. FROM THESE FALLING HANDS, SOMEBODY HAS TO PICK THIS UP, PICK UP THIS TORCH AND CARRY THE RACE FORWARD TO THE NEXT LAP. WHO WILL THAT BE?

CAN ICANN? I CAN OR I CANNOT DO IT. THE CHALLENGES ARE NO LONGER TECHNICAL. IT HAS MOVED INTO THE REALM OF POLICY, AND POLITICS. IT REQUIRES THE SKILLS OF DIPLOMACY AND THAT OF STATESMANSHIP, FAR BEYOND THE ENGINEERING OR THE TECHNICAL.

THANK YOU VERY MUCH.

(APPLAUSE.)

>>SHARIL TARMIZI: THANK YOU, PROFESSOR TAN.

>>JOHN KLENSIN: AT LEAST THE SPEAKERS, AND PROBABLY SOME OF YOU, COULD USE ABOUT A 15-MINUTE BREAK. SO LET'S TRY TO TAKE IT NOW. WE HAVE A LOT OF MATERIAL TO COVER, SO PLEASE, LET'S PLAN ON OUR GETTING STARTED IN ABOUT 15 MINUTES.

(BREAK)

>>JOHN KLENSIN: WE HAVE A TREMENDOUS AMOUNT OF MATERIAL TO GET THROUGH SO WE'RE GOING TO START AND HOPE THE PEOPLE MILLING AROUND THE HALLS WILL NOTICE AND COME RUNNING IN.

BEFORE I TURN THE FLOOR OVER TO JAMES, I WANT TO ANSWER A QUESTION THAT I'VE BEEN ASKED SEVERAL TIMES AND ASK A FAVOR OF YOU. THE QUESTION IS ARE THESE SLIDES AVAILABLE, AND THE ANSWER IS THEY WILL BE UP ON THE ISOC WEB SITE AT ABOUT NOON TODAY. AND I'LL GET A URL UP AS SOON AS I HAVE IT IN HAND.

AND THE REQUEST FOR YOU IS THAT I SOMETIMES LOOK AT HOW MUCH MATERIAL WE HAVE TO COVER AND THE SLIDES AND START GOING A LITTLE BIT TOO FAST EVEN FOR NATIVE ENGLISH SPEAKERS. AND IF YOU FIND THAT I'M ACCELERATING BEYOND YOUR ABILITY TO UNDERSTAND WHAT I AM SAYING, PUT YOUR HANDS IN THE AIR AND START WAVING THEM OR SOMETHING AND I'LL TRY TO SLOW DOWN.

AND MY APOLOGIES FOR SPEEDING UP THIS MORNING WITHOUT NOTICING AND WITHOUT GIVING YOU THAT WARNING BEFORE I GOT STARTED.

JAMES.

>>JAMES SENG: GOOD MORNING, EVERYONE. HMM.

I JUST WONDER, HOW MANY PEOPLE WAS HERE IN 2000? WHEN I FIRST GAVE THE PRESENTATION ON IDN IN L.A. CAN YOU PLEASE RAISE YOUR HAND? WOW, THAT IS QUITE FEW.

I KNOW SOME OF YOU MIGHT -- THOSE WHO HAVE BEEN FOLLOWING THIS FOR MANY TIMES, PROBABLY HEAR ME MANY TIMES, AND YOU DO SEE ME REUSE MY SLIDES AGAIN AND AGAIN, BUT I PROMISE THIS TIME THERE WILL BE TOTALLY NEW SLIDES, SO WHATEVER YOU SEE TODAY WILL BE A FIRST TIME BECAUSE I JUST WROTE THEM LAST NIGHT ON THE TRAIN HERE TO KL.

SO LET ME GET STARTED.

MOST OF YOU WILL REMEMBER ME AS THE FORMER CO-CHAIR OF THE IETF IDN WORKING GROUP, BUT I'M NOW THE ASSISTANT DIRECTOR OF (INAUDIBLE) DEVELOPMENT AUTHORITY. BUT TODAY I'M HERE IN MY OWN PERSONAL CAPACITY, SO ANY OPINION I EXPRESS HERE TODAY MAY OR MAY NOT BE IETF'S POSITION. I JUST WANTED TO GET MY HAT CORRECT IN THIS.

AND I WAS HAVING A DISCUSSION WITH JOHN ON HOW TO DO THIS SESSION, AND I WAS ASKED SPECIFICALLY TO GO INTO A BIT OF DETAIL ON HOW THE IETF IDN FUNCTIONS ACTUALLY WORKS. SO I'M GOING TO GIVE MAYBE, SAY, 15 SLIDES ON HOW DO YOU ACTUALLY GET INTERNATIONALIZED DOMAIN NAME WORKS, AND MAYBE A LITTLE BIT OF DISCUSSION ON WHAT NEEDS TO BE DONE FOR THE NEXT STEP. I WON'T GO INTO THE BACKGROUND BUT LET'S JUMP INTO THE GIST OF THE PROBLEM.

THERE ARE MANY -- MANY, IGNORING THE SIDE TRACKS, BUT THERE ARE MAINLY THREE DIFFERENT PRONGS FACED BY THE IETF WHEN WE WERE DESIGNING THE IDN WORK. THE FIRST IS HOW DO WE ACTUALLY ENCODE NON-ASCII CHARACTERS INTO THE DNS? DOMAIN NAMES HAVE TRADITIONALLY BEEN A TO Z, ZERO TO NINE, AND IF YOU WANT TO PUT HEBREW, ALL OF THOSE, INTO THE DNS STRING, HOW DO YOU ACTUALLY ENCODE THEM INTO THE DNS?

THERE IS ALSO A TRADITION OF SAYING UPPER CASE AND LOWER CASE ARE EQUIVALENT. SO HOW DO WE DEAL WITH THIS EQUIVALENT CHARACTERS IN INTERNATIONALIZED DOMAIN NAMES? AND LASTLY, HOW IT ACTUALLY WORKS. THERE ARE MANY PROPOSALS. SOME SAY LET'S DO IT THIS WAY, LET'S CHANGE THE DNS SYSTEM, I SAY NO LET'S NOT CHANGE THE DNS, LET'S CREATE A NEW SYSTEM TO DO THE IDN, SO ON AND SO FORTH. THESE ARE THE MAIN THREE DEBATES THAT RAN OVER THREE YEARS IN THE IETF ON HOW DO WE RESOLVE THE PROBLEM.

SO I'M HERE TO PRESENT -- I WON'T SAY A SOLUTION BUT HERE IS REALLY PRESENTING WHAT THE IETF HAS COME UP WITH AFTER THE THREE YEARS OF WORK.

SO LET ME JUMP IN WITH THE FIRST PART. THE FIRST PART IS ENCODING.

AS I MENTIONED BEFORE, THE TRADITIONAL OLD PROTOCOL HANDLED A TO Z, ZERO TO NINE, BUT THE MAIN THING IS DOMAIN NAMES ARE NOT ONLY USED FOR WEB SURFING. THIS IS ONE THING MANY PEOPLE IGNORE. MOST PEOPLE SAY I CAN TYPE IN THIS ON MY WEB BROWSER, THAT'S DOMAIN NAME FOR ME. BUT NO, THE DOMAIN NAME IS A VERY IMPORTANT PIECE OF THE INTERNET. WEB SERVER, INSTANT MESSAGING, SO ON AND SO FORTH. SO WHEN WE FIRST INTRODUCE INTERNATIONALIZED DOMAIN NAMES, WE CANNOT LOOK AT THE PROBLEMS ONLY FOR WEB SURFING. THERE WILL BE LAYER VIOLATIONS ON WHAT YOU ARE TRYING TO DESIGN.

I THINK WE'RE ALSO AWARE OF IS PEOPLE WANT THEIR OWN LANGUAGE, LANGUAGE, IN THE DOMAIN NAMES. THIS GETS TO BE A PROBLEM BECAUSE THERE IS A CONFUSION BETWEEN LANGUAGE AND SCRIPTS, WHICH JOHN TALKS ABOUT THIS MORNING.

A LOT OF PEOPLE DON'T REALLY UNDERSTAND THE DIFFERENCES. MOST PEOPLE SAY I SPEAK ARABIC, I WRITE ARABIC SCRIPT. BUT DO YOU REALIZE ARABIC SCRIPT IS USED IN DIFFERENT LANGUAGES, FROM PERSIAN TO FARSI TO JAWI? YOU SAY I'M CHINESE, I WRITE CHINESE SCRIPT, BUT IT'S USED IN JAPAN, KOREA, AND ALSO VIETNAMESE, THE OLD WRITING WHICH THEY CALL CHU NOM, IT'S ACTUALLY IN CHINESE. CHINESE SCRIPT. SORRY. SO THERE IS A CONFUSION BETWEEN SCRIPT AND LANGUAGE.

SO WHEN PEOPLE SAY I WANT MY OWN LANGUAGE, THEN THAT BECOME -- YOU HAVE TO LISTEN, ARE YOU REALLY WANT YOUR OWN SCRIPTS OR YOU WANT YOUR OWN LANGUAGE IN THE DOMAINS?

I ALREADY TOLD YOU ABOUT THE DEBATES. THE PROPOSAL THAT'S BEEN PUT FORWARD BY THE IETF IS RFC 3492, OR PUNYCODE. AS JOHN MENTIONED BEFORE, IT SOUNDS PUNY, BUT IT'S A CUTE NAME THAT WE DECIDED ON FOR THIS ENCODING. IT BASICALLY USES UNICODE. FUNDAMENTALLY, IT DERIVES ITS CHARACTER SET FROM UNICODE. THE LIMITATION, OF COURSE, IS THAT UNICODE IS A SCRIPT-BASED ENCODING. IT'S SCRIPT BASED, NOT LANGUAGE BASED.

IF YOU LOOK AT A LOT OF THE LOCALIZED ENCODING SCHEMES, LIKE A 51, ISO 8859-1 AND -2, OR IN CHINA, GB 30, THEY ARE LANGUAGE ENCODING SETS. HOWEVER, UNICODE IS A SCRIPT. IT'S A SCRIPT-BASED ENCODING. SO BY USING UNICODE, IT ENABLES US TO ENCODE SCRIPT IN THE DNS, NOT LANGUAGE. PLEASE BEAR IN MIND THIS LIMITATION, WHICH I WILL COME TO LATER.

PUNYCODE IS ALSO WHAT WE CALL AN ASCII COMPATIBLE ENCODING. WHAT IT MEANS IS YOU TAKE THE UNICODE STRING, WHICH MAY BE UTF-8, UTF-16, IF YOU PUT IT ON THE SCREEN IT'S A BUNCH OF STRINGS, BUT YOU TRANSFORM THEM INTO A STRING OF ASCII. SO FOR EXAMPLE, I HAVE THE THREE CHINESE CHARACTERS, (INAUDIBLE), WHICH IS OVER THERE. IN SOME ENCODING, UNICODE, AND THEN YOU TRANSFORM THEM TO PUNYCODE, WHICH WILL GET YOU A STRING LIKE XN DASH BS 3 AW DOT DOT DOT DOT; RIGHT? NOW USING PUNYCODE, WE HAVE A METHOD TO ENCODE THESE INTERNATIONALIZED DOMAIN NAMES INTO THE DNS. AND THAT DOESN'T REQUIRE A CHANGE OF THE DNS INFRASTRUCTURE. THE INFRASTRUCTURE REMAINS INTACT.

IF WE WERE TO USE OTHER ENCODING SCHEME THAT MAY REQUIRE ALTERING ALL OF THE DNS SYSTEM, LIKE YOUR NAME SERVER, HOSTING SERVER, AND THAT WOULD BE A BIG DISRUPTION. BY USING A PUNYCODE, IN A WAY YOU'RE ACTUALLY ENCAPSULATING THE INTERNATIONALIZED CHARACTERS INTO THE ASCII SET, AND YOU OVERCOME THAT PROBLEM NOW ALLOWING YOU TO ENCODE INTERNATIONALIZED DOMAIN NAMES WITHOUT CHANGING THE INFRASTRUCTURE.

THE REASON FOR THIS IS FOR THE MINIMUM DISRUPTION TO THE INFRASTRUCTURE AS POSSIBLE.
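
The ASCII-compatible encoding described above can be tried with Python's standard library, which ships a `punycode` codec (RFC 3492) and an `idna` codec (RFC 3490). This is only a minimal sketch: the label `bücher` and the two-character Han string are illustrative choices, not the examples shown on the speaker's slides.

```python
# Punycode (RFC 3492) turns a Unicode label into plain ASCII.
label = "bücher"                    # illustrative label, not from the talk
raw = label.encode("punycode")      # bare Punycode, without any prefix
ace = label.encode("idna")          # the IDNA codec adds the "xn--" ACE prefix

print(raw)   # b'bcher-kva'
print(ace)   # b'xn--bcher-kva'

# For a label that is already lower case and normalized, the mapping is reversible.
assert ace.decode("idna") == label

# A Han-script label works the same way; the DNS only ever sees ASCII bytes.
han_ace = "中国".encode("idna")
assert han_ace.startswith(b"xn--") and han_ace.isascii()
```

Because the output is ordinary ASCII, existing name servers can store and serve it without modification, which is the "minimum disruption" argument made in the talk.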

AM I GOING TOO FAST? I HOPE NOT. OKAY. THIS ONE, I WILL GO A BIT SLOWER, THE SECOND PROBLEM ABOUT EQUIVALENT CHARACTERS BECAUSE THIS IS THE PART THAT'S MOST INTERESTING. I HAVE A FEW SLIDES ON THIS.

AS I FIRST MENTIONED, DOMAIN NAMES ARE TRADITIONALLY CASE INSENSITIVE. SO IF YOU RECEIVE AOL.COM IN UPPER CASE, YOU KNOW THAT IT'S ACTUALLY EQUIVALENT TO AOL.COM IN LOWER CASE. THAT'S WHAT WE'VE BEEN USED TO. IT DOESN'T MATTER, TYPE IN UPPER CASE OR LOWER CASE; THIS IS HOW IT WORKS. SO WHEN WE HAVE INTERNATIONALIZED DOMAIN NAMES THERE IS CERTAIN USER EXPECTATION THAT, SNAP, THAT'S HOW IT WORKS. IT SHOULD WORK; RIGHT?

SO THE CONCEPT OF UPPER CASE, LOWER CASE IS SIMPLE IN ENGLISH ALPHABETS, THERE'S ONLY 26 OF THEM, BUT WHEN IT COMES TO NON-ENGLISH CHARACTERS IT BECOMES MORE INTERESTING. LIKE, IS LOWER CASE A WITH TWO DOTS ABOVE EQUIVALENT TO UPPER CASE A WITH TWO DOTS ABOVE? YES, IT IS. BUT WHAT ABOUT "I"? IS I EQUAL TO I UPPER CASE OR I EQUAL TO I UPPER CASE WITH A DOT? ANYBODY SEEN AN I UPPER CASE WITH A DOT? ANYBODY KNOWS WHAT IS THE EQUIVALENT? I'M SURE YOU KNOW, SO JOHN AND MARTIN DURST -- MARTIN DURST IS HERE. HE'S THE GUY THAT WROTE THE INTERNET DRAFT BACK IN 1996 THAT STARTED IDN, WHICH I BUILT MY WORK UPON.

UPPER CASE I WITH A DOT ABOVE ACTUALLY IS USED IN TURKISH, BECAUSE TURKISH HAS A CHARACTER THAT IS I WITHOUT A DOT, WHICH IS MATCHED TO I UPPER CASE WITHOUT A DOT. SO I WITH A DOT IS EQUIVALENT TO I UPPER CASE WITH A DOT. SO CASE FOLDING DOESN'T MAKE SENSE NOW; RIGHT? WHETHER UPPER CASE I AND LOWER CASE I ARE THE SAME DEPENDS ON THE LANGUAGE. IF YOU ARE ENGLISH, YES. THAT MAKES SENSE. THE FIRST ONE MAKES SENSE. BUT IF YOU ARE TURKISH, THEN YOU HAVE TO HAVE LOWER CASE I WITH A DOT MATCH UPPER CASE I WITH A DOT, AND SO ON. SO THIS BECOMES A BIT CONFUSING WHEN IT'S NOT ENGLISH.
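
The Turkish case problem can be reproduced with Python's built-in, language-neutral case mapping. This is a sketch of the general issue rather than of any IDN software; the variable names are illustrative.

```python
# Default (language-neutral) case mapping, as applied by Python's str methods.
dotted_upper = "\u0130"    # LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_lower = "\u0131"   # LATIN SMALL LETTER DOTLESS I

# English expectation: I and i are a case pair.
assert "I".lower() == "i"

# Turkish expectation: dotless i pairs with I, and dotted i pairs with dotted I.
# The default mapping sends dotless i up to plain I ...
assert dotless_lower.upper() == "I"

# ... but coming back down gives dotted i, so the round trip loses the letter.
round_trip = dotless_lower.upper().lower()
assert round_trip == "i" and round_trip != dotless_lower

# Lowercasing dotted capital I without Turkish rules yields "i" plus a combining dot.
print([hex(ord(ch)) for ch in dotted_upper.lower()])
```

A single case-folding table therefore cannot satisfy English and Turkish expectations at the same time, which is the point being made about case folding depending on the language.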

THERE'S ALSO DIFFERENT WAYS TO REPRESENT THE SAME CHARACTERS. I GO WITH LATIN. LATIN IS MORE INTERESTING. LIKE I SHOW A WITH TWO DOTS ABOVE. THERE ARE ACTUALLY TWO WAYS OF REPRESENTATION -- ACTUALLY, MORE THAN TWO WAYS, BUT LET ME SHOW TWO WAYS I KNOW. ONE IS KNOWN AS THE COMPOSITE FORM: YOU PUT THEM INTO A SINGLE CHARACTER, AND IT'S REPRESENTED BY UNICODE. BUT TRADITIONALLY THERE IS ALSO A WAY TO KEY IN A WITH TWO DOTS, BECAUSE ON THE KEYBOARD YOU ACTUALLY PRESS AN A AND TWO DOTS, AND THAT IS DECOMPOSED AS UNICODE 0061,0308. SO THESE ARE TWO DIFFERENT UNICODE STRINGS, BUT THEY NEED TO BE TREATED AS EQUAL.
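
The composed and decomposed spellings can be compared with Python's `unicodedata` module. The decomposed sequence 0061,0308 is the one quoted in the talk; the single code point U+00E4 for the composed form is supplied here as an assumption, since the transcript does not give it.

```python
import unicodedata

composed = "\u00e4"          # a with diaeresis as one code point (assumed U+00E4)
decomposed = "\u0061\u0308"  # a + COMBINING DIAERESIS, the 0061,0308 sequence

# The two spellings are different code point strings.
assert composed != decomposed
assert len(composed) == 1 and len(decomposed) == 2

# Unicode normalization maps both spellings onto one agreed form.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed
```

Comparing names only after normalization is what lets the two keyboard inputs be treated as equal.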

WHAT ABOUT SIMILAR LOOKING CHARACTERS? I SHOW ABOVE ICANN. CAN YOU SEE THE DIFFERENCES BETWEEN THE TWO ICANN? IMAGINE THAT AS A DOMAIN NAME. ICANN DOT ORG. ONE IS IN PURE ASCII, ONE IS AN INTERNATIONALIZED DOMAIN NAME WITH A CYRILLIC A, CAPITAL LETTER A. I WONDER HOW MANY PEOPLE CAN SPOT THE DIFFERENCES?

ANYONE CAN SPOT THE DIFFERENCE? OKAY.

I HAVE ONE LITTLE HAND UP THERE. SO I WILL TAKE IT AS MOST PEOPLE WOULD NOT BE ABLE TO SPOT THE DIFFERENCE. WE TALK ABOUT SCAMS, WE TALK ABOUT PHISHING. IMAGINE SOMEONE RECEIVES AN E-MAIL FROM ICANN, WHICH IS NOT REALLY ICANN.
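
The look-alike problem is easy to demonstrate in code even though it is hard to see on screen. The sketch below builds the two strings from the slide as described (one pure ASCII, one with a Cyrillic capital A); `scripts_hint` is a hypothetical helper that uses the first word of each Unicode character name as a rough script indicator.

```python
import unicodedata

latin = "ICANN"
mixed = "IC\u0410NN"   # third letter is U+0410 CYRILLIC CAPITAL LETTER A

# Different code points, even though many fonts render them identically.
assert latin != mixed

def scripts_hint(text):
    """Return the first word of each character's Unicode name (a rough script hint)."""
    return [unicodedata.name(ch).split()[0] for ch in text]

print(scripts_hint(mixed))   # ['LATIN', 'LATIN', 'CYRILLIC', 'LATIN', 'LATIN']

# After IDNA processing the two names are clearly distinct on the wire.
print(latin.encode("idna"))
print(mixed.encode("idna"))
```

The ASCII form of the mixed-script name begins with `xn--`, so software can tell the names apart even when a reader cannot.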

VARIANTS. THAT'S ANOTHER PROBLEM.

LET'S TAKE THE FIRST ROW, THE TWO CHINESE CHARACTERS. ANYONE CAN SPOT THE DIFFERENCE? IF YOU TELL ME YOU CAN SPOT THE DIFFERENCE, YOU ARE LYING BECAUSE TECHNICALLY THEY ARE THE SAME CHARACTERS. THEY ARE TECHNICALLY THE SAME CHARACTERS BUT ONE IS THE (INAUDIBLE) FORM OF THE OTHER. BUT THEY HAVE TWO DIFFERENT CODE POINT IN UNICODE, ONE IS F 900 AND THE OTHER IS 8 C4 8.
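
The two code points quoted above, F900 and 8C48, can be checked against the Unicode character database that ships with Python. This is a small sketch; it assumes the missing word in the transcript refers to a compatibility form, which is what the character names suggest.

```python
import unicodedata

compat = "\uf900"    # the F900 code point from the talk
unified = "\u8c48"   # the 8C48 code point from the talk

print(unicodedata.name(compat))    # CJK COMPATIBILITY IDEOGRAPH-F900
print(unicodedata.name(unified))   # CJK UNIFIED IDEOGRAPH-8C48

# As raw strings they differ ...
assert compat != unified

# ... but compatibility normalization (NFKC) maps one onto the other.
assert unicodedata.normalize("NFKC", compat) == unified
```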

THERE'S DIFFERENT EVOLUTION OF FORMS, LIKE THE NEXT ONE, THE FIRST ONE IS USING TAIWAN, THE WAY IT'S WRITTEN, THE OTHER WAY IS THE WAY JAPANESE WRITE THE SAME CHARACTERS. TWO DIFFERENT CODE POINT AGAIN.

TRADITIONAL, SIMPLIFIED, ALL OF THIS. TRADITIONAL FORM THE FIRST PART, SIMPLIFIED FOR THE SECOND ONE IN JAPANESE. THIRD ONE IS SIMPLIFIED CHINESE, USED IN CHINA BUT THEY ARE CONSIDERED EQUIVALENT BECAUSE THEY BOTH REPRESENT TWO.

THERE ARE ALSO WAYS OF DECOMPOSING AND CONSTRUCT CHINESE CHARACTERS IN UNICODE. SO WE HAVE THE CODE POINT LIKE (INAUDIBLE), WHICH IS THE FIRST CHARACTER YOU SEE. IT CAN BE DECOMPOSED AND REPRESENTING UNICODE REPRESENTING 2 FF 19 (INAUDIBLE).

IF YOU ACTUALLY TYPE IT IN THE COMPUTER SCREEN AND PUT THE THREE TOGETHER, IT SHOULD COMPOSE FOR YOU TO REPRESENT 9B03. THAT IS WHAT WE CALL THE IDEOGRAPHIC DESCRIPTIVE LANGUAGE IN UNICODE. THAT IS NOT A PROBLEM.

THANK YOU. I CAN GO ON. THERE ARE MANY, MANY DIFFERENT EXAMPLES OF THIS IN UNICODE. BUT I'M NOT HERE TO GIVE YOU A PRIMER ON ALL THE LANGUAGES. I CAN'T CLAIM I KNOW ALL THE LANGUAGES, WHICH IS WHY WE HAVE THE AFTERNOON SESSION TO TALK MORE ABOUT THE DIFFERENT LANGUAGE ISSUES.

I'M HERE TO GIVE YOU A LITTLE BIT OVERVIEW ON THE PROBLEMS THAT WE FACE IN IETF. AND ESPECIALLY CONSIDERING WE DO NOT HAVE ALL THE NECESSARY EXPERTS WITHIN THE WORKING GROUP ON LANGUAGE ISSUES. WE'RE JUST STICKING TO THE UNICODE AND THE SCRIPT IS A REALLY GOOD IDEA.

WHICH GETS ME TO THE SECOND PART, WHICH IS THE SOLUTION. AFTER THREE AND A HALF YEARS WE CAME UP WITH RFC 3491, CALLED NAMEPREP. IT IS BASED ON UNICODE NORMALIZATION FORM KC. IT'S A TECHNICAL DOCUMENT, PUBLISHED BY THE UNICODE CONSORTIUM. YOU LOOK UP UTR 15, THAT'S WHAT YOU CAN GET, AND OF COURSE THE CASE FOLDING DOCUMENT CALLED UTR 21. I THINK 21 HAS BEEN DEPRECATED BUT IT WAS A DOCUMENT.
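
Python's standard library includes a Nameprep implementation, `encodings.idna.nameprep`, which can be used to see the combined effect of case folding and Normalization Form KC. The inputs below are illustrative and are not taken from the slides.

```python
from encodings.idna import nameprep

# Case folding: upper case input is mapped to lower case.
assert nameprep("BÜCHER") == "bücher"

# Normalization Form KC: "A" + combining diaeresis becomes one lower case code point.
assert nameprep("A\u0308") == "\u00e4"

# Compatibility characters are replaced by their ordinary equivalents.
assert nameprep("\uf900") == "\u8c48"

print(nameprep("BÜCHER"))
```

Two inputs that Nameprep maps to the same output will end up as the same name in the DNS, which is how equivalent characters are matched.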

THIS INTERNET DRAFT WAS WORKED JOINTLY HALF FROM IETF, HALF FROM THE UNICODE CONSORTIUM. WE HAVE THE PRESIDENT OF UNICODE CONSORTIUM MARK DAVIS INVOLVED AND GIVING INPUT ON THIS INTERNET DRAFT -- SORRY, RFC. NOT INTERNET DRAFT ANYMORE. RFC.

THE GOAL OF THIS, IT ACTUALLY TRIES TO MINIMIZE THE CONFUSION AND TO GIVE THE HIGHEST CHANCE OF GUESSING THE DOMAIN NAME RIGHT. THIS IS WHAT (INAUDIBLE) AT THAT TIME SAYS, THE LAW OF LEAST ASTONISHMENT, IF I KNOW HOW TO PRONOUNCE IT CORRECTLY. IF YOU'RE KEYING A DOMAIN NAME, YOU TRY TO GUESS WHAT THE BEST INTENT OF WHAT THE USER IS TRYING TO GET TO, AND YOU MAKE SURE THAT YOU RETURN THE PROPER RESULT TO THEM.

THE KEY HERE IS HIGHEST CHANCE. HIGHEST CHANCE. YOU NEVER SAY IT'S 100 PERCENT ACCURATE. YOU NEVER SAY IT'S PERFECT. IT'S ONLY A PERCENTAGE. IT GIVE YOU THE HIGHEST CHANCE OF GETTING THE DOMAIN NAME RIGHT.

SO THERE ARE SOME LIMITATIONS. LIMITATION NUMBER ONE IS, BECAUSE WE ARE USING UNICODE, WE'RE DOING SCRIPTS. THE INTERNATIONALIZED DOMAIN NAMES DEAL WITH INTERNATIONALIZATION. WE DEAL WITH INTERNATIONALIZATION. WE PROVIDE A PLATFORM, A COMMON PROTOCOL IN WHICH ALL IMAGES -- SORRY, ALL SCRIPTS CAN BE ENCAPSULATED IN THE DNS AND GET THE HIGHEST CHANCE OF (INAUDIBLE). BUT IT DOESN'T DEAL WITH LANGUAGE-SPECIFIC ISSUES. IT DOESN'T DEAL WITH ALL THE VARIANTS IN CHINESE. IT DOESN'T DEAL WITH THE TURKISH "I." IT DOESN'T DEAL WITH ARABIC. IT HAS ITS OWN FAVORITES DEPENDING ON WHAT YOU WANT TO USE IT FOR; THE POINTS AND DOTS ARE OPTIONAL IN ARABIC FORM. WE DON'T DEAL WITH THAT.
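
The script-level limits listed above can be observed with the same standard-library Nameprep function. The specific characters are illustrative assumptions: a traditional and a simplified form of one Chinese character, and the Turkish dotless i.

```python
from encodings.idna import nameprep

traditional = "\u570b"   # traditional form of the character for "country"
simplified = "\u56fd"    # simplified form of the same character

# Nameprep works on scripts, so language-level variants stay distinct ...
assert nameprep(traditional) != nameprep(simplified)
# ... and therefore become two unrelated names in the DNS.
assert traditional.encode("idna") != simplified.encode("idna")

dotless = "\u0131"       # Turkish dotless i
# Upper case I always folds to dotted i, never to dotless i,
# so Turkish-specific pairing is not available at this layer.
assert nameprep("I") == "i"
assert nameprep(dotless) == dotless
```

Treating such variants as equivalent has to happen somewhere else, for example in registry policy, because the protocol itself does not know which language a name is written in.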

THE POINT HAS BEEN DEALT WITH IN THE NAME PREP, I THINK. I HAVE TO LOOK IN THE PAPERS. THE PAPERS ARE ABOUT 40, 50 PAGES. THE GIST OF THE DOCUMENT IS QUITE THIN. IT'S ONLY A FEW PAGES. WE PUBLISH TABLES OF HOW YOU ACTUALLY DO THE ENCODING SO YOU CAN PUT IN THE PROGRAM LANGUAGE SO IT'S NOT A VERY COMPLICATED DOCUMENT BUT I WOULDN'T SAY IT SOLVE ALL THE PROBLEM THAT THE USER IS EXPECTING.

THE OTHER PROBLEM, LIMITATION OF NAME PREP IS THAT WHAT THE USER REGISTER IS NOT NECESSARILY WHAT IS BEING REGISTERED INTO THE DNS. IT SOUNDS A BIT WEIRD. THAT MEANS IF I KEY IN (INAUDIBLE), THE USER THINKS I'M GETTING THAT STRING, BUT ACTUALLY IN THE ZONE FILE, IT'S ACTUALLY REPRESENT XN DASH DASH AS A PUNYCODE. AND SOMETIMES WHEN YOU DO THE REVERSE TRANSFORMATION YOU MAY NOT GET BACK THE ORIGINAL FORM THAT THE USER REGISTER IN THE FIRST PLACE. THIS IS A LIMITATION. IT'S A KNOWN LIMITATION. WE LIVE WITH IT. WE DECIDE TO GO AHEAD WITH IT.

IN CHINESE, PROBABLY NOT THAT MUCH. MAYBE YOU DO. MAYBE I KEY IN SOME OF THAT IN COMPATIBILITY FORM BUT WHEN I GET BACK, I GET THE NORMAL CHARACTERS FOR CHINESE.

SO IT MAY CAUSE CONFUSION BUT HOPEFULLY IT DOESN'T. FOR EXAMPLE, IF I WERE TO REGISTER UPPER CASE CHARACTERS IN INTERNATIONALIZED DOMAIN NAMES, ACTUALLY WHAT YOU GET BACK IN THE ZONE FILE IS LOWER CASE. IT'S IMPORTANT FOR THE (INAUDIBLE) TO RECOGNIZE THIS AND TO TELL THE USER THAT WHAT YOU REGISTER IS NOT NECESSARILY WHAT YOU GET IN THE ZONE FILE.
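
The "what you register is not what you get back" effect shows up in a simple round trip through Python's `idna` codec. The domain name used here is a made-up example.

```python
registered = "BÜCHER.example"             # what the user typed (hypothetical name)
zone_form = registered.encode("idna")     # what would be stored in the zone file
shown_back = zone_form.decode("idna")     # what the reverse transformation yields

print(zone_form)    # b'xn--bcher-kva.example'
print(shown_back)   # bücher.example

# The upper case letters are gone after the round trip.
assert shown_back != registered
assert shown_back == registered.lower()

# The same happens with a compatibility character: the ordinary form comes back.
assert "\uf900".encode("idna").decode("idna") == "\u8c48"
```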

AFTER WE SOLVE THAT TWO PROBLEM, IT'S HOW DO WE PUT THEM TOGETHER? WE KNOW HOW TO DO ALL THIS NORMALIZATION AND HOW TO MAKE SURE THE SOMEWHAT EQUIVALENT CHARACTERS HAVE BEEN MATCHED, UPPER CASE, LOWER CASE AND ALL THE NORMALIZATION. WE ALSO KNOW HOW TO ENCODE THAT INTO DNS, AND THE NEXT DEBATE IS SHOULD WE DO IT ON THE SERVER SIDE OR CLIENT SIDE OR AS A PROXY OR DIFFERENT MECHANISM?

AFTER 20 YEARS, THE CONSENSUS IS THAT WE GO WITH WHAT WE CALL IDNA, INTERNATIONALIZED DOMAIN NAMES IN APPLICATIONS. IT'S RFC 3490.

SO THIS IS A BLOCK DIAGRAM THAT LITERALLY SUMMARIZES WHAT IT DOES. IT PUTS ALL THE TWO INTERNET DRAFT TOGETHER. YOU HAVE A USER ON THE TOP AND YOU HAVE SOME APPLICATION SERVER ON THE BOTTOM. WHAT IT CHANGES IS THE UPPER CASE STRING ITSELF. THIS IS THE WHOLE APPLICATION.

IT INTRODUCES TWO PROCESSES -- ONE PROCESS, WHICH IS KNOWN AS NAMEPREP, RFC 3491, WHICH I TALKED ABOUT JUST NOW. AND, OF COURSE, THE PUNYCODE PROCESS. WHAT IT DOES IS, IF THE USER ISSUES A DOMAIN NAME, THE APPLICATION INTERCEPTS IT, DOES THE NAMEPREP PROCESS, AND THEN CONVERTS IT TO PUNYCODE BEFORE IT SENDS IT TO THE DNS RESOLVER FOR RESOLVING OF THE NAMES. IT'S A VERY BEAUTIFUL SYSTEM IN THE SENSE THAT IT DOES NOT CHANGE THE DNS INFRASTRUCTURE AGAIN.

NOTICE, I EMPHASIZE, WE DO NOT CHANGE THE DNS INFRASTRUCTURE. WE ARE TRYING TO MAKE SURE THE INTERNET IS AS STABLE AS WE CAN, BY MINIMIZING CHANGES TO THE CORE INFRASTRUCTURE.

BUT WHAT IT NEEDS IS THAT THE APPLICATION NEEDS TO BE UPGRADED TO BE IDNA AWARE. OKAY? THIS IS HOW WE PUT THEM TOGETHER. VERY SIMPLE. ALL RIGHT? JUST GOES DOWN.
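
The application-side pipeline described here, Nameprep followed by Punycode and then a normal DNS lookup, can be sketched in a few lines. The function names `to_ascii_label` and `to_ascii_domain` are hypothetical, and the sketch leaves out the length and character checks that RFC 3490 also requires; it only illustrates the order of the steps.

```python
from encodings.idna import nameprep

ACE_PREFIX = "xn--"   # prefix marking a Punycode-encoded label

def to_ascii_label(label):
    """Nameprep one label, then Punycode it if it is not already ASCII."""
    prepared = nameprep(label)
    if prepared.isascii():
        return prepared
    return ACE_PREFIX + prepared.encode("punycode").decode("ascii")

def to_ascii_domain(name):
    """Apply the per-label conversion to a dotted domain name."""
    return ".".join(to_ascii_label(part) for part in name.split("."))

# The application would hand this ASCII form to the ordinary resolver.
wire_name = to_ascii_domain("Bücher.example")
print(wire_name)   # xn--bcher-kva.example

# Cross-check against the standard library's own IDNA codec.
assert wire_name == "Bücher.example".encode("idna").decode("ascii")
```

Only the application runs this conversion; the resolver and the name servers continue to see an ordinary ASCII name.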

THE BEAUTY OF THIS SYSTEM IS THE USER IS NOT AWARE THAT THEY ARE USING IDN.

THE APPLICATION SERVERS AND THE DNS DOESN'T KNOW THERE IS ACTUALLY INTERNATIONALIZED DOMAIN NAMES. SO THERE IS VERY FEW SYSTEMS THAT NEED TO BE CHANGED. ONLY THE APPLICATION. THAT IS WHAT WE CALL MINIMUM DISRUPTION.

AND THE PART -- THE INTERNET DRAFT -- I'M SORRY, THE RFC WAS PUBLISHED IN MARCH 2003. SO I THINK WE ARE ONE YEAR DOWN THE ROAD NOW. AND I AM GOING TO SAY THERE IS QUITE A LOT OF SOFTWARE THAT ACTUALLY SUPPORTS IDNA TODAY. FOR EXAMPLE, MOZILLA 1.4 AND NETSCAPE 7.1 HAVE SUPPORT FOR IDN. SO HAVE SAFARI, OPERA. VERISIGN HAS A PLUG-IN. AND I AND A COUPLE OF FRIENDS HAVE A PROJECT TO DO AN OPEN SOURCE PLUG-IN FOR THIS. THESE ARE NOT -- SHORTLY, THERE IS ACTUALLY A LOT MORE SOFTWARE THAT SUPPORTS IDN, INCLUDING THINGS NOT RELATED TO WEB SURFING: JABBER HAS SUPPORT FOR IDNA.

AND WHAT DOES THIS MEAN? WELL, NOW WE HAVE, ACTUALLY, A STANDARD.

IT'S A PROPOSED STANDARD IN IETF TERMINOLOGY, BUT THERE IS A STANDARD OUT THERE NOW THAT IS THE LEAST DISRUPTIVE STANDARD NOW TO DO IDN. AND NEITHER THE DNS NOR THE APPLICATION NEEDS TO UNDERSTAND UNICODE OR UNDERSTAND ACTUALLY I AM DEALING WITH IDN. THAT IS REALLY -- SO WE DO NOT NEED TO CHANGE TOO MUCH OF INFRASTRUCTURE. THE ONLY CATCH IS THAT THE APPLICATION NEEDS TO BE UPGRADED TO BE AWARE. THAT'S THE BIG PART.

IDNA DEALS WITH SCRIPTS, AS I MENTIONED. REMEMBER?

BUT IN THE VERY FIRST SLIDE, I TALKED ABOUT HOW THE USER EXPECTS LANGUAGE. THEY EXPECT: I USE MY OWN LANGUAGE. I WANT A CHINESE DOMAIN NAME, I WANT AN ARABIC DOMAIN NAME. NO USER TELLS YOU I WANT A HAN IDEOGRAPHIC DOMAIN NAME. THEY SAY I WANT A CHINESE DOMAIN NAME, BECAUSE THAT'S WHAT I USE.

THE OTHER THING IS IDN IS INTERNATIONALIZATION, BUT THE USER EXPECTS LOCALIZATION. IF I AM CHINESE, I WANT CHINESE DOMAIN NAMES IN CHINESE AND SIMPLIFIED CHINESE DOMAIN NAMES TO WORK. IT JUST WORKS.

THE PROBLEM IS IDN REQUIRES THE APPLICATION TO BE UPGRADED. I TALKED ABOUT THE MINIMUM DISRUPTION. THE GOOD NEWS IS WE DON'T NEED TO UPGRADE THE APPLICATION SERVERS AND THE INFRASTRUCTURE TO DO IT, BUT WE STILL NEED TO UPGRADE THE APPLICATION. BUT ON THE END USER SIDE, THE USER REALLY EXPECTS IT TO JUST WORK. WHAT SHOULD I UPGRADE? I WANT TO KEY IN, PRESS "ENTER" AND IT WORKS.

THERE IS A DISPARITY BETWEEN THE USER EXPECTATION AND WHAT COMES UP. THERE IS STILL A GAP THAT WE NEED TO BRIDGE BETWEEN THE TWO. SO I MAY PROPOSE THE NEXT STEP WHICH I THINK WILL BE USEFUL.

ONE IS THAT LOCALIZATION SHOULD BE DONE, ANY LOCALIZATION WHICH THE USER EXPECTS SHOULD BE DONE AT THE LOCAL REGISTRY. IF YOU ARE A CCTLD PROVIDER, YOU SHOULD START THINKING ABOUT THESE, AND LOCALIZATION MAY INVOLVE LINGUISTIC ISSUES.

WE WILL TALK ABOUT THAT MORE IN THE AFTERNOON SESSION, TALK ABOUT THE (INAUDIBLE) AT THE JET, WHICH, BY THE WAY, WAS NOT INVOLVING ME AT ALL.

I AM NOT A MEMBER OF. I NEVER REMEMBER IF MINC HAS ANY INVOLVEMENT IN JET. BUT WE HAVE (INAUDIBLE) IN JET WHICH WE TALK ABOUT MORE ABOUT THE CJK. AND, OF COURSE, ICANN HAS A REGISTRY FOR LANGUAGE TABLES, WHICH WE SHOULD -- IF WHEN YOU FINISH, YOU SHOULD REGISTER YOURSELF.

THIS IS ONE THING THAT I THINK THE CCTLD OR TLD OPERATORS IN GENERAL SHOULD START LOOKING AT. THE OTHER THING IS AWARENESS AND ADOPTION.

AWARENESS IN THE FIRST SENSE IS EDUCATING THE USER THAT NOT EVERYTHING IS JUST SIMPLE. IT MAY INVOLVE UPGRADING THE APPLICATION OR INSTALLING A PLUG-IN IN ORDER TO WORK. IF YOU ARE LUCKY, YOU ARE USING A MAC OR YOU ARE USING LINUX, IT PROBABLY WILL WORK. I HAVE SAFARI 1.3. SO IT WORKS PRETTY WELL. BUT IF YOU ARE USING INTERNET EXPLORER, YOU PROBABLY NEED TO INSTALL A PLUG-IN. I DO MICROSOFT.COM WITH A PATCH.

THE OTHER THING, OF COURSE, IS WE NEED TO ACTUALLY ENCOURAGE DEVELOPERS TO USE IDNA. AND I THINK THAT CANNOT BE DONE JUST BY ONE -- I BELIEVE MANY PEOPLE HAVE BEEN DOING THAT. PEOPLE ARE TALKING TO VARIOUS PEOPLE, LIKE MICROSOFT, IN DIFFERENT PARTS OF THE WORLD TO ENCOURAGE THEM TO EMBRACE IDNA. AND, FINALLY, THIS IS WORK FOR ICANN, IDN TOP-LEVEL DOMAIN.

PERSONALLY, I BELIEVE IT'S REALLY NEEDED. WE DO NEED TLDS THAT ARE IDN, BECAUSE SOME LANGUAGES DON'T MIX WELL WITH ENGLISH. IF YOU HAVE AN ARABIC STRING, WHICH IS WRITTEN RIGHT TO LEFT, AND SUDDENLY INTRODUCE DOT-COM OR DOT SG AT THE BACK, IT BECOMES -- THE RENDERING LOOKS KIND OF (INAUDIBLE) ON THE STRING ITSELF. AND SECONDLY, OF COURSE, IT'S MORE INTUITIVE, BECAUSE YOU ARE ACTUALLY TYPING YOUR OWN LANGUAGE AND YOU DON'T NEED TO TOGGLE BETWEEN DIFFERENT ENCODINGS. HOWEVER, I UNDERSTAND THAT THE POLICY IS VERY DIFFICULT.
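
The mixing problem comes from characters having different directional properties, which Python exposes through `unicodedata.bidirectional`. The sketch below uses a hypothetical name made of three Arabic letters followed by `.com`; `bidi_classes` is an illustrative helper, not part of any IDN library.

```python
import unicodedata

def bidi_classes(name):
    """Map each character of a name to its Unicode bidirectional class."""
    return [unicodedata.bidirectional(ch) for ch in name]

# Hypothetical mixed name: three Arabic letters followed by an ASCII TLD.
mixed_name = "\u0639\u0631\u0628.com"
classes = bidi_classes(mixed_name)
print(classes)

# "AL" marks right-to-left Arabic letters, "L" marks left-to-right letters.
assert "AL" in classes
assert "L" in classes
```

A name whose labels carry both right-to-left and left-to-right classes is displayed in an order that differs from the order in which it is stored, which is one argument made here for top-level domains in the same script.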

ICANN, I REMEMBER, ABOUT ONE AND A HALF YEARS AGO DID COME OUT WITH SOME CONSULTATION ON HOW TO DO THIS. I DON'T REMEMBER ANY CONCLUSION ABOUT THAT.

BUT IT'S A VERY DIFFICULT ISSUE AND THERE ARE A LOT OF POLICY QUESTIONS THAT NEED TO BE ADDRESSED AND ANSWERED. THE ICANN COMMUNITY SHOULD REALLY DEBATE THIS SERIOUSLY.

FOR EXAMPLE, SHOULD WE SET A PRECEDENT THAT THE GTLDS SHOULD GET IT TRANSLATED IDN? DOES THAT MEAN THAT ALL DOT-COM WILL GET DOT (INAUDIBLE) AND ONLY 200 OR 300 LANGUAGES AUTOMATICALLY? I'M NOT SURE. I'M NOT SURE WHETHER THAT'S THE RIGHT PATH. OR IT MIGHT BE. WE DON'T KNOW.

IT'S A PRECEDENT THAT WE NEED TO BE CAREFUL ON WHAT TLDS, WHICH IS -- YEAH, I WILL TALK ABOUT THAT.

I WILL PROBABLY SAY SOMETHING ABOUT -- ANYWAY, HOPEFULLY ONE DAY WE WILL GET THIS. BY THE WAY, I SHOULD NOT SAY ONE DAY. THIS IS A WORKING THING.

I -- THE SAUDINIC POINT OF IMPACT WITH THE ARABIC DOMAIN NAMES, IF I AM NOT WRONG, SULTAN OSAMI, IS HE HERE?

YES. THANK YOU.

>> (INAUDIBLE).

>>JAMES SENG: SO OSAMI WAS THE ONE WHO ACTUALLY HELPED ME TO TYPE THIS IN MY MOZILLA BROWSER AT THAT TIME. AND, BINGO, THIS WEB SITE POP UP ON THE BROWSER. IF YOU LOOK AT ETISALAT, IT MATCHES THE DOMAIN NAMES EXACTLY. AND HOPEFULLY WE WILL SEE MORE OF THIS IN MORE APPLICATIONS IN THE FUTURE.

THANK YOU.

(APPLAUSE.)

>>SHARIL TARMIZI: THANK YOU, JAMES. WOULD YOU LIKE TO COME BACK AND JOIN ME OVER HERE?

JOHN, OKAY.

QUITE AN ILLUMINATING PRESENTATION, IF I MAY SAY.

THERE ARE SEVERAL PAPERS, JAMES HAD REFERRED TO SOME OF THE PAPERS ON THE ICANN WEB SITE, SOME OF THE EARLIER PAPERS ON IDN. YOU CAN GO TO THE SITE AND DOWNLOAD THEM. I DON'T HAVE THE SPECIFIC URL.

BUT THERE ARE SOME OF THESE EARLY ATTEMPTS AT TRYING TO CLASSIFY IDNS INTO THE VARIOUS CATEGORIES, SEMANTIC ASSOCIATIONS, LANGUAGE ASSOCIATIONS, AND VARIOUS OTHER TYPES. AND WE HAD A BOX CALLED "OTHERS" WHERE WE THREW EVERYTHING ELSE IN.

JOHN.

>>JOHN KLENSIN: OKAY.

REMEMBER, IF I START SPEAKING TOO QUICKLY, WAVE YOUR ARMS.

TO SORT OF REPRISE WHERE WE ARE AT THIS POINT IN TERMS OF THE DISCUSSION WITH THE COMMENTS I MADE EARLIER AND WITH THE REMARKS FROM PROFESSOR TAN OR FROM JAMES, SOME OF THE EARLY IDN APPROACHES INVOLVE TAKING EXISTING LOCAL CHARACTER SETS RATHER THAN UNICODE AND TRYING TO JUST USE THEM. IT WORKED WELL AS LONG AS ONE WAS COMMUNICATING WITHIN A PARTICULAR COMMUNITY AND EVERYONE ELSE WITHIN THE COMMUNITY WAS USING THOSE CHARACTER SETS.

BUT IT CREATED A TAGGING PROBLEM WITH DNS AND CREATED PROBLEMS WITH SOME DNS CLIENT AND APPLICATION IMPLEMENTATIONS.

THE IDNA APPROACH, AS JAMES HAS INDICATED, ENDED UP USING A NAME FORMAT IN THE DNS, WHICH IS COMPATIBLE WITH THE ORIGINAL HOST NAME RULES BUT IS A FORMAT THAT NOBODY USES. AND IT'S A FORM WHICH IS VERY EFFICIENT FOR STRINGS WHICH ARE ALL IN THE SAME SCRIPT AND SUCH THAT THE CHARACTERS ARE CLOSE TOGETHER IN THE UNICODE CODING, WHICH IS NOT A PROPERTY OF THE TRADITIONAL ENCODINGS OF UNICODE.
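As an illustration outside the spoken record, the ASCII-compatible form described above can be sketched with Python's built-in "idna" codec, which follows the 2003 IDNA rules. The sample name and the helper names `to_ace` and `is_hostname_label` are assumptions chosen for this sketch; the letter-digit-hyphen check is a simplified reading of the traditional host name rules.

```python
# Sketch: encode a Unicode name into the ASCII-compatible form that is
# actually stored in the DNS, then confirm that every label still obeys
# the traditional letter-digit-hyphen host name rule.

def to_ace(name: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible encoding."""
    return name.encode("idna").decode("ascii")

def is_hostname_label(label: str) -> bool:
    """Simplified host name rule: letters, digits, hyphen; 1 to 63 characters."""
    allowed = set("abcdefghijklmnopqrstuvwxyz0123456789-")
    return (
        0 < len(label) <= 63
        and set(label.lower()) <= allowed
        and not label.startswith("-")
        and not label.endswith("-")
    )

ace = to_ace("bücher.example")
print(ace)  # xn--bcher-kva.example
print(all(is_hostname_label(label) for label in ace.split(".")))  # True
```

The encoded label is legal under the old rules but is a form nobody would type by hand, which is why applications are expected to do the conversion.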

I MENTIONED EARLIER THAT PART OF WHAT WE NEEDED TO TALK ABOUT TODAY WAS SOME OF THE FUNDAMENTAL PRINCIPLES OF HOW THE DNS WORKS AND HOW IT INTERACTS -- HOW THAT INTERACTS WITH SOME POSSIBLE METHODS OF DOING INTERNATIONALIZED DOMAIN NAMES AND OTHER THINGS.

ONE OF THOSE ISSUES IS THAT PERFORMANCE OF THE DNS DEPENDS VERY STRONGLY ON THE CACHING MECHANISMS WORKING. AND CACHING OCCURS NEAR THE SITE OF THE PERSON MAKING THE QUERY. AND AS A CONSEQUENCE, UNLESS ONE FORCES CACHING NOT TO OCCUR, WHICH HAS PERFORMANCE IMPLICATIONS, THE SITE WHERE THE DATA ARE LOCATED, THE FINAL ZONE WHERE THE DATA ARE LOCATED, HAS VERY LITTLE CONTROL OVER HOW CACHING WORKS.

SO IF ONE COMES ALONG AND SAYS, OKAY, I'M GOING TO USE A DIFFERENT KIND OF WAY OF MAKING DNS WORK, A DIFFERENT INTERPRETATION OF DNS RULES, AS SEVERAL EARLY POTENTIAL SOLUTIONS DID, THE ONLY PROBLEM IS YOU HAVE TO CONTROL THE CACHES TO MAKE CERTAIN THAT THEY HAVE THE SAME RULES THAT YOU DO. AND THE ONLY WAY TO DO THAT IS ANOTHER ONE OF THOSE DECIDING TO SHUT THE INTERNET DOWN AND REPLACE ALL THE DNS CLIENTS AND SERVERS SITUATIONS.

IN ADDITION TO THAT, ALL THE THINGS WE HAVE BEEN TALKING ABOUT IN OTHER CONTEXTS THAT YOU HAVE BEEN HEARING A GOOD DEAL ABOUT IN OTHER ICANN CONTEXTS, FOR GUARANTEEING THAT THE DATA THAT YOU'RE GETTING BACK FROM DNS IS THE CORRECT DATA, CORRECT INTEGRITY, CORRECT SOURCES, ARE VERY SENSITIVE IN DIFFERENT WAYS, DEPENDING ON THE METHOD, BUT THEY'RE ALL VERY SENSITIVE TO NONCONFORMING HANDLING OF DNS QUERIES.

IF YOU START GETTING INTO SITUATIONS WHERE YOU ASK FOR ONE THING AND SOMETHING ELSE COMES BACK IT HAS AN IMPACT OF NECESSITY ON THOSE SECURITY MECHANISMS, BECAUSE THEY ARE SUPPOSED TO BE TELLING YOU THAT WHAT YOU ASKED FOR AND WHAT YOU GOT AND WHAT IT'S SUPPOSED TO BE, THEY ARE ALL THE SAME.

THERE WILL BE A LITTLE BIT REDUNDANCY BETWEEN THIS AND SOME OF JAMES'S COMMENTS, BUT THERE ARE SOME ISSUES ASSOCIATED WITH THIS SET OF STANDARDS CALLED IDNA. AND THERE ARE SOME ISSUES THIS DOESN'T ADDRESS.

IT'S THE LATTER WHICH ARE MORE IMPORTANT BUT IT'S USEFUL THAT YOU UNDERSTAND THE FORMER.

I WANT TO STRESS, AS I SAID AT THE BEGINNING OF THE TALK, THAT THIS SET OF STANDARDS, LIKE ANY OTHER ENGINEERING PROBLEM, IS A -- IS THE RESULT OF ANALYSIS OF A SET OF TRADEOFFS AND A SET OF ALTERNATIVES, SET OF COSTS OF DOING THINGS ONE WAY VERSUS ANOTHER WAY.

AND AS FAR AS WE KNOW, THE SOLUTION WE HAVE COME UP WITH IS AT LEAST AS GOOD AS ANY OF THE POSSIBLE ALTERNATIVES AND MUCH BETTER THAN MOST OF THEM.

THE NAMEPREP COMPONENT OF THIS, AS JAMES HAS INDICATED, TAKES SOME LOOK-ALIKE CHARACTERS IN FONT FORMS AND COLLAPSES THEM. IT'S NECESSARY THAT YOU DO THAT. BUT ONE OF ITS IMPLICATIONS IS THAT SOME THINGS THAT YOU MIGHT WANT TO COLLAPSE UNDER SOME CIRCUMSTANCES GET COLLAPSED AND SOME THINGS YOU MIGHT NOT WANT TO COLLAPSE GET COLLAPSED ANYWAY. AGAIN, NO PERFECT SOLUTIONS.

BUT YOU NEED TO HAVE SOME AWARENESS AS YOU'RE LOOKING AT THIS THING THAT CERTAIN THINGS THAT YOU THINK OF AS DIFFERENT CHARACTERS MAY NOT ACTUALLY BE DIFFERENT.

AND THAT'S SOMETHING WHICH HAS TO BE REFLECTED IN USER INTERFACES OR IN REGISTRY POLICY OR IN OTHER THINGS.

NAMEPREP TRIES TO PRESERVE ANALOGIES TO THE CASE MAPPING RULE FOR ASCII AND THE TRADITIONAL DNS.

THE CASE MAPPING RULE SAYS I CAN PUT EITHER UPPER-CASE LETTERS OR LOWER-CASE LETTERS INTO THE DNS IN A REGISTRATION, I CAN USE UPPER CASE OR LOWER CASE IN A QUERY, LOWER CASE MATCHES UPPER CASE.
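As an illustration outside the spoken record, the ASCII case mapping rule just described might be sketched as follows. The function names `ascii_fold` and `names_match` and the sample names are hypothetical; the sketch only shows the matching rule, not a real DNS server.

```python
# Sketch of the traditional rule: upper-case and lower-case ASCII letters
# match each other, in registrations and in queries alike.

def ascii_fold(name: str) -> str:
    """Lower-case only the letters A-Z; every other character is untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)

def names_match(registered: str, queried: str) -> bool:
    """Compare two names the way the traditional DNS does, ignoring ASCII case."""
    return ascii_fold(registered) == ascii_fold(queried)

print(names_match("Example.COM", "example.com"))  # True
print(names_match("Example.COM", "examp1e.com"))  # False
```

Note that the fold touches only the 26 ASCII letters, which is exactly why scripts without an exact upper and lower case pairing need separate rules.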

THERE ARE MANY SCRIPTS AND LANGUAGES FOR WHICH THOSE RULES DO NOT HAVE EXACT ANALOGIES. SOME SCRIPTS DON'T HAVE LOWER CASE. AND THAT'S AN EASY PROBLEM. WE DON'T HAVE TO WORRY ABOUT IT. BUT THERE ARE SOME THINGS WHICH THE UPPER CASE CHARACTERS AND THE LOWER-CASE CHARACTERS MAY NOT EXACTLY MATCH. THEY MAY NOT BE REVERSIBLE.

IDNA PUTS ONLY THE LOWER-CASE CHARACTERS IN THESE PAIRS INTO THE DNS, AND THAT IS USUALLY THE RIGHT SOLUTION. BUT THERE ARE SOME SITUATIONS IN WHICH CHARACTERS EXIST ONLY IN A LOWER CASE AND HAVE NO UPPER-CASE EQUIVALENT. AND WE GET INTO PROBLEMS THERE IF WE TRY TO CUT THINGS TOO FINELY.

AGAIN, SOMETHING THAT REGISTRIES NEED TO UNDERSTAND, THAT REGISTRARS NEED TO EXPLAIN TO THEIR USERS, THAT USERS NEED TO BE AWARE OF OR PREVENTED FROM DOING ANYTHING WHICH WILL ASTONISH THEM WHEN THEY GET BACK THE RESULTS.

I MENTIONED THE GERMAN EXAMPLE EARLIER.

NAMEPREP AND IDNA ACCOMMODATE AND TREAT AS SEPARATE CHARACTERS A, O, AND U UMLAUT, BUT FOR ESZETT, BECAUSE IT IS ONE OF THESE THINGS WHICH HAS NO UPPER CASE MAPPING, NAMEPREP DECIDES IT'S JUST TWO CHARACTERS IN A ROW AND THAT'S THE END OF IT.
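As an illustration outside the spoken record, the German example can be reproduced with Python's built-in "idna" codec, which implements the 2003 rules under discussion. The sample words and the helper name `to_ace` are assumptions made for this sketch.

```python
# Sketch: umlauted vowels survive as distinct characters, while the
# German sharp s is mapped to two plain letters before encoding.

def to_ace(label: str) -> str:
    """Apply the 2003 IDNA mapping and encoding to a single label."""
    return label.encode("idna").decode("ascii")

print(to_ace("bücher"))   # xn--bcher-kva  (u umlaut is kept as its own character)
print(to_ace("straße"))   # strasse        (sharp s became "ss")
print(to_ace("straße") == to_ace("strasse"))  # True: the two spellings collide
```

Under these rules a registrant who asks for the spelling with the sharp s and one who asks for the spelling with "ss" are asking for the same name.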

I MENTIONED E-MAIL ADDRESSES BEFORE.

BY AND LARGE, YOU GO TO A USER AND SAY INSTEAD OF USING YOUR NAME OR SOMETHING RESEMBLING YOUR NAME, I WANT YOU TO USE A STRING OF NUMBERS OR SOMETHING INCOMPREHENSIBLE. THE USER DOESN'T LIKE THAT VERY MUCH. WE HAVE ACTUALLY HAD VERY LONG EXPERIENCE WITH THIS.

PEOPLE HAVE COME ALONG AND SAID IT'S MUCH BETTER FROM A SECURITY STANDPOINT THAT INSTEAD OF HAVING JOHN AT SOME PARTICULAR SITE AS A USER NAME OR AS A MAILBOX NAME, WE SHOULD HAVE M3152. AND THE TYPICAL USER REACTION IS I DON'T WANT TO BE KNOWN AS "M3152." I HAVE A NAME. I SHOULD BE ABLE TO SPELL IT. I SHOULD BE ABLE TO SPELL IT CORRECTLY, SPELL IT IN MY CHARACTER SET. AND WE'VE GOTTEN THE FIRST TWO OF THOSE USUALLY.

BUT WHEN WE COME TO A USER AND SAY, OKAY, WE FIXED THINGS SO THE NAME OF YOUR ISP, WHICH IS SHOWING UP ON YOUR E-MAIL ADDRESS, CAN NOW BE SPELLED IN NATIONAL CHARACTERS, BUT YOU CAN'T SPELL YOUR OWN NAME IN NATIONAL CHARACTERS, THE USER SAYS, "AND WHY IS THAT?" AND WE SAY, A WHOLE LOT OF THINGS ABOUT ENGINEERING WHICH USERS AREN'T INTERESTED IN HEARING.

SO TO A CERTAIN EXTENT, OUR NEXT PROBLEM HAVING TO SOME EXTENT SOLVED THE IDN PROBLEM IS WE HAVE TO SOLVE THE MAILBOX NAME PROBLEM. THAT TURNS OUT TO BE HARDER. IN THE INTERESTS OF TIME -- THAT'S ANOTHER LECTURE. BUT TAKE MY WORD FOR IT, THE MAILBOX NAME PROBLEM IS MUCH HARDER. AND HOWEVER SENSITIVE PEOPLE ARE ABOUT THEIR SCRIPTS AND THEIR LANGUAGES AND THEIR CULTURES, THEY ARE VERY OFTEN MUCH MORE SENSITIVE ABOUT THEIR NAMES.

I'VE TOLD SOME PEOPLE A STORY FOR THOSE OF YOU WHO USE UNIX THAT THE REASON WHY THE BACK SPACE CHARACTER CAUSES OVERSTRIKING AND NOT ERASING IN UNIX AND LINUX IS A CONSEQUENCE OF ONE PERSON INSISTING THAT HIS NAME BE SPELLED CORRECTLY IN AN INTERNATIONALIZED CONTEXT IN 1964. THESE THINGS HAVE EFFECTS ALL OVER. THEY LAST A LONG TIME. INTERNATIONALIZATION IS NOT A NEW PROBLEM.

THE STRICT DEFINITION OF A URL OR A URI DOES NOT PERMIT ANY NON-ASCII CHARACTERS ANYWHERE IN THAT STRING. FIXING THE SPELLING OF THE DOMAIN NAME SO THEY CAN BE WRITTEN IN NON-ASCII CHARACTERS DOESN'T HELP. WE'RE WORKING ON THE PROBLEM, THE EXPERTS IN THE FRONT ROW.
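As an illustration outside the spoken record, the sketch below shows why converting only the domain name is not enough when the rest of the reference also contains non-ASCII characters. The sample reference is made up, and percent-encoding the path as UTF-8 is an assumption of this sketch, shown as one conventional way to carry such characters, not as the solution the speaker refers to.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# A made-up reference with non-ASCII characters in both host and path.
reference = "http://b\u00fccher.example/caf\u00e9"
parts = urlsplit(reference)

# Step 1: fix only the domain name, using the 2003 IDNA rules.
ace_host = parts.netloc.encode("idna").decode("ascii")
host_fixed = urlunsplit(
    (parts.scheme, ace_host, parts.path, parts.query, parts.fragment)
)
print(host_fixed.isascii())  # False: the path still breaks the ASCII-only rule

# Step 2 (assumption): percent-encode the path as UTF-8 bytes.
all_ascii = urlunsplit(
    (parts.scheme, ace_host, quote(parts.path), parts.query, parts.fragment)
)
print(all_ascii)  # http://xn--bcher-kva.example/caf%C3%A9
```

Even after both steps the scheme name stays in ASCII, which is the point made in the remarks that follow.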

BUT EVEN THERE, THE NAMES OF THE PROTOCOLS, AS I MENTIONED EARLIER, DON'T GO AWAY. WE DON'T FIX HTTP, COLON, SLASH, SLASH, AND IF YOU DON'T WANT TO WRITE ANYTHING IN ASCII CHARACTERS, WE STILL HAVE A PROBLEM. THOSE PROBLEMS CAN BE SOLVED AS A USER INTERFACE ISSUE.

BUT THE MORE WE VIEW THEM AS A USER INTERFACE ISSUE, THE LESS WE NEED TO WORRY ABOUT STANDARDIZATION. BUT THE LESS WE STANDARDIZE THEM, BECAUSE WE'RE DEALING WITH THEM AS A USER INTERFACE ISSUE AND IN LOCAL CHARACTERS, THE MORE WE RUN INTO A PROBLEM IN THE WAY IN WHICH I TYPE IT, A REFERENCE, IN MY ENVIRONMENT USING MY USER AGENTS, THE LESS LIKELY IT WILL BE THAT IF I WRITE THAT REFERENCE DOWN AND SEND IT TO YOU, THE SAME ONE WILL WORK IN YOUR ENVIRONMENT AND YOUR USER ENVIRONMENT.

THAT'S PRECISELY THIS TRADEOFF BETWEEN LOCALIZATION AND CONVENIENCE FOR THE USER ON THE ONE HAND AND A GLOBALLY, EASILY INTEROPERABLE SYSTEM. WE DON'T KNOW HOW TO DO BOTH WELL PERFECTLY AT THE SAME TIME. IT'S A PROPERTY OF THE TRADITIONAL DNS THAT WHAT GOES IN, COMES OUT. THAT MAY SEEM OBVIOUS.

BUT WE TAKE ADVANTAGE IN SOME APPLICATIONS AND ENVIRONMENTS OF THE FACT THAT IN THE ASCII ENVIRONMENT, IF I REGISTER A NAME WITH AN UPPER CASE A IN IT AND I ASK FOR THAT NAME IN A QUERY USING A LOWER-CASE A, THE NAME AS REGISTERED COMES BACK.

WITH IDNA, BECAUSE OF ALL OF THESE MAPPING RULES WHICH OCCUR IN THE APPLICATION, IF I REGISTER AN UPPER-CASE U AT THE REGISTRAR INTERFACE LEVEL, WHAT'S GOING IN THE DNS IS A LOWER-CASE U, REGARDLESS, WITH OR WITHOUT UMLAUTS -- IT'S GOING IN AS U UMLAUT LOWER CASE, BECAUSE THE ONLY THING IDNA PERMITS TO BE REGISTERED IS LOWER-CASE CHARACTERS FOR THINGS THAT HAVE BOTH CASES.

AS A CONSEQUENCE OF THE MAPPINGS PERFORMED IN IDNA, IF I THEN COME IN AND ASK FOR UPPER-CASE U, EVERYTHING WORKS FINE. I EXPECT UPPER-CASE U UMLAUT, BUT THE LOWER-CASE U COMES BACK. IT DEPENDS ON THE APPLICATION WHETHER THE USER IS SENSITIVE TO IT, WHETHER IT'S A PROBLEM. BUT WE NEED TO UNDERSTAND THESE ISSUES AND BE PREPARED TO DEAL WITH THEM.
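As an illustration outside the spoken record, the difference between the two behaviours can be sketched in a few lines. The toy `zone` dictionary and the helper names are hypothetical stand-ins for a registry, and the IDNA half relies on Python's built-in "idna" codec, which follows the 2003 rules.

```python
# ASCII side: a toy zone that matches without regard to case but stores
# and returns the name exactly as it was registered.
zone = {}

def register(name: str) -> None:
    zone[name.lower()] = name

def query(name: str):
    return zone.get(name.lower())

register("Example.COM")
print(query("example.com"))  # Example.COM

# IDNA side: the mapping happens before anything reaches the DNS, so the
# upper-case form is gone by the time the name is stored.
def idna_round_trip(name: str) -> str:
    stored = name.encode("idna")   # what would go into the DNS
    return stored.decode("idna")   # what an application would display

print(idna_round_trip("BÜCHER.example"))  # bücher.example
```

The query succeeds in both cases; what differs is whether the registrant's original spelling can ever be recovered from the DNS.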

IS THIS A BIG DEAL WITH UPPER AND LOWER CASE U UMLAUT? NO. BUT IT MAY BE A BIG DEAL WITH SOME OTHER SCRIPTS AND OTHER CHARACTERS WHERE THE THINGS WHICH ARE BEING MAPPED TOGETHER REALLY ARE DIFFERENT.

I MENTIONED EARLIER THAT ONE OF THE THINGS THE UNICODE FOLKS DID WAS TO PULL THE CHINESE-BASED CHARACTERS TOGETHER INTO ONE SET OF CODES, TAKE THE EUROPEAN CHARACTERS, WHICH HAD EXISTING STANDARDS WHICH WERE FAIRLY COMPACT, AND KEEP THE STANDARDS TOGETHER, WHICH RESULTED IN THE EUROPEAN CHARACTERS BEING IN SEPARATE CODE SETS AND THE CHINESE CHARACTERS IN THE SAME ONES.

IT MEANS WHEN YOU'RE DESIGNING A STANDARD TO WORK FOR UNICODE, YOU HAVE TO HAVE DIFFERENT RULES IN DIFFERENT PLACES. IT'S NOT AN IDEAL SITUATION.

CAN WE WORK AROUND IT? YES.

HAVE WE WORKED AROUND IT? YES.

DOES IDNA WORK AROUND IT REASONABLY WELL? YES.

WE'RE NOT DEALING WITH PROBLEMS WHICH PREVENT US FROM MOVING FORWARD; WE'RE DEALING WITH PROBLEMS THAT YOU NEED TO UNDERSTAND IN ORDER TO DESIGN APPLICATIONS, TALK TO YOUR USERS, DESIGN REGISTRY AND REGISTRAR SYSTEMS AND CONFLICT RESOLUTION SYSTEMS THAT WILL WORK.

IN THEIR INTRODUCTION DEFINITION OF THE STANDARD, THE UNICODE FOLKS SAID THAT UNDER NO CIRCUMSTANCES WOULD THEY MAKE ANY DISTINCTION ON THE BASIS OF THE FONTS IN WHICH CHARACTERS WERE WRITTEN. THEN THEY DID IT.

WE HAVE A LARGE COLLECTION OF THINGS LABELED MATHEMATICAL VARIANTS, WHICH ARE THE BOLD AND ITALIC AND SCRIPT, AND BOLD ITALIC SCRIPT, AND BOLD ITALIC SCRIPT SHADOWED AND OTHER VARIATIONS OF THAT FORM OF THE ORDINARY UPPER AND LOWER CASE LATIN, ROMAN-BASED LETTERS.

NAMEPREP MAPS THEM ALL OUT.

THERE'S AN IMPORTANT QUESTION, IF YOU'RE A REGISTRY AS TO WHETHER YOU WANT TO LET NAMEPREP MAP THEM ALL OUT OR TO SAY TO A USER, YOU DON'T GET TO REGISTER THOSE THINGS, ALL YOU ARE GOING TO DO IS MAKE CONFUSION.

COME IN WITH THE REAL CHARACTER.
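As an illustration outside the spoken record, the two registry choices can be sketched with Python's standard `unicodedata` module. Compatibility normalization (NFKC) is only one step of what Nameprep does, and `registry_accepts` is a hypothetical policy function invented for this sketch, not an actual registry rule.

```python
import unicodedata

def fold_compat(label: str) -> str:
    """Apply Unicode compatibility normalization, one step of Nameprep."""
    return unicodedata.normalize("NFKC", label)

def registry_accepts(label: str) -> bool:
    """Hypothetical strict policy: refuse any label the mapping would change."""
    return label == fold_compat(label)

# Spell "bold" with the mathematical bold small letters (U+1D41A onward).
bold = "".join(chr(0x1D41A + ord(c) - ord("a")) for c in "bold")

print(fold_compat(bold))         # bold  (the variants map to plain letters)
print(registry_accepts(bold))    # False (strict policy: come in with the real character)
print(registry_accepts("bold"))  # True
```

A registry that simply lets the mapping run would register the plain form silently; the strict check instead tells the registrant up front that the styled letters are not distinct names.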

THIS IS ONE OF THOSE KINDS OF POLICY DECISIONS, AND ONE OF MANY, THAT IN SOME CASES, BUT VERY FEW, WE'LL NEED TO TRY TO MAKE GLOBALLY AND ACROSS ICANN, BUT EACH INDIVIDUAL REGISTRY, AND IN SOME CASES INDIVIDUAL REGISTRARS, ARE GOING TO HAVE TO MAKE DECISIONS ABOUT WHETHER THEY WILL ACCEPT, IN ORDER TO KEEP THEIR USERS OUT OF TROUBLE AND THEMSELVES OUT OF TROUBLE AND IN ORDER TO PREVENT CONFUSION.

UNIFORM POLICIES WOULD BE A GOOD THING, BUT THEY'RE NOT NECESSARY. THE BIG ADVANTAGE OF UNIFORM POLICIES IS IF WE HAVE DIFFERENT POLICIES FOR DIFFERENT REGISTRIES AND DIFFERENT POLICIES FOR DIFFERENT SITUATIONS, USERS AND REGISTRANTS AND MAYBE REGISTRARS GO CRAZY. BUT AS A MATTER OF TECHNICAL NECESSITY, WE DON'T NEED IT.

WE HAD A LONG BOUT WITH THE CHINESE PROBLEM. JAMES TALKED ABOUT SOME OF IT. PART OF THE DIFFICULTY IS THAT WHEN WE'RE LOOKING AT ALPHABETIC CHARACTERS, WE LOOK AT THEM IN TERMS OF THEIR SHAPES OR SOUNDS BUT NOT THEIR MEANING. AND SIMPLIFICATION AND SOME OTHER ISSUES WITH CHINESE CHARACTERS, THERE ARE EXPERTS ON THIS IN THE ROOM AND I AM PROBABLY GOING TO START EMBARRASSING MYSELF IN A FEW MINUTES, BECAUSE I AM NOT AN EXPERT. BUT THERE ARE MEANINGS AND SEMANTICS ASSOCIATED WITH THESE CHARACTERS. THE MAPPINGS ARE NOT PERFECTLY ONE TO ONE IN ALL CASES. AND BECAUSE THE CHARACTERS ARE USED DIFFERENTLY IN DIFFERENT LANGUAGES, IF WE HAD A SIMPLIFIED, TRADITIONAL CHINESE MAPPING RULE AND IT SIMPLIFIED JAPANESE KANJI INTO SIMPLIFIED CHINESE, EVERYONE WOULD BE VERY UNHAPPY.

SO IT SUDDENLY BECOMES NECESSARY TO UNDERSTAND WHAT LANGUAGE IS IN USE IN THE CHARACTERS IN ORDER TO FIGURE OUT WHAT TO DO WITH THEM IN TER