My professional life revolves around customer data: not my customers' data, but the data of my prospects' customers. Kinda like my friend's sister's husband's dog, except the data that I speak about is extremely important to me, because I can help the prospect make smarter business decisions... quickly and accurately. Think... CLEAN & ACCURATE business analytics in REAL time. The question that I'm about to pose is a challenge that I believe most in the data quality industry will face at some point, considering e-commerce internationalization and the global marketplace that has evolved over the past umpteen years... and not one vendor (to my knowledge) has addressed data internationalization in its entirety.
For the past couple of months, I've been working on one of the largest projects of my career. The organization that I am servicing is extremely global in nature. Instead of listing the 196 or so countries around the world, picture the entire globe or a world map to depict the area that this organization operates in. Did you know that there are over 6,500 languages across the world? How do you match, merge, and dedup this data across every language in the world? This poses a challenge for some of the organizations that I'm currently working with, as well as any other global or international organization that cares about its customer data. Who cares about their customer data?! EVERYONE. So, how do you keep a single customer record up-to-date with a confident degree of accuracy?
What makes up a "typical" generic customer record?
Name
Address
Phone #
Email Address
Some sort of customer #
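To make the match/merge/dedup question concrete, here is a minimal sketch of that record in Python. The field names, the `match_key` helper, and the choice of email plus phone digits as the matching key are all my own assumptions for illustration, not how any particular data quality tool does it.

```python
from dataclasses import dataclass
import re


@dataclass
class CustomerRecord:
    # Illustrative field names that mirror the list above.
    name: str
    address: str
    phone: str
    email: str
    customer_id: str


def match_key(record: CustomerRecord) -> tuple[str, str]:
    """Build a crude key for spotting likely duplicates (an assumption, not a standard)."""
    email = record.email.strip().lower()
    phone_digits = re.sub(r"\D", "", record.phone)  # keep digits only
    return (email, phone_digits)


def dedup(records: list[CustomerRecord]) -> list[CustomerRecord]:
    """Keep the first record seen for each match key."""
    seen: dict[tuple[str, str], CustomerRecord] = {}
    for record in records:
        seen.setdefault(match_key(record), record)
    return list(seen.values())
```

A key like this only works when the email and phone survive intact. It does nothing for a name or address that is stored in a different script, which is exactly the problem this post is about.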
Address formats vary around the world. If you have a subset of data residing in a Japanese database or a French database, can you translate or transliterate this data for your headquarters in the US?
To be clear, I've included the definitions below:
Translate: turn words into different language: to reproduce a written or spoken text in a different language while retaining the original meaning.
Transliterate: transcribe something into another alphabet: to represent letters or words written in one alphabet using the corresponding letters of another.
So why would you want to transliterate the text of an address? Ultimately the result wouldn't be a valid address, right?? Wrong.
Languages such as Japanese, Cantonese, and Mandarin use characters such as "東京都". In many Asian cultures, it's common to not be able to say or pronounce a word once it has been translated. This is where transliteration comes into play, especially for first and last names.
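Here is a rough sketch of what transliteration looks like in code, using only the Python standard library. The katakana table is a toy that I made up for illustration (five characters, nowhere near a real romanization scheme), and both function names are hypothetical. The accent-folding helper is a cruder cousin of transliteration that can help when matching names from, say, a French database.

```python
import unicodedata

# Toy katakana-to-Latin table, for illustration only. A real scheme has
# many more rules, and kanji such as 東京都 cannot be handled by a simple
# per-character table because their reading depends on context.
KATAKANA_TO_LATIN = {
    "タ": "ta",
    "ナ": "na",
    "カ": "ka",
    "ヤ": "ya",
    "マ": "ma",
}


def transliterate_katakana(text: str) -> str:
    """Replace each known katakana character; leave anything else untouched."""
    return "".join(KATAKANA_TO_LATIN.get(ch, ch) for ch in text)


def fold_accents(text: str) -> str:
    """Drop combining marks so accented and unaccented spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this table, `transliterate_katakana("タナカ")` gives `"tanaka"`, and `fold_accents("François")` gives `"Francois"`. Feed it "東京都" and you get "東京都" straight back, which is the gap I'm complaining about.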
Address standardization, translation, and transliteration are commonplace. There are a number of tools that link the USPS database and other international address databases across the world and provide a service "out of the box." Read this address: "Japan, 東京都中央区築地4丁目7-5". Can this be translated to English? Yes. But is the direct translation how you would say it out loud? I have no idea, and I'm sure your customer service rep, your marketing applications, or your central master data hub can't read it aloud either. Hence, the importance of both translation and transliteration.
Another concept to think about... Does your customer service application recognize when your CSR enters William as Bill or Will? There are many ways to get to a YES on this question. I'll bore you with this piece another time. Actually, if you'd like to know the answer, I'll facilitate a meeting with you and my esse, Alan. He's the man.
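For the curious, one of the simplest ways to get to a YES is a nickname lookup table. This is only a sketch under my own assumptions: the `NICKNAMES` table and the function names are hypothetical, and a real table would be far larger and would differ by language and culture.

```python
# Hypothetical nickname table mapping short forms to a canonical first name.
NICKNAMES = {
    "bill": "william",
    "will": "william",
    "billy": "william",
    "bob": "robert",
}


def canonical_first_name(name: str) -> str:
    """Lowercase and trim the name, then map known nicknames to one form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def same_first_name(a: str, b: str) -> bool:
    """True when both inputs resolve to the same canonical first name."""
    return canonical_first_name(a) == canonical_first_name(b)
```

So `same_first_name("William", "Bill")` is `True`, while `same_first_name("Bob", "Bill")` is `False`. Notice that this only works once the names are in the same alphabet, which brings us right back to transliteration.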
I could also bore you with the common challenges of the internationalization of data as it relates to one common organization; however, if you're reading this, you're probably familiar with them.
Back to transliteration... AND the challenge... I haven't found an organization, a library, or a service that will transliterate every language, or even a large subset for that matter, as it relates to NAMES. It's 2011, people!!! How has this not yet been addressed? Or has it?
-tc
P.S. Did you know? There are still places in the world where an address may be "3rd building from the second sign after (fill in landmark), with the green door"...