Unicode

From Wikipedia, the free encyclopedia

Unicode is a standard that lets computers show text in different languages. It is maintained by the Unicode Consortium and kept in step with the international standard ISO/IEC 10646. Its goal is to replace the many older character encoding standards with a single worldwide standard for all languages. There are more than 100,000 characters in the Unicode standard.

Unicode can be used for texts in almost every national language and most other languages, including European, Middle Eastern, and Asian ones.

Unicode is a digital code used by computers, some communications equipment, and other devices to handle text.

Unicode was developed in the 1990s and integrated earlier codes used on computer systems.

Unicode provides printable characters such as the letters of most alphabets in different cases, digits, diacritics, and punctuation marks. It also provides characters that do not actually print but instead control how text is processed, for example line endings or right-to-left text direction.
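A minimal sketch in Python showing the difference: printable characters have a general category and a name, while control and formatting characters (such as the line feed or the right-to-left mark) exist only to steer how text is processed.

```python
import unicodedata

# Printable characters: a letter, a digit, an accented letter, a punctuation mark.
for ch in ["A", "7", "é", "!"]:
    print(ch, unicodedata.category(ch), unicodedata.name(ch))

# Characters that control processing instead of printing:
print(unicodedata.category("\n"))        # 'Cc' = control character (line feed)
print(unicodedata.name("\u200F"))        # formatting mark for right-to-left text
```

The category codes (`Lu` for an uppercase letter, `Nd` for a decimal digit, `Cc` for a control character, and so on) come straight from the Unicode character database that ships with Python.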

Unicode makes it possible to write English words with diacritics, such as café, résumé, zoölogy, blessèd, Bön, or piña colada.

Unicode treats a graphical character (for instance é) as a code point, either on its own or as a sequence (e + combining accent). Each code point is a number, which can be encoded as one or several code units. Code units are 8 bits, 16 bits, or 32 bits wide. This makes it possible to represent all of those characters as sequences of just two symbols (zeros and ones).
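A short Python sketch of this idea: é can be one code point (U+00E9) or a two-code-point sequence (e plus a combining accent), and a single code point can take a different number of code units depending on the encoding.

```python
import unicodedata

precomposed = "\u00E9"    # é as a single code point, U+00E9
decomposed = "e\u0301"    # e followed by a combining acute accent

print(precomposed == decomposed)   # different code point sequences...
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # ...but equivalent after normalization

# Code units: U+00E9 takes two 8-bit code units in UTF-8,
# but only one 16-bit code unit in UTF-16.
print(precomposed.encode("utf-8"))                 # b'\xc3\xa9'
print(len(precomposed.encode("utf-16-le")) // 2)   # 1 sixteen-bit code unit
```

Normalization (`NFC` here) is how software decides that the two spellings of é mean the same character.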

Encodings

There are different ways to encode Unicode code points. The most common ones are UTF-8, UTF-16, and UTF-32.

UTF-8 is the most common of these for exchanging text. It is used on the Internet and in electronic mail; Java also uses a variant of it.
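A small sketch comparing the three encodings on the same word. The code point counts are the same, but each encoding uses a different number of bytes.

```python
text = "café"   # four code points; é needs two bytes in UTF-8
for enc in ["utf-8", "utf-16-le", "utf-32-le"]:
    data = text.encode(enc)
    print(enc, len(data), "bytes")
# UTF-8 uses 5 bytes here, UTF-16 uses 8, and UTF-32 uses 16.
```

This is why UTF-8 is popular for exchange: for mostly-ASCII text it is the most compact of the three.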

Problems

Specific Russian (top) and proper Serbian/Macedonian (bottom) forms of the same Cyrillic letters. This is an example of a problem that Unicode support alone cannot solve: because both languages share the same code points, font technologies such as OpenType must step in to select the correct letter shapes.
