# UNICODE

You've probably noticed that all of our programs have been doing input and output with only English letters (in programming we call them Latin characters). However, many programs you use in daily life allow you to do input and output with Thai letters. In this section we're going to talk about UNICODE, which is the way your computer thinks about Thai letters.

We'll have to cover a few topics on this page:

• ASCII - How the computer thinks about Latin characters.
• Hexadecimal - How the computer thinks about data and numbers.
• UNICODE - How the computer thinks about non-Latin characters.
• Finally, we'll talk about how to use UNICODE in a C program.

## Why should I learn this?

If you do decide to become a programmer, you will have many opportunities in localization, which means translating big programs so that Thai people can use them. Big companies like Microsoft, Apple, Google and Facebook all need people who can:

• speak English
• speak Thai fluently
• write programs

Many of these companies have a hard time finding people with all three of these skills. Not many foreigners can speak Thai well enough, so the companies need Thai people to help them. Every day more websites and programs are being translated into Thai, and there are tons of jobs available. However, in order to do these kinds of jobs, you'll need to know how the computer thinks about Thai characters.

## ASCII

To store Latin characters, the computer uses a system called ASCII (American Standard Code for Information Interchange). ASCII assigns every letter, digit, punctuation mark, etc. to a different number. For example:

• 'A' = 65, 'B' = 66, 'C' = 67 ...
• 'a' = 97, 'b' = 98, 'c' = 99 ...
• '0' = 48, '1' = 49, '2' = 50 ...
• '(' = 40, ')' = 41, '*' = 42, '+' = 43

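You can find the complete list at asciitable.com. ASCII itself defines 128 (2^7) characters, and extended tables built on top of it go up to 256 (2^8), so a normal 8-bit char has enough room to store any of them. The computer stores the string "Hello World!" as the numbers 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33, followed by a 0 that marks the end of the string.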

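In fact, we could even write the String as an array of numbers, and the program would work normally! The numbers have to be stored in a char array, though. In an array of int, each number takes up several bytes, most of them 0, so printf() would stop right after the first letter.

``````
/* Each number is the ASCII code of one character; the final 0 ends the string. */
char hello[] = {72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33, 0};

printf("%s", hello);
``````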

## Hexadecimal Numbers

Normally when we count, we use the decimal number system, which is base-10. This means there are 10 digits that we use to count: 0,1,2,3,4,5,6,7,8,9.

When our computer stores data (int, char, float or anything else) it uses binary, a base-2 number system. This means there are only 2 digits: 0 and 1.

Because binary has only two digits, it needs more of them to write the same number, so numbers quickly become very long. For example, the number 61 is 111101 in binary. When we want to use less space, we use a number system called Hexadecimal, which is base-16. The digits are 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F, where A through F stand for 10 through 15. In Hexadecimal, 61 is just 3D (3 × 16 + 13).

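Here is how each number system writes a few different numbers:

• decimal 5 = binary 101 = Hexadecimal 5
• decimal 10 = binary 1010 = Hexadecimal A
• decimal 16 = binary 10000 = Hexadecimal 10
• decimal 255 = binary 11111111 = Hexadecimal FF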

Hexadecimal numbers are very important. If you continue programming, you will see them very often.

## UNICODE

While ASCII has enough space for all of the basic Latin characters (and its 8-bit extensions add the European accent characters), there isn't enough room for other languages. We can solve this problem with UNICODE, a huge table with room for over a million characters, of which more than 110,000 have been assigned. With UNICODE, we can write programs that input and output in languages like Chinese, Arabic, Sanskrit and Thai.

The UNICODE table is broken up into many pieces called blocks. The Thai block covers the numbers 0E00 to 0E7F. Each letter has a Hexadecimal number assigned to it. For example, ก is assigned 0E01, which would be 3585 in decimal numbers.

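Let's look at our "Hello World!" example, but now in Thai using UNICODE numbers. The greeting สวัสดีโลก is stored as the Hexadecimal numbers 0E2A, 0E27, 0E31, 0E2A, 0E14, 0E35, 0E42, 0E25, 0E01.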

## Writing a UNICODE program

In order to use UNICODE in our program, we'll have to change the way we've been dealing with Strings.

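We have to use a new type called wchar_t, the wide character type. These "wide" strings have to be written with an L before them. Each character is written as \uxxxx, where the x's are the four Hexadecimal digits of its UNICODE number. We have to use special "wide" input and output functions, like wprintf() and fgetws(), and print wide strings with %ls instead of %s. We should also call setlocale() once at the start of the program so that it knows how to convert wide characters for output.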

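``````
setlocale(LC_ALL, "");  /* needs <locale.h>; picks the encoding used for output */
wchar_t hello[1000] = L"\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35\u0E42\u0E25\u0E01";
wprintf(L"%ls", hello);  /* %ls prints a wide string */
``````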

Depending on what computer you use, writing UNICODE programs might be easy or very hard. On Mac or Linux computers, UNICODE programs are usually easy to write. On Windows, the output on the screen will often be wrong, but you can still read and write UNICODE files if you do it carefully.