Each character in the ASCII table can be represented in 7 bits. For example, the null terminator is represented as 0000000.
When a character is stored in an 8-bit byte, as in UTF-8, the leftmost (eighth) bit is 0 if the character is within the ASCII table.
ASCII reserves the first 32 characters (0-31) as control characters, which are characters not intended to represent printable information.
Control characters refer to things a computer must process, such as an alert, a tab, or a null terminator. Written as an 8-bit byte, every control character begins with 000. For instance, a horizontal tab (decimal 9) is represented as 00001001.
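To check bit patterns like these yourself, here is a minimal sketch in POSIX shell. The helper name char_to_bin8 is made up for illustration; it relies on printf returning a character's numeric code when the argument starts with a quote.

```shell
# char_to_bin8 is a hypothetical helper: print the 8-bit binary form
# of a single ASCII character's code.
char_to_bin8() {
    # A leading quote makes printf yield the numeric code of the character.
    code=$(printf '%d' "'$1")
    bits=''
    i=0
    # Peel off the lowest bit eight times, building the string right to left.
    while [ "$i" -lt 8 ]; do
        bits="$((code % 2))$bits"
        code=$((code / 2))
        i=$((i + 1))
    done
    printf '%s\n' "$bits"
}

char_to_bin8 'A'                # => 01000001
char_to_bin8 "$(printf '\t')"   # => 00001001 (horizontal tab)
```

The null character itself cannot be passed as a shell argument, so this sketch only covers codes 1 through 127.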
Printing characters by their hex index
# Enable interpretation of backslash escapes, disable implicit trailing newline
echo -en 'Hexadecimal number 0x41 is the letter \x41\n'
# => Hexadecimal number 0x41 is the letter A
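The options accepted by echo vary between shells, so a printf-based version may travel better. The sketch below uses a hypothetical helper, hex_to_char, which converts the hex index to octal first because octal escapes are the form printf is guaranteed to understand.

```shell
# hex_to_char is a hypothetical helper: print the character whose
# hex index is given as the first argument (for example 41 for A).
hex_to_char() {
    # Convert the hex index to three octal digits...
    octal=$(printf '%03o' "0x$1")
    # ...then let printf interpret the resulting \NNN escape.
    printf "\\$octal"
}

hex_to_char 41   # prints A (no trailing newline)
```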
ANSI Color Codes
ANSI color codes are used to format output on terminals, and they can be emitted from many programming languages, such as Java. An ANSI color code is expressed by typing the escape character 0x1b, followed by [, then the code number (e.g. 42), and then the letter m. Put together, it looks like \x1b[42m. The formatting persists until the reset code, which is code 0, is given. You can specify this with \x1b[0m.
It's worth paying special attention to the escape character, which is decimal 27 or hex 0x1b, because it's typed so often. For instance, try entering this command in your terminal. It will print a green background.
echo -en '\x1b[42m Green \x1b[0m\n'
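If you use these sequences often, a small wrapper saves typing. This is only a sketch: sgr_wrap is a made-up name, and it uses printf with \033, the octal spelling of the same escape character.

```shell
# sgr_wrap is a hypothetical helper: emit an ANSI code, the text,
# and then the reset code so the formatting does not leak.
# \033 (octal) is the escape character: decimal 27, hex 0x1b.
sgr_wrap() {
    printf '\033[%sm%s\033[0m\n' "$1" "$2"
}

sgr_wrap 42 ' Green '   # green background, as in the command above
sgr_wrap 1 'Bold'       # bold text
```

Ending every call with the reset code keeps later output from inheriting the style.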
Here are some of the most useful codes:
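| Code | Effect |
|---|---|
| 0 | reset all fonts, formats, colors, etc. |
| 1 | enable bold font |
| 2 | enable faded font |
| 3 | enable italic font |
| 4 | enable underlined font |
| 5 | enable blinking font |
| 22 | disable bold font |
| 23 | disable italic font |
| 24 | disable underlined font |
| 25 | disable blinking font |
| 38;5;N | select 256-color foreground (N is a palette index from 0 to 255) |
| 48;5;N | select 256-color background (N is a palette index from 0 to 255) |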
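The 256-color selectors take an extra argument: the sequence is 38;5;N for the foreground and 48;5;N for the background, where N is a palette index from 0 to 255. A rough sketch follows; fg256 and bg256 are hypothetical helper names, and the exact shade you see depends on your terminal's palette.

```shell
# fg256 and bg256 are hypothetical helpers: print text using a
# 256-color palette index, then reset the formatting.
fg256() {
    printf '\033[38;5;%sm%s\033[0m\n' "$1" "$2"
}

bg256() {
    printf '\033[48;5;%sm%s\033[0m\n' "$1" "$2"
}

fg256 208 'foreground in palette color 208'
bg256 27 'background in palette color 27'
```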
Unicode Transformation Format (UTF) is one of the mapping methods engineered to encode text. It does this by mapping code points to code values. Each code value is a unique sequence of bytes.
The UTF-16 encoding system is not as simple as its name suggests. A character is not always encoded with 16 bits, as is commonly assumed. UTF-16 is a variable-width encoding format that uses either 2 or 4 bytes per character.
A char object is encoded using UTF-16, as are Windows filenames and the C++ RESTful API SDK written by Microsoft.
It’s rarely advantageous to use UTF-16 over UTF-8. The only time it will result in a smaller file size is if the majority of text in the file consists of Chinese or Japanese characters. Even so, if there is a large amount of whitespace (which is an ASCII character) then the UTF-8 encoding would still result in a smaller file size.
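You can measure the trade-off directly. The sketch below assumes iconv is installed; byte_count is a made-up helper name, and the sample strings are arbitrary. The little-endian variant UTF-16LE is used so that no byte order mark is added to the count.

```shell
# byte_count is a hypothetical helper: number of bytes the UTF-8 text
# in $1 occupies after re-encoding it as $2. Assumes iconv is available.
byte_count() {
    printf '%s' "$1" | iconv -f UTF-8 -t "$2" | wc -c | tr -d ' '
}

ascii='hello world'
# UTF-8 bytes of two Chinese characters (U+4F60 U+597D), written as
# octal escapes so the script itself stays plain ASCII.
cjk=$(printf '\344\275\240\345\245\275')

byte_count "$ascii" UTF-8      # => 11
byte_count "$ascii" UTF-16LE   # => 22
byte_count "$cjk" UTF-8        # => 6
byte_count "$cjk" UTF-16LE     # => 4
```

The ASCII text doubles in size under UTF-16, while the Chinese text shrinks from 3 bytes per character to 2.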
UTF-32, unlike its brothers, is a fixed-width encoding format. Every character is guaranteed to be represented by exactly 4 bytes. UTF-32 is rarely used. Requiring every character to be represented with 4 bytes results in a significant increase in file size. It is slightly faster to read than UTF-8 but the difference is barely measurable.
Lastly, UTF-32 is problematic because its output contains many bytes whose 8 bits are all 0. Traditional software interprets such a byte as the null terminator, which signals the end of the string, so the remaining UTF-32 data would be cut off.
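A hex dump makes the zero bytes easy to see. This sketch assumes iconv, od, and xargs are available; utf32_hex is a hypothetical helper name, and big-endian UTF-32 is chosen only to keep the output free of a byte order mark.

```shell
# utf32_hex is a hypothetical helper: show the bytes of the UTF-8 text
# in $1 after re-encoding it as big-endian UTF-32.
utf32_hex() {
    # od prints one hex pair per byte; xargs collapses the whitespace
    # into single spaces.
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-32BE | od -An -tx1 | xargs
}

utf32_hex 'Hi'   # => 00 00 00 48 00 00 00 69
```

Six of the eight bytes are zero, and the very first byte would already end the string for code that expects a null terminator.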
Multi-byte encodings are non-ASCII. These use 2 bytes to encode a character set of up to 2^16 = 65,536 unique values.
Let's say you want to make a text document look less…plain. Multi-byte encodings can help you out. If you're using vim to write your text (as all programmers should), you can use Vim's digraphs. If you're not sure how to use Vim, head over to this page.
Vim uses digraphs to encode non-ASCII characters with simple two-key combos.
For example, let’s say you want to add a check mark for a to-do list:
Cameron's To Do List:
- Hack Austin Traver's computer (Done)
- Buy a MacBook (IMPORTANT!)
If you type <C-k>OK in vim, you'll get a check mark: ✓
Cameron's To Do List:
✓ Hack Austin Traver's computer
★★ Buy a MacBook
The command syntax uses <C-k> followed by the two characters of the digraph, typed in insert mode.
You can see all available digraphs if you type :digraph. Here are some useful digraphs:
Some Greek Letters
Some Math Symbols
A Few Fractions
Now you can write such atrocities as:
1. ⌊π⌋ = 3 ∴ π ≡ 3
     ∞
2. ∑ n = -(⅙ × ½)