At the most fundamental level, computers store and process everything as binary numbers -- sequences of 0s and 1s. This includes text, images, audio, video, and executable programs. When you type the letter "A" on your keyboard, the computer does not store a visual representation of the letter. Instead, it stores the number 65 (in decimal), which is 01000001 in binary.
This mapping between human-readable characters and numeric values is called a character encoding. Different encoding standards define different mappings. The most important ones in computing history are ASCII (the foundational encoding), Unicode (the universal character set), and UTF-8 (the dominant encoding on the modern web).
Understanding text encoding is essential for developers who work with file I/O, network protocols, databases, internationalization, data serialization, and security. Encoding mismatches cause some of the most frustrating bugs in software -- garbled text, "mojibake" (nonsensical character substitutions), and data corruption.
A bit (binary digit) is the smallest unit of data: a single 0 or 1. A byte is a group of 8 bits, capable of representing 256 different values (2^8 = 256, ranging from 0 to 255). Bytes are the standard unit for measuring data storage and processing. When we say a character is "1 byte," we mean it takes 8 bits to represent it.
A number base (or radix) determines how many distinct digit symbols are used and the positional value of each digit. Different bases are used in computing for different purposes:
Binary uses only two digits: 0 and 1. Each position represents a power of 2. Binary is the native language of digital electronics because transistors have two states: on (1) and off (0).
Decimal 65 in binary:
65 = 64 + 1
   = 2^6 + 2^0
   = 01000001
Position values (8-bit):  128  64  32  16   8   4   2   1
Binary digits:              0   1   0   0   0   0   0   1
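These conversions can be checked in Python, whose built-ins handle base conversion directly:

```python
# Decimal to binary, padded to a fixed 8-bit width
format(65, '08b')        # '01000001'

# Binary string back to decimal: int() with an explicit base
int('01000001', 2)       # 65

# bin() gives the same digits with a '0b' prefix and no padding
bin(65)                  # '0b1000001'
```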
Octal uses digits 0-7. Each octal digit corresponds to exactly 3 binary digits, making it a convenient shorthand for binary. Octal is primarily used in Unix/Linux file permissions:
chmod 755 = binary 111 101 101
Owner: 7 = 111 = read+write+execute
Group: 5 = 101 = read+execute
Other: 5 = 101 = read+execute
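A short Python sketch makes the digit-to-permission mapping explicit (the function names here are illustrative, not a standard API):

```python
def octal_digit_to_rwx(digit):
    """Expand one octal permission digit (0-7) into rwx notation."""
    return (('r' if digit & 4 else '-') +
            ('w' if digit & 2 else '-') +
            ('x' if digit & 1 else '-'))

def mode_to_string(mode):
    """Expand a three-digit octal mode like 0o755 into rwxr-xr-x."""
    return ''.join(octal_digit_to_rwx((mode >> shift) & 7)
                   for shift in (6, 3, 0))

mode_to_string(0o755)  # 'rwxr-xr-x'
mode_to_string(0o644)  # 'rw-r--r--'
```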
Decimal uses digits 0-9 and is the standard number system for human communication. When we say a byte value is "65," we mean decimal 65. In the context of text encoding, decimal values are the most intuitive way to reference character codes.
Hexadecimal uses digits 0-9 and letters A-F (representing values 10-15). Each hex digit corresponds to exactly 4 binary digits, making it the most compact representation of binary data. A single byte (8 bits) is represented by exactly 2 hex digits:
Decimal 65 = Hex 41
4 = 0100 (binary)
1 = 0001 (binary)
41 hex = 01000001 binary = 65 decimal
Common uses:
Color codes: #FF5733 = Red:255 Green:87 Blue:51
Memory addresses: 0x7FFF5FBFF8A0
MAC addresses: AA:BB:CC:DD:EE:FF
Byte sequences: 48 65 6C 6C 6F = "Hello"
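Python's bytes type can round-trip the last example directly:

```python
# Hex string to bytes (fromhex ignores whitespace), then decode as ASCII
data = bytes.fromhex('48 65 6C 6C 6F')
data.decode('ascii')     # 'Hello'

# And back: bytes to a spaced hex string
data.hex(' ').upper()    # '48 65 6C 6C 6F'
```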
ASCII (American Standard Code for Information Interchange) was developed in the early 1960s and published as a standard in 1963. It uses 7 bits to define 128 characters, divided into:
Range Characters Examples
0-31 Control characters \n (10), \t (9), \r (13)
32 Space ' '
48-57 Digits 0-9 '0'=48, '9'=57
65-90 Uppercase A-Z 'A'=65, 'Z'=90
97-122 Lowercase a-z 'a'=97, 'z'=122
33-47 Punctuation '!'=33, '.'=46
58-64 More punctuation ':'=58, '@'=64
91-96 Brackets etc. '['=91, '`'=96
123-126 More symbols '{'=123, '~'=126
Character | Decimal | Binary | Hex | Octal
----------|---------|------------|-----|------
H | 72 | 01001000 | 48 | 110
e | 101 | 01100101 | 65 | 145
l | 108 | 01101100 | 6C | 154
l | 108 | 01101100 | 6C | 154
o | 111 | 01101111 | 6F | 157
ASCII has several useful patterns that programmers exploit: uppercase and lowercase letters differ by exactly 32, a single bit (bit 5), so case can be flipped with one bitwise operation; the digits '0'-'9' occupy codes 48-57, so subtracting 48 converts a digit character to its numeric value; and both alphabets are contiguous, so 'B' is always 'A' + 1.
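Two of these patterns, sketched in Python:

```python
# Uppercase and lowercase differ only in bit 5 (value 32)
chr(ord('A') | 0x20)      # 'a'  (setting bit 5 lowercases)
chr(ord('a') & ~0x20)     # 'A'  (clearing bit 5 uppercases)

# Digits occupy 48-57, so subtracting ord('0') yields the numeric value
ord('7') - ord('0')       # 7
```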
While ASCII handles English text, it cannot represent characters from other languages -- Chinese, Arabic, Hindi, Japanese, Korean, or even accented European letters. Unicode was created to solve this problem by assigning a unique number (code point) to every character in every writing system.
Unicode defines over 149,000 characters across 161 modern and historic scripts. Code points are written as U+XXXX (e.g., U+0041 for 'A', U+4E16 for the Chinese character for "world"). Unicode itself is a character set -- it defines which number maps to which character. UTF-8 is an encoding that defines how those numbers are stored as bytes.
UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character:
Code Point Range | Bytes | Byte Pattern
U+0000 to U+007F | 1 | 0xxxxxxx
U+0080 to U+07FF | 2 | 110xxxxx 10xxxxxx
U+0800 to U+FFFF | 3 | 1110xxxx 10xxxxxx 10xxxxxx
U+10000 to U+10FFFF | 4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
The leading bits of the first byte indicate how many bytes the character uses. Continuation bytes always start with 10. This design makes UTF-8 self-synchronizing -- you can find character boundaries by looking at any byte in a stream.
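The leading-bit rule can be turned into a small classifier. This helper is an illustration of the table above, not a full UTF-8 validator:

```python
def utf8_sequence_length(first_byte):
    """Infer how many bytes a UTF-8 sequence uses from its first byte."""
    if first_byte < 0x80:            # 0xxxxxxx: ASCII, 1 byte
        return 1
    if first_byte >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
        return 2
    if first_byte >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        return 3
    if first_byte >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        return 4
    raise ValueError('continuation or invalid byte')

utf8_sequence_length(0xE2)  # 3 (first byte of the euro sign)
```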
Euro sign: U+20AC
Binary of 20AC: 0010 0000 1010 1100
3-byte pattern: 1110xxxx 10xxxxxx 10xxxxxx
Fill in bits: 11100010 10000010 10101100
Hex bytes: E2 82 AC
So the euro sign is encoded as the bytes: E2 82 AC
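Python's encoder confirms the hand-derived bytes:

```python
'€'.encode('utf-8')                    # b'\xe2\x82\xac'
'€'.encode('utf-8').hex(' ').upper()   # 'E2 82 AC'

# Decoding the same three bytes recovers the character
bytes([0xE2, 0x82, 0xAC]).decode('utf-8')  # '€'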
As of 2026, UTF-8 is used by over 98% of all websites, making it the de facto standard encoding for the web, email, and most modern software.
Converting text to binary involves three steps for each character:
Text: "Hi!"
Step 1: Character to decimal
'H' = 72
'i' = 105
'!' = 33
Step 2: Decimal to binary
72 = 1001000
105 = 1101001
33 = 100001
Step 3: Pad to 8 bits
72 = 01001000
105 = 01101001
33 = 00100001
Result: 01001000 01101001 00100001
To convert a decimal number to binary manually, repeatedly divide by 2 and record the remainders:
72 / 2 = 36 remainder 0
36 / 2 = 18 remainder 0
18 / 2 = 9 remainder 0
9 / 2 = 4 remainder 1
4 / 2 = 2 remainder 0
2 / 2 = 1 remainder 0
1 / 2 = 0 remainder 1
Read remainders bottom-to-top: 1001000
Pad to 8 bits: 01001000
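The same repeated-division procedure, written out in Python:

```python
def to_binary(n):
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return '0'
    bits = []
    while n > 0:
        bits.append(str(n % 2))     # record the remainder
        n //= 2
    return ''.join(reversed(bits))  # read remainders bottom-to-top

to_binary(72)            # '1001000'
to_binary(72).zfill(8)   # '01001000' (padded to 8 bits)
```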
Hexadecimal is the most commonly used format for representing binary data in a readable form. Each hex digit represents exactly 4 binary bits, so one byte is always exactly 2 hex digits. This makes hex particularly useful for debugging, network analysis, and data inspection.
Text: "Hello"
H = 72 = 48 hex
e = 101 = 65 hex
l = 108 = 6C hex
l = 108 = 6C hex
o = 111 = 6F hex
Hex output: 48 65 6C 6C 6F
Color codes: #326CE5 = RGB(50, 108, 229)
Unicode escapes: \u0041 = 'A'
Memory dumps: 0x7FFF5FBFF8A0: 48 65 6C 6C 6F
Hash digests: sha256: a591a6d40bf420404a011733cfb7b190...
Byte literals: 0xFF, \xAB
Octal represents each byte as a 3-digit number (000-377). While less common than hex in modern development, octal remains important for Unix file permissions and some legacy systems:
Text: "Hi"
H = 72 = 110 octal
i = 105 = 151 octal
Octal output: 110 151
Unix permissions in octal:
chmod 644 = rw-r--r--
chmod 755 = rwxr-xr-x
Decimal is the most human-readable format. Each byte is represented as a number from 0 to 255:
Text: "Hi"
H = 72
i = 105
Decimal output: 72 105
ASCII character codes are most commonly referenced in decimal:
'A' = 65, 'a' = 97, '0' = 48, space = 32
Text-to-binary conversion is not just an academic exercise. Here are real-world scenarios where understanding character encoding matters:
Network packet analyzers like Wireshark display packet contents in hexadecimal with ASCII interpretation. Understanding hex-to-text conversion is essential for debugging HTTP requests, analyzing DNS queries, inspecting WebSocket frames, and examining TLS handshakes.
Every file format starts with "magic bytes" -- specific byte sequences that identify the file type. Hex editors reveal these signatures:
PDF: 25 50 44 46 = "%PDF"
PNG: 89 50 4E 47 = ".PNG"
ZIP: 50 4B 03 04 = "PK.."
JPEG: FF D8 FF E0 = "...."
GIF: 47 49 46 38 = "GIF8"
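A minimal sniffer built from these signatures might look like the sketch below. The table covers only the small illustrative subset listed above, not every format:

```python
# A few well-known magic-byte prefixes (illustrative, not exhaustive)
SIGNATURES = {
    b'%PDF': 'PDF',
    b'\x89PNG': 'PNG',
    b'PK\x03\x04': 'ZIP',
    b'\xff\xd8\xff': 'JPEG',
    b'GIF8': 'GIF',
}

def sniff(data):
    """Guess a file type from its leading bytes."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return 'unknown'

sniff(b'%PDF-1.7 ...')   # 'PDF'
```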
When transmitting binary data over text-based protocols (like HTTP, email, or JSON), it must be encoded into a text-safe format. Common methods include Base64 (which encodes binary as ASCII characters), hex encoding, and URL encoding. Understanding the underlying binary representation helps debug encoding issues.
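Python's standard library covers the common text-safe encodings:

```python
import base64
import urllib.parse

raw = 'Hello'.encode('utf-8')

base64.b64encode(raw)              # b'SGVsbG8='
raw.hex()                          # '48656c6c6f'
urllib.parse.quote('Hello world')  # 'Hello%20world'
```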
Hash functions (SHA-256, MD5) and encryption algorithms (AES, RSA) operate on byte arrays. Their inputs and outputs are typically displayed in hexadecimal. Understanding the relationship between text and its byte representation is essential for implementing correct hashing, encryption, and digital signature verification.
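Hash APIs take bytes, not strings, so the text must be encoded first. A sketch with Python's hashlib:

```python
import hashlib

text = 'Hello'

# Encode the text to bytes before hashing; hashlib rejects plain str input
digest = hashlib.sha256(text.encode('utf-8')).hexdigest()
len(digest)   # 64 hex digits = 256 bits
```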
Binary representation of text is used in steganography -- the practice of hiding messages within other data. Text can be converted to binary and embedded in the least significant bits of image pixels, audio samples, or video frames, making the hidden data virtually invisible.
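A toy sketch of LSB embedding: each bit of the message overwrites the least significant bit of one carrier byte. Real steganography tools are far more elaborate; this only shows the bit manipulation involved.

```python
def embed(carrier, message):
    """Hide message bytes in the LSBs of carrier bytes (toy example)."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    out = bytearray(carrier)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit   # overwrite only the LSB
    return bytes(out)

def extract(carrier, length):
    """Recover `length` hidden bytes from the carrier's LSBs."""
    bits = [b & 1 for b in carrier[:length * 8]]
    return bytes(int(''.join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))

hidden = embed(bytes(16), b'Hi')
extract(hidden, 2)   # b'Hi'
```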
// Character to code point
'A'.charCodeAt(0); // 65
'A'.codePointAt(0); // 65
// Code point to character
String.fromCharCode(65); // 'A'
String.fromCodePoint(128512); // Emoji
// Text to binary
function textToBinary(text) {
return Array.from(text)
.map(char => char.charCodeAt(0).toString(2).padStart(8, '0'))
.join(' ');
}
textToBinary('Hi'); // "01001000 01101001"
// Text to hex
function textToHex(text) {
return Array.from(text)
.map(char => char.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0'))
.join(' ');
}
textToHex('Hi'); // "48 69"
// UTF-8 encoding with TextEncoder
const encoder = new TextEncoder();
const bytes = encoder.encode('Hello'); // Uint8Array [72, 101, 108, 108, 111]
// UTF-8 decoding with TextDecoder
const decoder = new TextDecoder();
decoder.decode(new Uint8Array([72, 101, 108, 108, 111])); // "Hello"
# Character to code point
ord('A') # 65
# Code point to character
chr(65) # 'A'
# Text to binary
' '.join(format(ord(c), '08b') for c in 'Hi')
# '01001000 01101001'
# Text to hex
' '.join(format(ord(c), '02X') for c in 'Hi')
# '48 69'
# UTF-8 encoding
'Hello'.encode('utf-8') # b'Hello'
# UTF-8 byte values
list('Hello'.encode('utf-8')) # [72, 101, 108, 108, 111]
// Character to integer
int code = 'A'; // 65
// Integer to binary string
Integer.toBinaryString(65); // "1000001"
// String to bytes
byte[] bytes = "Hello".getBytes(StandardCharsets.UTF_8);
// Bytes to hex
StringBuilder hex = new StringBuilder();
for (byte b : bytes) {
hex.append(String.format("%02X ", b));
}
// "48 65 6C 6C 6F"
Mojibake occurs when text encoded in one format is decoded using a different format. For example, UTF-8 encoded text displayed as Latin-1 produces garbled characters. The fix is always to ensure the same encoding is used for both writing and reading:
// Wrong: server sends UTF-8, client reads as Latin-1
"café" in UTF-8 is the bytes 63 61 66 C3 A9 -> displayed as "cafÃ©" when read as Latin-1
// Fix: always specify encoding explicitly
Content-Type: text/html; charset=utf-8
<meta charset="UTF-8">
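The mismatch is easy to reproduce in Python:

```python
correct = 'café'.encode('utf-8')   # b'caf\xc3\xa9'

# Decoding with the wrong charset produces mojibake
correct.decode('latin-1')          # 'cafÃ©'

# Decoding with the right charset recovers the text
correct.decode('utf-8')            # 'café'
```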
Some editors add a BOM (EF BB BF in UTF-8) at the beginning of files. This invisible prefix can cause issues with shell scripts, JSON parsing, CSV imports, and HTTP responses. Many tools now strip BOMs automatically, but it remains a common source of subtle bugs.
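Detecting and stripping a UTF-8 BOM in Python:

```python
BOM = b'\xef\xbb\xbf'

def strip_bom(data):
    """Remove a leading UTF-8 BOM, if present."""
    return data[len(BOM):] if data.startswith(BOM) else data

strip_bom(b'\xef\xbb\xbfhello')   # b'hello'

# Alternatively, the 'utf-8-sig' codec strips the BOM while decoding
b'\xef\xbb\xbfhello'.decode('utf-8-sig')   # 'hello'
```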
The null byte (0x00) terminates strings in C and many lower-level languages. Including null bytes in text data can cause premature string truncation, security vulnerabilities (null byte injection), and data corruption in systems that expect null-terminated strings.
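Python strings carry an explicit length, so null bytes pass through intact, but a C-style consumer stops at the first 0x00. A sketch of the mismatch, using a hypothetical attacker-controlled filename:

```python
data = b'report.pdf\x00.exe'   # hypothetical filename with an embedded null

# Python keeps the full byte string
len(data)                      # 15

# A C-style string stops at the first null byte: the '.exe' part vanishes
c_view = data.split(b'\x00', 1)[0]
c_view                         # b'report.pdf'
```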
Our free Text to Binary Converter makes it easy to convert between text and numeric representations.
The tool supports ASCII and UTF-8 characters, including emoji and international text. All processing happens in your browser -- no data is sent to any server.
Convert text to binary, hex, octal, and decimal instantly.
Open Text to Binary Converter →