Text to Binary Guide: Understanding Text Encoding

By Suvom Das March 27, 2026 19 min read

1. How Computers Store Text

At the most fundamental level, computers store and process everything as binary numbers -- sequences of 0s and 1s. This includes text, images, audio, video, and executable programs. When you type the letter "A" on your keyboard, the computer does not store a visual representation of the letter. Instead, it stores the number 65 (in decimal), which is 01000001 in binary.

This mapping between human-readable characters and numeric values is called a character encoding. Different encoding standards define different mappings. The most important ones in computing history are ASCII (the foundational encoding), Unicode (the universal character set), and UTF-8 (the dominant encoding on the modern web).

Understanding text encoding is essential for developers who work with file I/O, network protocols, databases, internationalization, data serialization, and security. Encoding mismatches cause some of the most frustrating bugs in software -- garbled text, "mojibake" (nonsensical character substitutions), and data corruption.

Bits and Bytes

A bit (binary digit) is the smallest unit of data: a single 0 or 1. A byte is a group of 8 bits, capable of representing 256 different values (2^8 = 256, ranging from 0 to 255). Bytes are the standard unit for measuring data storage and processing. When we say a character is "1 byte," we mean it takes 8 bits to represent it.

2. Number Base Systems Explained

A number base (or radix) determines how many distinct digit symbols are used and the positional value of each digit. Different bases are used in computing for different purposes:

Binary (Base 2)

Binary uses only two digits: 0 and 1. Each position represents a power of 2. Binary is the native language of digital electronics because transistors have two states: on (1) and off (0).

Decimal 65 in binary:
  0 + 64 + 0 + 0 + 0 + 0 + 0 + 1
= 2^6 + 2^0
= 01000001

Position values (8-bit): 128  64  32  16   8   4   2   1
Binary digits:             0   1   0   0   0   0   0   1
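You can also evaluate the position values mechanically: multiply each bit by its power of two and add the results. Here is a minimal Python sketch (the function name `binary_to_decimal` is illustrative), checked against the built-in `int(s, 2)`:

```python
def binary_to_decimal(bits: str) -> int:
    """Sum bit * 2**position, counting positions from the right (position 0)."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        total += int(bit) * 2 ** position
    return total

print(binary_to_decimal("01000001"))  # 65
print(int("01000001", 2))             # 65, the built-in equivalent
```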

Octal (Base 8)

Octal uses digits 0-7. Each octal digit corresponds to exactly 3 binary digits, making it a convenient shorthand for binary. Octal is primarily used in Unix/Linux file permissions:

chmod 755 = binary 111 101 101
  Owner: 7 = 111 = read+write+execute
  Group: 5 = 101 = read+execute
  Other: 5 = 101 = read+execute
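The same three-bits-per-digit mapping produces the familiar `rwx` string. The Python sketch below assumes a plain three-digit mode such as "755" and ignores special bits like setuid. `octal_to_rwx` is a hypothetical helper name:

```python
def octal_to_rwx(mode: str) -> str:
    """Turn a 3-digit octal mode such as '755' into 'rwxr-xr-x'."""
    result = []
    for digit in mode:
        value = int(digit, 8)  # one octal digit = 3 bits
        # Bit values 4, 2, 1 map to read, write, execute
        for flag, bit in zip("rwx", (4, 2, 1)):
            result.append(flag if value & bit else "-")
    return "".join(result)

print(format(0o755, "09b"))  # 111101101
print(octal_to_rwx("755"))   # rwxr-xr-x
print(octal_to_rwx("644"))   # rw-r--r--
```

For real mode values, the standard library's `stat.filemode()` builds the same string with a leading file-type character, so this helper is only for illustration.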

Decimal (Base 10)

Decimal uses digits 0-9 and is the standard number system for human communication. When we say a byte value is "65," we mean decimal 65. In the context of text encoding, decimal values are the most intuitive way to reference character codes.

Hexadecimal (Base 16)

Hexadecimal uses digits 0-9 and letters A-F (representing values 10-15). Each hex digit corresponds to exactly 4 binary digits, which makes hex a compact, easy-to-read shorthand for binary data. A single byte (8 bits) is represented by exactly 2 hex digits:

Decimal 65 = Hex 41
  4 = 0100 (binary)
  1 = 0001 (binary)
  41 hex = 01000001 binary = 65 decimal

Common uses:
  Color codes: #FF5733 = Red:255 Green:87 Blue:51
  Memory addresses: 0x7FFF5FBFF8A0
  MAC addresses: AA:BB:CC:DD:EE:FF
  Byte sequences: 48 65 6C 6C 6F = "Hello"
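Every byte is exactly two hex digits, so splitting a hex string into pairs gives back the bytes. This short Python sketch works through the color-code and byte-sequence examples (`hex_color_to_rgb` is an illustrative name):

```python
def hex_color_to_rgb(color: str) -> tuple[int, int, int]:
    """Split '#RRGGBB' into three 0-255 integers (2 hex digits per byte)."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected #RRGGBB")
    return tuple(int(digits[i:i + 2], 16) for i in range(0, 6, 2))

print(hex_color_to_rgb("#FF5733"))                      # (255, 87, 51)
print(bytes.fromhex("48 65 6C 6C 6F").decode("ascii"))  # Hello
```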

3. ASCII Encoding Deep Dive

ASCII (American Standard Code for Information Interchange) was developed in the early 1960s and published as a standard in 1963. It uses 7 bits to define 128 characters, divided into:

ASCII Table: Key Ranges

Range      Characters            Examples
0-31       Control characters    \t (9), \n (10), \r (13)
32         Space                 ' '
33-47      Punctuation           '!'=33, '.'=46
48-57      Digits 0-9            '0'=48, '9'=57
58-64      More punctuation      ':'=58, '@'=64
65-90      Uppercase A-Z         'A'=65, 'Z'=90
91-96      Brackets etc.         '['=91, '`'=96
97-122     Lowercase a-z         'a'=97, 'z'=122
123-126    More symbols          '{'=123, '~'=126
127        Control character     DEL (127)

Converting "Hello" to Multiple Formats

Character | Decimal | Binary     | Hex | Octal
----------|---------|------------|-----|------
H         | 72      | 01001000   | 48  | 110
e         | 101     | 01100101   | 65  | 145
l         | 108     | 01101100   | 6C  | 154
l         | 108     | 01101100   | 6C  | 154
o         | 111     | 01101111   | 6F  | 157

Useful ASCII Patterns

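ASCII has several useful patterns that programmers exploit. Uppercase and lowercase letters differ by exactly 32, which is a single bit (0x20), so 'A' (65) and 'a' (97) are one bit-flip apart. The digits '0'-'9' occupy codes 48-57 in order, so subtracting 48 from a digit's code gives its numeric value. Both alphabets are also contiguous, so comparing codes sorts letters of the same case alphabetically.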
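Here is a minimal Python sketch of two of these tricks: flipping bit 5 (0x20) to change a letter's case, and subtracting `ord('0')` to get a digit's value. The helper names `toggle_case` and `digit_value` are illustrative:

```python
def toggle_case(ch: str) -> str:
    """Flip bit 5 (0x20 = 32): 'A' (65) <-> 'a' (97). ASCII letters only."""
    if not (ch.isascii() and ch.isalpha()):
        return ch
    return chr(ord(ch) ^ 0x20)

def digit_value(ch: str) -> int:
    """'0'..'9' occupy codes 48..57, so subtracting ord('0') gives the digit."""
    if not "0" <= ch <= "9":
        raise ValueError(f"not an ASCII digit: {ch!r}")
    return ord(ch) - ord("0")

print(toggle_case("A"), toggle_case("z"))  # a Z
print(digit_value("7"))                    # 7
print(ord("Z") - ord("A"))                 # 25: the letters are contiguous
```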

4. UTF-8: The Universal Encoding

While ASCII handles English text, it cannot represent characters from other languages -- Chinese, Arabic, Hindi, Japanese, Korean, or even accented European letters. Unicode was created to solve this problem by assigning a unique number (code point) to every character in every writing system.

Unicode 15.0 (2022) defines over 149,000 characters across 161 modern and historic scripts, and later versions continue to add more. Code points are written as U+XXXX (e.g., U+0041 for 'A', U+4E16 for the Chinese character for "world"). Unicode itself is a character set -- it defines which number maps to which character. UTF-8 is an encoding that defines how those numbers are stored as bytes.

How UTF-8 Works

UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character:

Code Point Range      | Bytes | Byte Pattern
U+0000 to U+007F      | 1     | 0xxxxxxx
U+0080 to U+07FF      | 2     | 110xxxxx 10xxxxxx
U+0800 to U+FFFF      | 3     | 1110xxxx 10xxxxxx 10xxxxxx
U+10000 to U+10FFFF   | 4     | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

The leading bits of the first byte indicate how many bytes the character uses, and continuation bytes always start with 10. This design makes UTF-8 self-synchronizing. From any byte in a stream you can tell whether it starts a character or continues one, so the nearest character boundary is never more than 3 bytes away.
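The byte patterns in the table translate directly into bit shifts and masks. Below is a minimal Python sketch of an encoder for a single code point. It also includes a helper that uses the `10xxxxxx` rule to step back to a character boundary. Both function names are hypothetical; in real code, `str.encode("utf-8")` does the encoding.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the 1-4 byte patterns from the table."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:                                   # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                                  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                                 # 1110xxxx + two 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                 # 11110xxx + three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def char_start(data: bytes, index: int) -> int:
    """From any byte offset, step back over 10xxxxxx continuation bytes."""
    while index > 0 and (data[index] & 0xC0) == 0x80:
        index -= 1
    return index

for ch in "A€😀":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # matches Python's built-in codec
    print(ch, encoded.hex(" ").upper())

data = "a€".encode("utf-8")  # 61 E2 82 AC
print(char_start(data, 3))   # 1: offset 3 continues the € that starts at 1
```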

UTF-8 Example: Euro Sign

Euro sign: U+20AC (between U+0800 and U+FFFF, so 3 bytes)
Binary of 20AC: 0010 0000 1010 1100
Split as 4+6+6: 0010 000010 101100

3-byte pattern: 1110xxxx 10xxxxxx 10xxxxxx
Fill in bits:   11100010 10000010 10101100
Hex bytes:      E2       82       AC

So the euro sign is encoded as the bytes: E2 82 AC

Why UTF-8 Won

As of 2026, UTF-8 is used by over 98% of all websites, making it the de facto standard encoding for the web, email, and most modern software.

5. Text to Binary Conversion

Converting text to binary involves three steps for each character:

  1. Encode the character as bytes: a single byte equal to its ASCII code for basic characters, or a sequence of 2-4 UTF-8 bytes for all other characters
  2. Convert each byte value to its binary representation
  3. Pad each binary value to 8 digits (one full byte) with leading zeros

Step-by-Step Example

Text: "Hi!"

Step 1: Character to decimal
  'H' = 72
  'i' = 105
  '!' = 33

Step 2: Decimal to binary
  72  = 1001000
  105 = 1101001
  33  = 100001

Step 3: Pad to 8 bits
  72  = 01001000
  105 = 01101001
  33  = 00100001

Result: 01001000 01101001 00100001
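The three steps fit in a one-line function. In the minimal Python sketch below (the name `text_to_binary` is illustrative), the text is encoded to UTF-8 first. That way the same steps also work for non-ASCII characters, which simply produce more than one 8-bit group:

```python
def text_to_binary(text: str, sep: str = " ") -> str:
    """Encode to UTF-8 bytes, then write each byte as 8 zero-padded binary digits."""
    return sep.join(format(byte, "08b") for byte in text.encode("utf-8"))

print(text_to_binary("Hi!"))  # 01001000 01101001 00100001
print(text_to_binary("é"))    # 11000011 10101001 (two UTF-8 bytes)
```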

Manual Binary Conversion

To convert a decimal number to binary manually, repeatedly divide by 2 and record the remainders:

72 / 2 = 36 remainder 0
36 / 2 = 18 remainder 0
18 / 2 = 9  remainder 0
9  / 2 = 4  remainder 1
4  / 2 = 2  remainder 0
2  / 2 = 1  remainder 0
1  / 2 = 0  remainder 1

Read remainders bottom-to-top: 1001000
Pad to 8 bits: 01001000
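The repeated-division method is a short loop. In this Python sketch (the name `to_binary` is illustrative), the built-in `format(n, '08b')` gives the same result:

```python
def to_binary(n: int, width: int = 8) -> str:
    """Divide by 2 repeatedly; the remainders, read in reverse, are the bits."""
    if n < 0:
        raise ValueError("non-negative integers only")
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)
        remainders.append(str(r))
    bits = "".join(reversed(remainders)) or "0"
    return bits.zfill(width)  # pad with leading zeros

print(to_binary(72))  # 01001000
```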

6. Text to Hexadecimal Conversion

Hexadecimal is the most commonly used format for representing binary data in a readable form. Each hex digit represents exactly 4 binary bits, so one byte is always exactly 2 hex digits. This makes hex particularly useful for debugging, network analysis, and data inspection.

Conversion Process

Text: "Hello"
H = 72  = 48 hex
e = 101 = 65 hex
l = 108 = 6C hex
l = 108 = 6C hex
o = 111 = 6F hex

Hex output: 48 65 6C 6C 6F
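In Python, encoding the text to bytes and calling `bytes.hex()` converts every character at once, and `bytes.fromhex()` reverses the conversion. Below is a minimal sketch with illustrative helper names; the separator argument to `bytes.hex()` needs Python 3.8 or later:

```python
def text_to_hex(text: str) -> str:
    """UTF-8 bytes as uppercase, space-separated hex pairs."""
    return text.encode("utf-8").hex(" ").upper()

def hex_to_text(hex_string: str) -> str:
    """Inverse: bytes.fromhex ignores the spaces between pairs."""
    return bytes.fromhex(hex_string).decode("utf-8")

print(text_to_hex("Hello"))           # 48 65 6C 6C 6F
print(hex_to_text("48 65 6C 6C 6F"))  # Hello
```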

Hex in Everyday Programming
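Hex also appears constantly in ordinary source code, not just in dumps. A few Python examples of hex integer literals, parsing, string escapes, and zero-padded formatting:

```python
# Hex integer literals and conversions
assert 0xFF == 255
assert int("6C", 16) == 108
assert hex(108) == "0x6c"

# Hex escapes inside str and bytes literals
assert "\x41" == "A"
assert b"\x48\x69" == b"Hi"

# Zero-padded uppercase hex, as used in dumps and color codes
print(f"{10:02X}")                    # 0A
print(f"#{255:02X}{87:02X}{51:02X}")  # #FF5733
```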

7. Octal and Decimal Representations

Octal (Base 8)

Octal represents each byte as a 3-digit number (000-377). While less common than hex in modern development, octal remains important for Unix file permissions and some legacy systems:

Text: "Hi"
H = 72  = 110 octal
i = 105 = 151 octal

Octal output: 110 151

Unix permissions in octal:
  chmod 644 = rw-r--r--
  chmod 755 = rwxr-xr-x

Decimal (Base 10)

Decimal is the most human-readable format. Each byte is represented as a number from 0 to 255:

Text: "Hi"
H = 72
i = 105

Decimal output: 72 105

ASCII character codes are most commonly referenced in decimal:
  'A' = 65, 'a' = 97, '0' = 48, space = 32
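Octal and decimal output follow the same per-byte pattern as binary and hex; only the format specifier changes. A minimal Python sketch (`text_to_base` and its `"oct"`/`"dec"` keys are illustrative):

```python
def text_to_base(text: str, base: str) -> str:
    """Render each UTF-8 byte as 3-digit octal ('oct') or plain decimal ('dec')."""
    spec = {"oct": "03o", "dec": "d"}[base]
    return " ".join(format(b, spec) for b in text.encode("utf-8"))

print(text_to_base("Hi", "oct"))  # 110 151
print(text_to_base("Hi", "dec"))  # 72 105
```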

8. Practical Applications

Text-to-binary conversion is not just an academic exercise. Here are real-world scenarios where understanding character encoding matters:

Network Protocol Analysis

Network packet analyzers like Wireshark display packet contents in hexadecimal with ASCII interpretation. Understanding hex-to-text conversion is essential for debugging HTTP requests, analyzing DNS queries, inspecting WebSocket frames, and examining TLS handshakes.
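The hex-plus-ASCII layout these tools use is easy to reproduce. Below is a minimal Python sketch of a dump formatter. The column layout (4-digit offset, 16 bytes per row, '.' for non-printable bytes) is an assumption modeled on common hex viewers, not Wireshark's exact output.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """One line per `width` bytes: offset, hex bytes, printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Bytes outside printable ASCII (32-126) are shown as '.'
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:04x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)

print(hexdump(b"GET / HTTP/1.1\r\nHost: example.com\r\n"))
```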

File Format Inspection

Many binary file formats begin with "magic bytes" -- specific byte sequences that identify the file type. Hex editors reveal these signatures:

PDF:  25 50 44 46 = "%PDF"
PNG:  89 50 4E 47 = ".PNG"
ZIP:  50 4B 03 04 = "PK.."
JPEG: FF D8 FF    = "..." (the 4th byte varies, e.g. E0 for JFIF, E1 for Exif)
GIF:  47 49 46 38 = "GIF8"
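Identifying a file by its signature comes down to comparing its first few bytes against known prefixes. The minimal Python sketch below uses the signatures listed above. `SIGNATURES` and `sniff` are hypothetical names; real detectors such as libmagic check many more formats and offsets.

```python
SIGNATURES = {
    b"%PDF": "PDF",
    b"\x89PNG": "PNG",
    b"PK\x03\x04": "ZIP",
    b"\xff\xd8\xff": "JPEG",
    b"GIF8": "GIF",
}

def sniff(header: bytes) -> str:
    """Return the first format whose signature is a prefix of `header`."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"

print(sniff(bytes.fromhex("89 50 4E 47 0D 0A 1A 0A")))  # PNG
```

To check a file on disk, read its first bytes in binary mode, for example `with open(path, "rb") as f: kind = sniff(f.read(8))`. Note that the ZIP signature also matches ZIP-based formats such as .docx, .xlsx, and .jar.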

Data Serialization and Transmission

When transmitting binary data over text-based protocols (like HTTP, email, or JSON), it must be encoded into a text-safe format. Common methods include Base64 (which encodes binary as ASCII characters), hex encoding, and URL encoding. Understanding the underlying binary representation helps debug encoding issues.
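All of these schemes start from the same bytes. A short Python sketch using the standard library's `base64` and `urllib.parse` modules:

```python
import base64
from urllib.parse import quote, unquote

data = "Hi".encode("utf-8")                    # bytes 48 69
print(base64.b64encode(data).decode("ascii"))  # SGk=
print(data.hex())                              # 4869
print(quote("café"))                           # caf%C3%A9 (percent-encoded UTF-8 bytes)
print(unquote("caf%C3%A9"))                    # café
```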

Security and Cryptography

Hash functions (SHA-256, MD5) and encryption algorithms (AES, RSA) operate on byte arrays. Their inputs and outputs are typically displayed in hexadecimal. Understanding the relationship between text and its byte representation is essential for implementing correct hashing, encryption, and digital signature verification.
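The short Python sketch below shows why the text-to-bytes step matters. `hashlib` only accepts bytes, and the same string encoded two different ways produces two different digests:

```python
import hashlib

text = "café"
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
latin1_digest = hashlib.sha256(text.encode("latin-1")).hexdigest()

print(len(utf8_digest))              # 64 hex digits = 32 bytes
print(utf8_digest == latin1_digest)  # False: same text, different bytes

try:
    hashlib.sha256(text)             # a str is rejected: encode it first
except TypeError as err:
    print("TypeError:", err)
```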

Steganography and Data Hiding

Binary representation of text is used in steganography -- the practice of hiding messages within other data. Text can be converted to binary and embedded in the least significant bits of image pixels, audio samples, or video frames, making the hidden data virtually invisible.
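Below is a toy Python sketch of least-significant-bit embedding. The carrier is a made-up list of 8-bit integers standing in for pixel channel values, and `embed`/`extract` are hypothetical names. Real tools also handle image file I/O and usually encrypt the payload first.

```python
def embed(carrier: list[int], message: bytes) -> list[int]:
    """Hide each message bit, most significant first, in one carrier value's LSB."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small")
    out = list(carrier)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return out

def extract(carrier: list[int], length: int) -> bytes:
    """Read `length` bytes back out of the carrier's least significant bits."""
    bits = [value & 1 for value in carrier[:length * 8]]
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )

pixels = [200, 13, 77, 90, 141, 66, 250, 31] * 2  # made-up 8-bit samples
stego = embed(pixels, b"Hi")
print(extract(stego, 2))                                # b'Hi'
print(max(abs(a - b) for a, b in zip(pixels, stego)))   # each value changes by at most 1
```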

9. Text Encoding in Programming

JavaScript

// Character to UTF-16 code unit / full Unicode code point
'A'.charCodeAt(0); // 65
'A'.codePointAt(0); // 65 (differs from charCodeAt only outside the BMP, e.g. emoji)

// Code point to character
String.fromCharCode(65); // 'A'
String.fromCodePoint(128512); // '😀' (U+1F600)

// Text to binary (UTF-8 bytes, 8 bits each, so non-ASCII text works too)
function textToBinary(text) {
  return Array.from(new TextEncoder().encode(text))
    .map(byte => byte.toString(2).padStart(8, '0'))
    .join(' ');
}
textToBinary('Hi'); // "01001000 01101001"

// Text to hex (UTF-8 bytes, 2 hex digits each)
function textToHex(text) {
  return Array.from(new TextEncoder().encode(text))
    .map(byte => byte.toString(16).toUpperCase().padStart(2, '0'))
    .join(' ');
}
textToHex('Hi'); // "48 69"

// UTF-8 encoding with TextEncoder
const encoder = new TextEncoder();
const bytes = encoder.encode('Hello'); // Uint8Array [72, 101, 108, 108, 111]

// UTF-8 decoding with TextDecoder
const decoder = new TextDecoder();
decoder.decode(new Uint8Array([72, 101, 108, 108, 111])); // "Hello"

Python

# Character to code point
ord('A')  # 65

# Code point to character
chr(65)   # 'A'

# Text to binary (encode to UTF-8 first so non-ASCII text yields 8-bit bytes)
' '.join(format(b, '08b') for b in 'Hi'.encode('utf-8'))
# '01001000 01101001'

# Text to hex
' '.join(format(b, '02X') for b in 'Hi'.encode('utf-8'))
# '48 69'

# UTF-8 encoding
'Hello'.encode('utf-8')   # b'Hello'

# UTF-8 byte values
list('Hello'.encode('utf-8'))  # [72, 101, 108, 108, 111]

Java

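import java.nio.charset.StandardCharsets;

// Character to integer (its UTF-16 code unit)
int code = 'A';  // 65

// Integer to binary string (leading zeros are dropped)
Integer.toBinaryString(65);  // "1000001"
String.format("%8s", Integer.toBinaryString(65)).replace(' ', '0');  // "01000001"

// String to UTF-8 bytes
byte[] bytes = "Hello".getBytes(StandardCharsets.UTF_8);

// Bytes to space-separated hex (%X prints negative bytes as unsigned, e.g. E2)
StringBuilder hex = new StringBuilder();
for (byte b : bytes) {
    if (hex.length() > 0) hex.append(' ');
    hex.append(String.format("%02X", b));
}
// hex.toString() -> "48 65 6C 6C 6F"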
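The snippets above only convert text to numbers. Decoding reverses the steps: parse each value in its base, collect the bytes, then decode them as UTF-8. Here is a minimal Python sketch; the function name `values_to_text` is illustrative and not taken from the converter tool.

```python
def values_to_text(encoded: str, base: int) -> str:
    """Decode space-, comma-, or newline-separated byte values in base 2, 8, 10, or 16."""
    tokens = encoded.replace(",", " ").split()
    data = bytes(int(token, base) for token in tokens)  # ValueError if a value > 255
    return data.decode("utf-8")  # UnicodeDecodeError on invalid UTF-8

print(values_to_text("01001000 01101001", 2))  # Hi
print(values_to_text("E2 82 AC", 16))          # €
```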

10. Common Encoding Issues and Solutions

Mojibake (Garbled Text)

Mojibake occurs when text encoded in one format is decoded using a different format. For example, UTF-8 encoded text displayed as Latin-1 produces garbled characters. The fix is always to ensure the same encoding is used for both writing and reading:

// Wrong: server sends UTF-8, client reads as Latin-1
"caf\xC3\xA9" (UTF-8 bytes for "café") -> "cafÃ©" (read as Latin-1)

// Fix: always specify encoding explicitly
Content-Type: text/html; charset=utf-8
<meta charset="UTF-8">
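A few lines of Python reproduce the mismatch and its repair:

```python
original = "café"
raw = original.encode("utf-8")   # b'caf\xc3\xa9'

garbled = raw.decode("latin-1")  # wrong decoder: each byte becomes one character
print(garbled)                   # cafÃ©

# Reversing the wrong step recovers the text, because Latin-1 maps every byte
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)                  # café
```

This round trip only works because Latin-1 decoding is lossless. If the garbled text was saved with replacement characters, the original bytes are gone.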

BOM (Byte Order Mark)

Some editors add a BOM (EF BB BF in UTF-8) at the beginning of files. This invisible prefix can cause issues with shell scripts, JSON parsing, CSV imports, and HTTP responses. Many tools now strip BOMs automatically, but it remains a common source of subtle bugs.
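In Python the plain `utf-8` codec keeps the BOM as the character U+FEFF, and `json.loads` rejects it. The `utf-8-sig` codec strips a leading BOM if one is present:

```python
import json

raw = b"\xef\xbb\xbf" + b'{"ok": true}'  # UTF-8 BOM followed by JSON text

text = raw.decode("utf-8")
print(repr(text[0]))                     # '\ufeff': the BOM survives plain utf-8
try:
    json.loads(text)
except json.JSONDecodeError as err:
    print("rejected:", err.msg)

clean = raw.decode("utf-8-sig")          # strips the BOM; harmless when there is none
print(json.loads(clean))                 # {'ok': True}
```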

Null Bytes

The null byte (0x00) terminates strings in C and many lower-level languages. Including null bytes in text data can cause premature string truncation, security vulnerabilities (null byte injection), and data corruption in systems that expect null-terminated strings.
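The Python sketch below shows the classic null byte injection pattern. A check on the full string passes, but a C API that stops at the first NUL sees a different name. `c_string_view` only simulates the C side, and both helper names are hypothetical:

```python
def c_string_view(data: bytes) -> bytes:
    """Simulate a C string API: everything after the first NUL is invisible."""
    return data.split(b"\x00", 1)[0]

def reject_nul(text: str) -> str:
    """Refuse input containing NUL before it reaches NUL-terminated APIs."""
    if "\x00" in text:
        raise ValueError("input contains a null byte")
    return text

upload_name = b"shell.php\x00.jpg"
print(upload_name.endswith(b".jpg"))  # True: a naive extension check passes
print(c_string_view(upload_name))     # b'shell.php': what the C layer would see
```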

Best Practices
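The advice in this section comes down to a few habits. Declare the encoding explicitly at every boundary instead of relying on platform defaults. Tolerate a BOM where one may appear. Let invalid bytes raise an error rather than silently replacing them. Here is a minimal Python sketch of those habits using a temporary file; the helper names `save_text` and `load_text` are illustrative:

```python
import tempfile
from pathlib import Path

def save_text(path: Path, text: str) -> None:
    # Explicit encoding: otherwise the default depends on the platform's locale
    path.write_text(text, encoding="utf-8")

def load_text(path: Path) -> str:
    # utf-8-sig tolerates an optional BOM; errors="strict" (the default) raises
    # UnicodeDecodeError instead of silently inserting replacement characters
    return path.read_text(encoding="utf-8-sig", errors="strict")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "note.txt"
    save_text(path, "naïve café 😀")
    roundtrip = load_text(path)
    print(roundtrip)                 # naïve café 😀

    path.write_bytes(b"caf\xe9")     # Latin-1 bytes, not valid UTF-8
    try:
        load_text(path)
        rejected = False
    except UnicodeDecodeError as err:
        rejected = True
        print("refused to guess:", err.reason)
```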

11. Using Our Text to Binary Converter

Our free Text to Binary Converter makes it easy to convert between text and numeric representations:

  1. Choose direction: "Text to Binary/Hex/Oct/Dec" for encoding, or "Binary/Hex/Oct/Dec to Text" for decoding
  2. Select format: Binary, Hexadecimal, Octal, or Decimal
  3. Choose separator: Space, comma, none, or newline between values
  4. Enter input text (or encoded data for decoding) in the left textarea
  5. See results instantly in the right textarea -- conversion happens as you type
  6. Copy output with one click, or use Swap to reverse input and output

The tool supports ASCII and UTF-8 characters, including emoji and international text. All processing happens in your browser -- no data is sent to any server.

Frequently Asked Questions

How does text to binary conversion work?
Each character is converted to its ASCII or UTF-8 byte value, then each byte is represented as an 8-bit binary number. For example, 'A' = 65 decimal = 01000001 binary. Multi-byte UTF-8 characters produce multiple 8-bit groups.
What is the difference between ASCII and UTF-8?
ASCII uses 7 bits for 128 characters (English only). UTF-8 is variable-length (1-4 bytes), backward compatible with ASCII, and can encode every Unicode code point (U+0000 to U+10FFFF, a code space of over 1.1 million positions), covering the world's major scripts and emoji.
Can I convert binary back to text?
Yes, choose the "Binary/Hex/Oct/Dec to Text" direction. The tool converts binary, hex, octal, or decimal values back to readable text and handles both ASCII and multi-byte UTF-8 sequences.
What output formats are supported?
Binary (base 2, 8-bit groups), hexadecimal (base 16, 2-digit pairs), octal (base 8, 3-digit groups), and decimal (base 10 byte values). Values can be separated by spaces, commas, newlines, or nothing.
Does this tool support emoji?
Yes, any Unicode character is supported, including emoji, accented letters, and CJK characters. Characters outside ASCII are encoded as UTF-8, so each one produces two to four byte values in the output.
Is my data secure?
Yes, all conversions happen in your browser using JavaScript. No data is sent to any server. Your text and output remain completely private on your device.
