What is quoted-printable encoding?
Quoted-Printable (QP) is a content transfer encoding defined in RFC 2045 Section 6.7, designed to safely carry mostly-ASCII text — particularly email bodies and MIME parts — through systems that historically only handled 7-bit data. Where Base64 turns every three input bytes into four output characters that are unreadable to humans, quoted-printable keeps printable ASCII characters literal and escapes only the bytes that would cause trouble. The result is mostly-readable text with occasional =XX hex escapes for non-ASCII or control characters.
The encoding has three rules: any byte outside printable ASCII becomes = followed by its two-digit uppercase hex value (so the byte 0xA9 becomes =A9); the literal = character itself is encoded as =3D to avoid ambiguity; and lines are kept to 76 characters or less, with a long line broken using a "soft line break" — an = at the end of the line followed by CRLF, which the decoder consumes silently. Trailing whitespace on a hard-broken line must also be encoded, because some mail transport agents historically stripped trailing whitespace and corrupted the message.
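The three rules are easy to see in practice. As an illustration, Python's standard-library quopri module applies them directly (a minimal sketch; the inputs are just examples):

```python
import quopri

# Rule 1: bytes outside printable ASCII become =XX uppercase hex escapes.
encoded = quopri.encodestring("café".encode("utf-8"))
print(encoded)  # b'caf=C3=A9'

# Rule 2: the literal '=' is escaped as =3D to avoid ambiguity.
print(quopri.encodestring(b"1+1=2"))  # b'1+1=3D2'

# Decoding reverses the escapes and yields the original bytes.
print(quopri.decodestring(b"caf=C3=A9"))  # b'caf\xc3\xa9'
```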
Why charset matters
Quoted-printable is byte-oriented — it encodes bytes, not characters. The bytes themselves are produced by a separate step: converting your text from its character form into a sequence of bytes according to a chosen character encoding (charset). The same word "café" produces five bytes in UTF-8 (63 61 66 C3 A9) but only four bytes in ISO-8859-1 (63 61 66 E9), because Latin-1 represents é as a single byte. The QP encoder will dutifully emit caf=C3=A9 in the first case and caf=E9 in the second. Both are technically valid quoted-printable. They decode to the same text only if the decoder uses the same charset that the encoder used.
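The two-step pipeline (characters → bytes → QP) can be sketched in Python with the standard quopri module:

```python
import quopri

text = "café"

# Step 1: the charset turns characters into bytes.
# Step 2: QP escapes the bytes that aren't printable ASCII.
utf8_qp = quopri.encodestring(text.encode("utf-8"))        # é -> C3 A9
latin1_qp = quopri.encodestring(text.encode("iso-8859-1"))  # é -> E9

print(utf8_qp)    # b'caf=C3=A9'
print(latin1_qp)  # b'caf=E9'

# Decoding must reverse both steps with the *same* charset.
assert quopri.decodestring(utf8_qp).decode("utf-8") == text
assert quopri.decodestring(latin1_qp).decode("iso-8859-1") == text
```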
This is why an email header typically declares Content-Type: text/plain; charset=UTF-8 alongside Content-Transfer-Encoding: quoted-printable — the two together tell the receiver how to reverse both steps. When the charset is missing or wrong, you get mojibake: text that looks like cafÃ©, caf?, or other nonsense. For new messages, always use UTF-8 unless you have a hard requirement otherwise. For decoding old archives, you may need to try Windows-1252 (most common in legacy Western European email) or Shift_JIS / GB2312 / Big5 for East Asian archives that predate UTF-8 adoption.
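Mojibake is easy to reproduce: undo the QP layer correctly, then map the bytes back with the wrong charset. A short Python illustration:

```python
import quopri

# The UTF-8 bytes for "café", recovered from the QP layer.
raw = quopri.decodestring(b"caf=C3=A9")

print(raw.decode("utf-8"))    # café   (right charset)
print(raw.decode("latin-1"))  # cafÃ©  (wrong charset: classic mojibake)
```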
Practical examples
Plain text:  Réservation confirmée
UTF-8 QP:    R=C3=A9servation confirm=C3=A9e
Latin-1 QP:  R=E9servation confirm=E9e
The 76-character cap forces a soft break that the decoder removes silently. =
Notice the = at the end of the previous line and how this line continues it.
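Encoders insert these soft breaks automatically. A sketch using Python's quopri, which wraps long lines for you and whose decoder reassembles them:

```python
import quopri

long_line = b"x" * 200  # one logical line, far over the 76-character cap

encoded = quopri.encodestring(long_line)

# Every encoded line fits the cap; continuation lines end with '='.
for line in encoded.split(b"\n"):
    assert len(line) <= 76

# The decoder consumes the soft breaks and restores the original line.
assert quopri.decodestring(encoded) == long_line
```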
Plain: "hello \n" (note the trailing space before the newline)
QP:    hello=20
The trailing space is encoded as =20 so an MTA that strips trailing whitespace cannot silently corrupt the line.
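Python's encoder applies the trailing-whitespace rule as well when encoding in text mode (the default):

```python
import quopri

# A line ending in a space: the space is protected as =20.
encoded = quopri.encodestring(b"hello \n")
print(encoded)  # b'hello=20\n'

# Decoding restores the original trailing space.
assert quopri.decodestring(encoded) == b"hello \n"
```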
Common problems and how to fix them
- Output looks like garbage (mojibake). The decoder is using the wrong charset. Try Windows-1252 first if the original was old Western European email, UTF-8 if it's recent, Shift_JIS / Big5 / GBK / EUC-KR for East Asian sources. The "Decode UTF-8 → mojibake" problem is almost always actually Windows-1252 in disguise.
- Stray = at the end of lines after decoding. The decoder treats =CRLF and =LF as soft line breaks (they're consumed). If you still see = at the end of decoded lines, the input was malformed: somebody hand-edited it or it was corrupted in transit.
- Lines are way over 76 characters. Most decoders tolerate this. Some strict MTAs do not. If you need RFC-strict output, run the encoder fresh on the plain text rather than reusing an existing QP block.
- Decoded text contains weird characters at line boundaries. Look for unencoded trailing whitespace in the QP input — it was probably stripped by an MTA. There's no perfect recovery; re-request the message from the sender.
- Subject lines with funny =?utf-8?Q?…?= wrapping. Those use "encoded-word" (RFC 2047), which is a slight variant: spaces are encoded as _ instead of =20, and the whole thing is wrapped in =?charset?Q?…?=. This tool encodes body parts, not RFC 2047 encoded-words, which need a separate utility.
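If you do need to decode such headers, Python's standard library handles the encoded-word variant itself; a quick sketch with email.header:

```python
from email.header import decode_header

subject = "=?utf-8?Q?R=C3=A9servation_confirm=C3=A9e?="

# decode_header returns (bytes, charset) pairs; '_' maps back to space.
parts = decode_header(subject)

text = "".join(
    chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
    for chunk, charset in parts
)
print(text)  # Réservation confirmée
```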
UTF-8 vs legacy charsets — when to use which
For anything you control end-to-end, use UTF-8. It represents every Unicode character, it's the default for modern email clients and APIs, and it produces consistent results across systems. The only downside is that non-ASCII characters take 2–4 bytes each, so QP-encoded UTF-8 looks slightly more verbose than legacy-encoded text. That's a fair price for unambiguous decoding.
Use a legacy charset when (a) you're decoding archived email or web pages from before ~2010, (b) you're talking to a system explicitly requesting it (a few Japanese, Chinese, and Korean systems still ship with non-UTF-8 defaults), or (c) you're writing test cases for legacy compatibility. Don't use legacy charsets for new outgoing messages — there's no upside and a long list of interoperability problems.
Common use cases
- Decode email bodies copied from raw .eml files
- Encode a message body manually before sending via SMTP for testing
- Audit MIME content-transfer-encoding in your application
- Recover text from archived email written before UTF-8 was standard
- Convert between charsets while keeping the QP wrapper intact
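For the first use case, decoding a body from a raw .eml file, Python's email package undoes the QP layer and applies the declared charset in one step. A sketch with an inline message (the message content here is illustrative):

```python
from email import policy
from email.parser import BytesParser

# A minimal raw message; in practice, read these bytes from a .eml file.
raw = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"R=C3=A9servation confirm=C3=A9e\r\n"
)

msg = BytesParser(policy=policy.default).parsebytes(raw)

# get_content() reverses the transfer encoding and the charset mapping.
body = msg.get_content()
print(body)
```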
Frequently asked questions
What is quoted-printable?
A content-transfer encoding from RFC 2045 used to safely move mostly-ASCII text through systems that historically only handled 7-bit data. Printable ASCII bytes pass through literally; the rest become <code>=XX</code> hex escapes. Common in email MIME parts.
When should I use QP vs Base64?
QP if the text is mostly ASCII — it stays human-readable and only slightly larger. Base64 if the content is binary (images, PDFs) or mostly non-ASCII (Japanese text), where QP would expand to roughly three output characters per byte and read like noise either way.
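The size trade-off is easy to measure; a Python sketch comparing both encodings on ASCII-heavy vs non-ASCII text (the sample strings are arbitrary):

```python
import base64
import quopri

mostly_ascii = "Hello, this is a plain English sentence with one é.".encode("utf-8")
mostly_cjk = ("こんにちは、世界" * 4).encode("utf-8")

qp_ascii, b64_ascii = quopri.encodestring(mostly_ascii), base64.b64encode(mostly_ascii)
qp_cjk, b64_cjk = quopri.encodestring(mostly_cjk), base64.b64encode(mostly_cjk)

# QP wins on mostly-ASCII text; Base64 wins once most bytes need escaping.
print("mostly ASCII:", len(mostly_ascii), len(qp_ascii), len(b64_ascii))
print("mostly CJK:  ", len(mostly_cjk), len(qp_cjk), len(b64_cjk))
```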
Why does my decoded text look like garbage?
Wrong charset. If the source email was Western European and from before ~2010, try Windows-1252 — the most common reason for "UTF-8 produces mojibake". For East Asian sources try Shift_JIS, Big5, or GBK. The encoding format is byte-correct in all cases; only the byte-to-character mapping changes.
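A pragmatic recovery loop, sketched in Python: undo the QP layer once, then try candidate charsets in a sensible order. The candidate list and function name here are assumptions, not a fixed rule, and a byte sequence can occasionally decode "cleanly" under the wrong charset, so inspect the result.

```python
import quopri

# Hypothetical priority order; adjust for the likely origin of the mail.
CANDIDATES = ["utf-8", "windows-1252", "shift_jis", "big5", "gbk", "euc-kr"]

def guess_decode(qp_bytes: bytes) -> tuple[str, str]:
    """Return (text, charset) for the first charset that decodes cleanly."""
    raw = quopri.decodestring(qp_bytes)
    for charset in CANDIDATES:
        try:
            return raw.decode(charset), charset
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte, so this fallback always succeeds.
    return raw.decode("latin-1"), "latin-1"

print(guess_decode(b"caf=C3=A9"))  # ('café', 'utf-8')
print(guess_decode(b"caf=E9"))    # ('café', 'windows-1252')
```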
What is a soft line break?
A literal <code>=</code> at the end of a 76-character line followed by CRLF. The decoder removes both — they're a hint to the receiver that the next line continues this one. Hard line breaks (real newlines in the original text) appear as bare CRLF without the leading <code>=</code>.
Does this tool support RFC 2047 encoded-words (the <code>=?utf-8?Q?…?=</code> syntax in headers)?
Not directly. RFC 2047 encoded-words are a variant of QP used inside email headers (Subject, From, To). The main differences: spaces are encoded as <code>_</code> instead of <code>=20</code>, the whole thing is wrapped in <code>=?charset?Q?…?=</code>, and CRLF line breaks aren't allowed. This tool encodes message bodies. We may add a dedicated encoded-word tool — let us know if you need it.
Is the input stored anywhere?
No. The text is sent to our server only to perform the encoding (because non-UTF-8 charsets require Node's iconv, which doesn't run in browsers reliably). It's held in memory for the duration of the request and discarded the moment the response is sent. Nothing is logged or persisted.