HTML Entities Encoder / Decoder

Encode and decode HTML entities (named, numeric, or hex). Supports the full HTML5 entity set including ©, ®, ™, em/en dashes, smart quotes, math symbols, and arrows. 100% browser-side.

What HTML entities are

HTML entities are escape sequences that let you include characters in HTML which would otherwise be interpreted as markup or are difficult to type. The less-than sign < would start a tag if written literally, so HTML provides the entity &lt; to represent it as text. The ampersand & starts an entity, so it itself needs encoding as &amp; when you want a literal ampersand. The same is true for > (&gt;), " (&quot;), and ' (&#39; — there is no standard named entity for the apostrophe in HTML4, only in HTML5 as &apos;).
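As a rough illustration of the five replacements described above, here is a minimal sketch using `html.escape` from Python's standard library. The sample string is made up for the example. Note that Python happens to emit the hex form &#x27; for the apostrophe rather than &#39;; both decode to the same character.

```python
import html

# Made-up sample containing all five markup-significant characters.
raw = 'Tom & Jerry <b>"quoted"</b> it\'s'

# quote=True (the default) also escapes " and ', which matters in attribute values.
encoded = html.escape(raw)
print(encoded)

# quote=False escapes only &, < and >, which is enough for text content.
text_only = html.escape(raw, quote=False)
print(text_only)

# Decoding reverses the encoding exactly.
print(html.unescape(encoded) == raw)
```

The ampersand is replaced first inside `html.escape`, which is why the & in the generated entities is not itself re-encoded.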

Beyond the five "unsafe" characters, entities exist for hundreds of typographic and mathematical symbols — copyright (&copy; for ©), em-dash (&mdash; for —), non-breaking space (&nbsp;), and so on. Modern UTF-8 documents rarely need these because the characters themselves work fine; entities mostly matter when you're producing HTML that might end up in environments with limited charset support, or when you need to represent characters that would otherwise be parsed as markup.

Three encoding styles

A character can be written in up to three ways. Named entities like &amp; are the most readable, but they exist only for a fixed set of characters and require the parser to recognize the name. Numeric entities like &#38; use the decimal Unicode code point, and hex entities like &#x26; use the hexadecimal code point; both work for any character and are typically what you'll see when content is auto-encoded by an editor or generated by software. When a character has all three forms, they produce identical output when rendered.

Use named entities where possible for HTML you write by hand; they are the most maintainable. Use numeric or hex for programmatic encoding where the entity name might not exist (most non-Latin Unicode characters don't have named entities), or when targeting older parsers that may not recognize names added in HTML5, such as &apos;.
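The three styles can be compared side by side with a short Python sketch. The helper name `three_forms` is hypothetical. It looks names up in `html.entities.codepoint2name`, which covers only the HTML 4 names, so a character without an entry there simply gets no named form.

```python
import html
from html.entities import codepoint2name

def three_forms(ch: str) -> dict:
    """Return the named (if any), decimal, and hex references for one character."""
    cp = ord(ch)
    name = codepoint2name.get(cp)  # None when HTML 4 defines no name
    return {
        "named": f"&{name};" if name else None,
        "decimal": f"&#{cp};",
        "hex": f"&#x{cp:X};",
    }

amp = three_forms("&")
cyrillic = three_forms("я")  # no named entity, numeric forms only
print(amp)
print(cyrillic)

# Every available form decodes back to the same character.
decoded = {html.unescape(ref) for ref in amp.values() if ref}
print(decoded)
```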

When you actually need encoding

  • Producing HTML server-side: any user-supplied content rendered into a template MUST be HTML-encoded to prevent cross-site scripting. React, Vue, Angular, and Twig do this by default; Jinja does it when autoescaping is enabled, as Flask does for HTML templates. Plain template strings or string concatenation in legacy code do not.
  • Email HTML: accented characters often break in Outlook 2007–2016. Encoding them as numeric entities (&#233; for é) is more reliable than relying on the charset declaration.
  • RSS / Atom feeds: XML requires escaping & and < in text content. A literal > is only forbidden in the sequence ]]>, but it is usually escaped everywhere for symmetry.
  • Inline SVG: text content inside <text> elements should be encoded if it contains XML-special characters.
  • Markdown source that contains literal HTML: to display a tag rather than render it, you encode it.
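For the first case in the list above, a minimal Python sketch of encoding at the point of output might look like this. The `render_comment` helper, the markup it produces, and the sample payloads are all hypothetical; a real application would normally rely on its template engine's autoescaping instead.

```python
import html

def render_comment(author: str, body: str) -> str:
    """Hypothetical helper: build an HTML fragment from untrusted strings."""
    # Both values are escaped; quote escaping protects the title attribute.
    return (
        f'<p class="comment" title="{html.escape(author)}">'
        f"{html.escape(body)}</p>"
    )

payload = '<script>alert("xss")</script>'
safe = render_comment('Eve" onmouseover="x', payload)
print(safe)
```

The attacker-controlled quote in the author name becomes &quot;, so it cannot close the attribute, and the script tag arrives in the page as inert text.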

Common mistakes

  • Double-encoding. If a string is already encoded (&amp;lt;) and you encode it again, you get &amp;amp;lt;. This is the most common bug in template systems — encoding happens twice because both the framework and the application code encoded. Always know whether the value coming into your template is raw or pre-encoded.
  • Encoding the wrong characters. HTML-encoding ' and " matters inside attribute values, not in text content. Some libraries over-encode and produce HTML that's correct but harder to read.
  • Confusing HTML encoding with URL encoding. &amp; is for HTML; %26 is for URLs. They're not interchangeable. A URL inside an HTML attribute often needs both.
  • Trusting decoded output. Decoding only reverses the encoding; it does not sanitize anything. If the original input contained an encoded <script> tag, the decoded output contains a live <script> tag. Don't decode user input and inject it into the DOM without re-encoding.
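Two of the mistakes above are easy to reproduce in a few lines of Python. This is only a sketch: the example.com URL and the query values are placeholders, not part of any real service.

```python
import html
from urllib.parse import quote

# Double-encoding: each pass re-encodes the ampersands from the previous one.
raw = "<b>"
once = html.escape(raw)
twice = html.escape(once)
print(once, twice)

# One decode pass removes exactly one layer.
print(html.unescape(twice) == once)

# A URL inside an attribute needs both encodings, in this order:
# percent-encode the parameter value, then HTML-encode the finished URL.
query = "Tom & Jerry"
url = "https://example.com/search?q=" + quote(query, safe="") + "&lang=en"
href = html.escape(url)
link = f'<a href="{href}">search</a>'
print(link)
```

The %26 belongs to the query value, while the &amp; is the HTML spelling of the separator between the two parameters.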

FAQ

See the FAQ section below for answers about the difference between &apos; and &#39;, why &nbsp; sometimes "doesn't work", whether to encode high-Unicode characters, and how to handle entity-encoded data coming from an external API.

Frequently asked questions

What's the difference between &apos; and &#39;?

Both decode to the apostrophe character '. The named entity &apos; comes from XML and was only added to HTML in HTML5; it isn't defined in HTML4, so for maximum compatibility most encoders emit &#39;. The HTML5 parser accepts both.

Why does &nbsp; sometimes appear to "not work"?

Most often it's the opposite: it works too well. Several non-breaking spaces in a row prevent line breaking and create awkward white gaps. For accessibility, use white-space: nowrap on a containing element instead of stuffing &nbsp; between words.

Should I encode all non-ASCII characters?

For modern UTF-8 HTML, no. The characters render fine literally. The exception is email HTML, where some legacy clients (older Outlook versions, some corporate gateways) misinterpret non-ASCII bytes if the MIME charset declaration is ambiguous. The "encode all non-ASCII" option exists for that case.
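One way to approximate an "encode all non-ASCII" pass in Python is the built-in `xmlcharrefreplace` error handler. This is a sketch, not a description of how this tool is implemented, and the function name is hypothetical.

```python
import html

def encode_non_ascii(text: str) -> str:
    """Escape markup characters, then turn every non-ASCII character
    into a decimal numeric reference."""
    escaped = html.escape(text, quote=False)
    # Characters that cannot be encoded as ASCII become &#NNN; references.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

sample = "Café & déjà vu"
ascii_safe = encode_non_ascii(sample)
print(ascii_safe)
print(html.unescape(ascii_safe) == sample)
```

The result contains only ASCII bytes, so it survives a wrong or missing charset declaration.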

Does it handle astral characters (emoji, etc.)?

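Yes. The encoder iterates by Unicode code point rather than UTF-16 code unit, so an astral character such as a single-code-point emoji (stored as a surrogate pair in JavaScript strings) gets one numeric entity, not two. Emoji built from several code points, such as flags or ZWJ sequences, get one entity per code point.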
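The difference between the two iteration strategies can be shown with a small Python sketch. Python strings already iterate by code point, so the UTF-16 variant below is a deliberately naive simulation of what a code-unit loop would produce; both function names are hypothetical.

```python
def encode_by_code_point(s: str) -> str:
    """One hex reference per non-ASCII code point."""
    return "".join(f"&#x{ord(ch):X};" if ord(ch) > 0x7F else ch for ch in s)

def encode_by_utf16_unit(s: str) -> str:
    """Deliberately naive: one reference per UTF-16 code unit."""
    data = s.encode("utf-16-le")
    units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
    return "".join(f"&#x{u:X};" if u > 0x7F else chr(u) for u in units)

emoji = "\U0001F600"  # a single astral code point
good = encode_by_code_point(emoji)
bad = encode_by_utf16_unit(emoji)
print(good)
print(bad)
```

The second result is a pair of surrogate references, which HTML parsers treat as invalid rather than combining them back into the emoji.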

Will the decoder handle malformed input?

It leaves unrecognized sequences as literal text. &notarealentity; stays as is, so you can spot the typo. Numeric entities with invalid code points (negative, too large) decode to the Unicode replacement character.
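A decoder with the behavior described above could be sketched in Python as follows. This is an assumption about the approach, not this tool's actual code: it requires the trailing semicolon, looks names up in the standard library's `html.entities.html5` table, and also treats zero and surrogate code points as invalid. The built-in `html.unescape` is intentionally not used because it follows the browser rule of matching the longest known prefix of a name.

```python
import re
from html.entities import html5

# Named, decimal, or hex reference; a minus sign is allowed so it can be rejected.
_REF = re.compile(r"&(#[xX]-?[0-9A-Fa-f]+|#-?[0-9]+|[A-Za-z][A-Za-z0-9]*);")

def decode_strict(s: str) -> str:
    """Decode known references; leave unknown names as literal text."""
    def repl(m: re.Match) -> str:
        body = m.group(1)
        if body[0] == "#":
            cp = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
            # Assumption: zero, negative, surrogate, and out-of-range values
            # all map to U+FFFD.
            if cp <= 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                return "\uFFFD"
            return chr(cp)
        # Unknown names fall through unchanged.
        return html5.get(body + ";", m.group(0))
    return _REF.sub(repl, s)

print(decode_strict("&lt;b&gt; &copy; &#169; &#xA9;"))
print(decode_strict("&notarealentity;"))
print(decode_strict("&#x110000;") == "\uFFFD")
```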
