URL Parser & Builder

Break a URL into its components (protocol, host, port, path, query, fragment), edit them, rebuild the URL. Side panel for percent-encoding / decoding arbitrary strings. Uses the WHATWG URL Standard parser — the same one browsers and Node.js use.
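
A minimal sketch of the same parse, edit and rebuild loop using the standard `URL` class. It assumes a browser or Node.js runtime, where `URL` is a global. The example URL and the `page` parameter are illustrative only:

```javascript
// Parse, edit individual components, and serialize again with the WHATWG
// URL class (a global in browsers and in Node.js 10+).
const url = new URL("https://example.com:8080/path/to?q=v#section");

url.protocol = "http:";            // change the scheme
url.port = "";                     // the empty string removes the explicit port
url.pathname = "/new path";        // the space is percent-encoded on assignment
url.searchParams.set("page", "2"); // add a query parameter
url.searchParams.delete("q");      // remove an existing one
url.hash = "";                     // drop the fragment

console.log(url.href); // "http://example.com/new%20path?page=2"
```

Each setter re-validates its input, so the rebuilt `href` is always a well-formed URL.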

The anatomy of a URL

https://user:pass@example.com:8080/path/to?q=v&x=y#section
└─┬─┘   └───┬───┘ └────┬────┘ └┬─┘└──┬───┘ └──┬──┘ └──┬──┘
  │         │          │       │     │        │       │
  │         │          │       │     │        │       └──── fragment (client-side only, never sent to server)
  │         │          │       │     │        └──────────── query string (parameters)
  │         │          │       │     └───────────────────── path
  │         │          │       └─────────────────────────── port (optional; defaults: 443 for https, 80 for http)
  │         │          └─────────────────────────────────── hostname
  │         └────────────────────────────────────────────── userinfo (rarely used; security risk if logged)
  └──────────────────────────────────────────────────────── scheme / protocol
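
You can check the diagram by reading the same components back with `new URL()`. This is a small sketch that assumes Node.js or a browser console. The property names follow the WHATWG API (`protocol`, `hostname`, `pathname`, `search`, `hash`), and three of them keep their delimiters:

```javascript
// Parse the URL from the diagram and read each component back.
const parsed = new URL("https://user:pass@example.com:8080/path/to?q=v&x=y#section");

const parts = {
  protocol: parsed.protocol, // "https:"   (keeps the trailing colon)
  username: parsed.username, // "user"
  password: parsed.password, // "pass"
  hostname: parsed.hostname, // "example.com"
  port: parsed.port,         // "8080"     (always a string)
  pathname: parsed.pathname, // "/path/to"
  search: parsed.search,     // "?q=v&x=y" (keeps the leading "?")
  hash: parsed.hash,         // "#section" (keeps the leading "#")
};
console.log(parts);
console.log(parsed.searchParams.get("x")); // "y"

// An explicit default port is normalized away, so port reads as "".
console.log(new URL("https://example.com:443/").port); // ""
```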

The fragment never reaches the server

The part after # is the fragment identifier. It's processed entirely by the browser and never included in HTTP requests. Originally intended for jumping to anchor sections in a page (#section-2), it's now widely used for client-side state: SPA routing, share links that encode form state, OAuth implicit-flow tokens.

Practical implication: anything in the fragment is invisible to server logs, CDN caches, and server-side analytics. That's a feature (no PII leaks into your access logs) and a footgun (on the server, you can't see what people actually visited beyond the path and query string).
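
Here is a concrete view of the split. An HTTP/1.1 client builds the request target from the path and query only. The helper `requestTarget` below is hypothetical. It only illustrates which URL properties reach the request line:

```javascript
// Hypothetical helper: the origin-form request target (path + query) that an
// HTTP/1.1 client puts on the request line. The fragment is never included.
function requestTarget(href) {
  const u = new URL(href);
  return u.pathname + u.search; // u.hash is deliberately left out
}

const shared = "https://app.example/dashboard?tab=2#access_token=abc";
console.log(requestTarget(shared)); // "/dashboard?tab=2"
console.log(new URL(shared).hash);  // "#access_token=abc" (stays in the browser)
```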

URL encoding rules you'll trip over

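  • Space → %20 in paths, + in query strings (form-urlencoded). They are technically different encodings. %20 is RFC 3986 percent-encoding. + is the application/x-www-form-urlencoded convention from HTML4 forms, and it means a space only where the decoder applies form rules, which usually means the query string. encodeURIComponent always produces %20; URLSearchParams produces +. If you mix the two on one endpoint, things break subtly: a + that reaches a path decoder or decodeURIComponent stays a literal plus.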
  • encodeURI vs encodeURIComponent. encodeURI preserves reserved characters (/, ?, &, =, #) and is meant for whole URLs. encodeURIComponent encodes them and is meant for individual parameter values. Using encodeURI on a query value leaves & and = intact, so the value splits into extra parameters when the server parses it.
  • The + trap. + means a literal plus in paths and fragments, but means a space in form-encoded query strings. ?email=a+b@example.com decodes to "a b@example.com" on most servers. To send a literal plus in a query value, encode it as %2B.
  • Unicode in URLs. Non-ASCII characters in the path get percent-encoded as their UTF-8 bytes: /résumé becomes /r%C3%A9sum%C3%A9. In the hostname, Unicode goes through Punycode: café.com becomes xn--caf-dma.com. The browser address bar usually shows the human-readable form, while DNS and the wire format use Punycode for ASCII compatibility.
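
Each rule above is easy to check in a console. This sketch assumes Node.js or a modern browser. The sample value is arbitrary, chosen because it contains a space, &, = and +:

```javascript
const value = "a b&c=d+e";

// Space: encodeURIComponent uses %20, URLSearchParams (form encoding) uses +.
console.log(encodeURIComponent(value));                    // "a%20b%26c%3Dd%2Be"
console.log(new URLSearchParams({ v: value }).toString()); // "v=a+b%26c%3Dd%2Be"

// encodeURI leaves & and = alone, so this value would split into two params.
console.log(encodeURI(value)); // "a%20b&c=d+e"

// The + trap: form decoding turns + into a space; decodeURIComponent does not.
console.log(new URLSearchParams("email=a+b@example.com").get("email"));   // "a b@example.com"
console.log(decodeURIComponent("a+b@example.com"));                       // "a+b@example.com"
console.log(new URLSearchParams("email=a%2Bb@example.com").get("email")); // "a+b@example.com"

// Unicode: UTF-8 percent-encoding in the path, Punycode in the host.
const u = new URL("https://café.com/résumé");
console.log(u.hostname); // "xn--caf-dma.com"
console.log(u.pathname); // "/r%C3%A9sum%C3%A9"
```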

What the URL Standard says vs what browsers actually do

RFC 3986 is the formal URI specification. The WHATWG URL Standard is more lenient and reflects what implementations actually do. All major browsers use it, as does Node.js, where URL has been a global since v10. Where they disagree, follow WHATWG: browsers happily accept input that RFC 3986 forbids, such as backslashes in place of slashes, stray spaces, and raw Unicode hostnames.
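
A few inputs show how lenient the WHATWG parser is in practice, and where it stops. This sketch assumes Node.js or a browser. The outputs in the comments follow the WHATWG URL Standard's parsing rules:

```javascript
// Inputs the WHATWG parser (browsers, Node.js) accepts and normalizes,
// although RFC 3986 does not allow them as written.
const lenient = [
  "  https://example.com/a b  ",   // outer spaces stripped, inner space encoded
  "https:\\\\example.com\\docs\\", // backslashes act as slashes for special schemes such as https
  "https://café.com/",             // Unicode host converted to Punycode
];
for (const input of lenient) {
  console.log(JSON.stringify(input), "->", new URL(input).href);
}
// -> https://example.com/a%20b
// -> https://example.com/docs/
// -> https://xn--caf-dma.com/

// Lenient is not limitless: an IPv6 host still needs square brackets.
console.log(new URL("http://[::1]:3000/").host); // "[::1]:3000"
try {
  new URL("http://::1/");
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```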

In JavaScript, use new URL(input) for parsing. It's the WHATWG-compliant parser and handles edge cases the same way browsers do. Don't write your own URL parser unless you're certain you need to. There are dozens of corner cases that trivial regex-based parsers get wrong.
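
`new URL()` throws a TypeError on invalid input, so untrusted strings are usually wrapped in a try/catch. In this sketch, `tryParseUrl` is a hypothetical helper name. Newer runtimes also provide `URL.canParse()` for a plain yes/no check:

```javascript
// Hypothetical helper: parse untrusted input without letting the TypeError
// that new URL() throws on invalid input escape to the caller.
function tryParseUrl(input, base) {
  try {
    return new URL(input, base);
  } catch {
    return null;
  }
}

// The optional base argument resolves relative references the way browsers do.
console.log(tryParseUrl("../img/logo.png", "https://example.com/docs/guide/")?.href);
// "https://example.com/docs/img/logo.png"
console.log(tryParseUrl("//cdn.example.net/app.js", "https://example.com/")?.href);
// "https://cdn.example.net/app.js"
console.log(tryParseUrl("not a url")); // null: relative input and no base
```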

Security-adjacent gotchas

  • Open redirects. If your app builds URLs like /login?return=USER_INPUT and redirects to whatever return contains, attackers can craft a phishing link that goes to your domain but redirects to theirs. Always validate that the return URL stays on your origin (or use only relative paths).
  • userinfo in URLs. Browsers warn about URLs that carry userinfo because phishers use it to put a trusted name in front of the @. For example, https://bank.com@attacker.example/ actually loads attacker.example, with "bank.com" as the username. Strip userinfo when displaying URLs to users.
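  • Path traversal in path params. A URL path like /files/../../etc/passwd must be resolved against the directory you serve from and rejected if it lands outside it. Normalizing alone just turns it into /etc/passwd. Watch for encoded forms such as ..%2f, which a later decode turns into ../. Many frameworks' static-file handlers do this check automatically; raw fs.readFile(path) on URL paths does not.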
  • Punycode homograph attacks. https://раypal.com (Cyrillic 'а') looks identical to PayPal but resolves to a different domain. Browsers mitigate this by showing the Punycode form for suspicious mixed-script names. Lookalikes can still slip through, though, and emails, chat messages, and hand-pasted URLs get no such protection.
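
The bullets above translate into short checks. This sketch assumes Node.js (for `node:path`). The names `safeReturnTarget`, `displayUrl`, `resolveInside` and `hasPunycodeLabel` are hypothetical helpers, and the origin and directory arguments are placeholders for your own values. Treat it as a starting point, not a complete defense:

```javascript
const path = require("node:path");

// Open redirects: accept a return target only if it resolves to our origin.
// Resolving against the origin also catches values that look relative but
// are not, such as //evil.example or /\evil.example.
function safeReturnTarget(input, ourOrigin) {
  const home = new URL("/", ourOrigin);
  let target;
  try {
    target = new URL(input, home);
  } catch {
    return home.href; // unparseable input falls back to the home page
  }
  return target.origin === home.origin ? target.href : home.href;
}

// userinfo: drop credentials before showing a URL to a person.
function displayUrl(href) {
  const u = new URL(href);
  u.username = "";
  u.password = "";
  return u.href;
}

// Path traversal: map a URL path onto a directory and refuse anything that
// escapes it after decoding (..%2f turns into ../ here).
function resolveInside(root, urlPath) {
  const base = path.resolve(root);
  let decoded;
  try {
    decoded = decodeURIComponent(urlPath);
  } catch {
    return null; // malformed percent-encoding
  }
  const full = path.resolve(base, "." + path.sep + decoded);
  return full === base || full.startsWith(base + path.sep) ? full : null;
}

// Homographs: a non-ASCII label appears as Punycode ("xn--") in hostname.
function hasPunycodeLabel(href) {
  return new URL(href).hostname.split(".").some((label) => label.startsWith("xn--"));
}

console.log(safeReturnTarget("//evil.example/", "https://app.example")); // "https://app.example/"
console.log(displayUrl("https://bank.com@attacker.example/login"));     // "https://attacker.example/login"
console.log(resolveInside("/srv/static", "/..%2f..%2fetc%2fpasswd"));   // null
console.log(hasPunycodeLabel("https://\u0440\u0430ypal.com/"));         // true
```

`safeReturnTarget` returns the absolute `href` rather than just the path. That choice matters: a same-origin URL can have a path like //evil.example, which becomes a protocol-relative redirect if it is echoed back as a bare path.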

Frequently asked questions

Why does the fragment never reach the server?

The part after # is processed entirely by the browser. It's used for in-page anchors and SPA routing. Server logs, CDN caches, and server-side analytics never see it, which is why the OAuth implicit flow returned tokens in the fragment.

encodeURI vs encodeURIComponent?

encodeURI preserves reserved URL characters (/, ?, &, =, #) and is meant for whole URLs. encodeURIComponent encodes them and is meant for individual parameter values. Using the wrong one is one of the most common causes of broken URL handling.

Why does "+" sometimes mean space and sometimes a plus?

+ means a literal plus in paths and fragments, but means a space in form-encoded query strings (HTML4 legacy). To send a literal plus in a query value, encode it as %2B. The same applies when receiving: depending on the framework, ?x=a+b may decode to "a b" or "a+b".

What's the difference between RFC 3986 and the WHATWG URL Standard?

RFC 3986 (2005) is the formal spec. The WHATWG URL Standard is what browsers and Node.js actually implement. It is more permissive and handles edge cases that real-world URLs need, such as backslashes, stray whitespace, and Unicode hostnames. Where they disagree, follow WHATWG. This tool uses the browser's built-in WHATWG-compliant URL parser.
