The anatomy of a URL
https://user:pass@example.com:8080/path/to?q=v&x=y#section
└─┬─┘   └───┬───┘ └────┬────┘ └┬─┘└──┬───┘ └──┬──┘ └──┬──┘
  │         │          │       │     │        │       └─ fragment (client-side only, never sent to server)
  │         │          │       │     │        └───────── query string (parameters)
  │         │          │       │     └────────────────── path
  │         │          │       └──────────────────────── port (optional; defaults: 443 for https, 80 for http)
  │         │          └──────────────────────────────── hostname
  │         └─────────────────────────────────────────── userinfo (rarely used; security risk if logged)
  └───────────────────────────────────────────────────── scheme / protocol
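As a rough illustration, the components in the diagram map onto properties of the built-in URL class. This is a minimal sketch; the keys of the parts object are labels chosen here to match the diagram, not part of any API.

```javascript
// Parse the example URL from the diagram with the built-in WHATWG URL class.
const url = new URL("https://user:pass@example.com:8080/path/to?q=v&x=y#section");

// The keys below are illustrative labels that mirror the diagram.
const parts = {
  scheme: url.protocol,   // "https:" (includes the trailing colon)
  username: url.username, // "user"
  password: url.password, // "pass"
  hostname: url.hostname, // "example.com"
  port: url.port,         // "8080" (empty string when the default port is used)
  path: url.pathname,     // "/path/to"
  query: url.search,      // "?q=v&x=y" (includes the leading "?")
  fragment: url.hash,     // "#section" (includes the leading "#")
};

console.log(parts);
```

Note that protocol, search, and hash keep their delimiter characters, which is easy to forget when comparing them against bare strings.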
The fragment never reaches the server
The part after # is the fragment identifier. It's processed entirely by the browser and never included in HTTP requests. Originally intended for jumping to anchor sections in a page (#section-2), it's now widely used for client-side state: SPA routing, share links that encode form state, OAuth implicit-flow tokens.
Practical implication: anything in the fragment is invisible to server logs, CDN caches, and server-side analytics. That's a feature (no PII leaks to your access logs) and a footgun (you can't see what URLs people actually visited beyond the path and query string).
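The split between what is sent and what stays in the browser can be sketched as follows. The callback URL, its host, and the token values are made up for illustration, and treating the fragment as form-encoded key/value pairs is an assumption that fits OAuth-style fragments but not every use of the fragment.

```javascript
// Hypothetical callback URL; the host and token values are invented for this sketch.
const callback = new URL("https://app.example/callback?step=2#access_token=abc123&state=xyz");

// What an HTTP client sends as the request target: path plus query, no fragment.
const requestTarget = callback.pathname + callback.search;

// The fragment stays client-side. Here it is read as form-encoded key/value pairs.
const fragmentParams = new URLSearchParams(callback.hash.slice(1));

console.log(requestTarget);                      // "/callback?step=2"
console.log(fragmentParams.get("access_token")); // "abc123"
console.log(fragmentParams.get("state"));        // "xyz"
```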
URL encoding rules you'll trip over
- Space → %20 in paths, + in query strings (form-urlencoded). Both work in most cases but they're technically different encodings: %20 is RFC 3986; + is the form encoding from HTML4. encodeURIComponent always produces %20; URLSearchParams produces +. If you mix the two on a server endpoint, things break subtly.
- encodeURI vs encodeURIComponent. encodeURI preserves reserved characters (/, ?, &, =, #) and is meant for whole URLs. encodeURIComponent encodes them and is meant for individual parameter values. Using encodeURI on a query value will leave & intact and break your parser.
- The + trap. + means a literal plus in paths and fragments, but means a space in form-encoded query strings. ?email=a+b@example.com decodes to "a b@example.com" on most servers. To send a literal plus in a query value, encode it as %2B.
- Unicode in URLs. Non-ASCII characters in the path get percent-encoded as their UTF-8 bytes. /résumé becomes /r%C3%A9sum%C3%A9. In the hostname, Unicode goes through Punycode: café.com becomes xn--caf-dma.com. The browser address bar shows the human form; the wire format uses Punycode for ASCII compatibility.
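The rules above can be checked directly with built-in functions. This is a small sketch; the variable names and sample values are chosen here for illustration.

```javascript
// Space: %20 from encodeURIComponent, + from URLSearchParams.
const pctSpace = encodeURIComponent("a b");                     // "a%20b"
const formSpace = new URLSearchParams({ q: "a b" }).toString(); // "q=a+b"

// Reserved characters: encodeURI keeps them, encodeURIComponent escapes them.
const kept = encodeURI("rock&roll=yes");             // "rock&roll=yes"
const escaped = encodeURIComponent("rock&roll=yes"); // "rock%26roll%3Dyes"

// The + trap: a raw + in a form-encoded query decodes to a space.
const decodedEmail = new URLSearchParams("email=a+b@example.com").get("email"); // "a b@example.com"
// Encoding the value first keeps the plus intact through a round trip.
const safeEmail = encodeURIComponent("a+b@example.com"); // "a%2Bb%40example.com"
const roundTrip = new URLSearchParams("email=" + safeEmail).get("email"); // "a+b@example.com"

// Unicode: UTF-8 percent-encoding in the path, Punycode in the hostname.
const encodedPath = new URL("https://example.com/r\u00e9sum\u00e9").pathname; // "/r%C3%A9sum%C3%A9"
const asciiHost = new URL("https://caf\u00e9.com/").hostname;                 // "xn--caf-dma.com"

console.log({ pctSpace, formSpace, kept, escaped, decodedEmail, roundTrip, encodedPath, asciiHost });
```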
What the URL Standard says vs what browsers actually do
RFC 3986 is the formal URI specification. The WHATWG URL Standard (implemented by all major browsers, and by the global URL class in Node.js since version 10) is more lenient and reflects what implementations actually do. Where they disagree, follow WHATWG: the RFC 3986 grammar rejects some inputs (backslashes in place of forward slashes, surrounding whitespace, unencoded Unicode in the hostname or path) that browsers happily accept and normalize.
In JavaScript, use new URL(input) for parsing. It's the WHATWG-compliant parser, so it resolves edge cases the same way browsers do, and it throws a TypeError on input it cannot parse. Don't write your own URL parser unless you're certain you need to; there are dozens of corner cases that trivial regex-based parsers get wrong.
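A minimal sketch of parsing with the built-in class follows. The parseUrl helper is a hypothetical wrapper written for this example; it only turns the exception into a null return.

```javascript
// Hypothetical helper: returns a URL object, or null when the input
// is not a valid absolute URL.
function parseUrl(input) {
  try {
    return new URL(input);
  } catch {
    return null; // new URL throws a TypeError on invalid input
  }
}

// Lenient input: surrounding spaces, uppercase scheme and host,
// an explicit default port, and backslashes instead of slashes.
const normalized = parseUrl("  HTTPS://EXAMPLE.com:443\\docs\\intro  ");
console.log(normalized.href); // "https://example.com/docs/intro"

// No scheme and no base URL, so parsing fails.
console.log(parseUrl("not a url")); // null
```

The first input shows the kind of normalization a regex-based parser would have to reimplement by hand.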
Security-adjacent gotchas
- Open redirects. If your app builds URLs like /login?return=USER_INPUT and redirects to whatever return contains, attackers can craft a phishing link that goes to your domain but redirects to theirs. Always validate that the return URL stays on your origin (or use only relative paths).
- userinfo in URLs. Browsers warn about https://user:pass@bank.com URLs because the same syntax lets phishers craft URLs where the bank's domain is only the userinfo portion of an attacker's URL, as in https://bank.com@attacker.example. Strip userinfo when displaying URLs to users.
- Path traversal in path params. A URL with /files/../../etc/passwd should be normalized before use. Modern frameworks do this automatically; raw fs.readFile(path) on URL paths does not.
- Punycode homograph attacks. https://раypal.com (Cyrillic 'а') looks identical to PayPal but resolves to a different domain. Browsers mitigate this with mixed-script detection but the trap still exists for hand-pasted URLs.
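The first two items above can be sketched with the URL class alone. Both function names, the fallback path "/", and the example hosts are assumptions made for this illustration; a real application may need a stricter policy, such as an allowlist of paths.

```javascript
// Hypothetical helper: resolve a user-supplied return value against our own
// origin and accept it only if the result stays on that origin.
function safeReturnPath(input, origin) {
  let target;
  try {
    target = new URL(input, origin);
  } catch {
    return "/"; // unparseable input falls back to the home page
  }
  if (target.origin !== new URL(origin).origin) {
    return "/"; // absolute, protocol-relative, or non-http targets are rejected
  }
  // Return a relative reference so the redirect cannot leave the origin.
  return target.pathname + target.search + target.hash;
}

// Hypothetical helper: drop userinfo before showing a URL to a person.
function displayUrl(input) {
  const url = new URL(input);
  url.username = "";
  url.password = "";
  return url.href;
}

const origin = "https://app.example";
console.log(safeReturnPath("/dashboard?tab=1", origin));           // "/dashboard?tab=1"
console.log(safeReturnPath("https://evil.example/phish", origin)); // "/"
console.log(safeReturnPath("//evil.example/phish", origin));       // "/"
console.log(displayUrl("https://bank.com@attacker.example/login")); // "https://attacker.example/login"
```

Comparing parsed origins instead of checking string prefixes is the design choice here: a prefix check such as startsWith("/") would accept //evil.example/phish, which resolves to another host.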
Common use cases
- Inspect a complex URL to debug a routing issue
- Edit query parameters without manual encoding mistakes
- Confirm a redirect target is on your own origin (open-redirect audit)
- Encode/decode a single string for use in a URL
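For the query-editing case, a short sketch using the searchParams property of the URL class is shown below; the URL and parameter names are invented for illustration. Values are passed in raw and the encoding is applied on serialization.

```javascript
// Hypothetical search URL; parameter names are made up for this sketch.
const link = new URL("https://example.com/search?q=old&page=2");

link.searchParams.set("q", "rock & roll"); // replace an existing value
link.searchParams.delete("page");          // remove a parameter
link.searchParams.append("tag", "a+b");    // add a value containing a literal plus

console.log(link.href);
// "https://example.com/search?q=rock+%26+roll&tag=a%2Bb"
```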
Frequently asked questions
Why does the fragment never reach the server?
The part after <code>#</code> is processed entirely by the browser. It's used for in-page anchors and SPA routing. Server logs, CDN caches, and analytics never see it — that's why OAuth implicit-flow tokens were sent in the fragment.
<code>encodeURI</code> vs <code>encodeURIComponent</code>?
<code>encodeURI</code> preserves reserved URL characters (/, ?, &, =, #) and is meant for whole URLs. <code>encodeURIComponent</code> encodes them and is meant for individual parameter values. Using the wrong one is a common cause of broken URL handling.
Why does "+" sometimes mean space and sometimes a plus?
<code>+</code> means literal plus in paths and fragments, but means a space in form-encoded query strings (HTML4 legacy). To send a literal plus in a query value, encode it as <code>%2B</code>. Same applies when receiving — depending on framework, <code>?x=a+b</code> may decode to "a b" or "a+b".
What's the difference between RFC 3986 and the WHATWG URL Standard?
RFC 3986 (2005) is the formal URI spec. The WHATWG URL Standard is what browsers and Node.js actually implement: it is more permissive and handles input that real-world URLs contain (backslashes in place of forward slashes, surrounding whitespace, Unicode hostnames). Where they disagree, follow WHATWG. This tool uses the browser's built-in WHATWG-compliant <code>URL</code> parser.