HTML URL Encoding
In this tutorial you will learn how to encode URLs so that data can be transmitted safely over the internet.
What is URL Encoding
According to RFC 3986, the characters in a URL are limited to a defined set of reserved and unreserved US-ASCII characters; no other characters are allowed. However, URLs often need to carry characters outside the US-ASCII character set, so these must be converted to a valid US-ASCII form for worldwide interoperability. URL encoding, also known as percent-encoding, is the process of encoding URL information so that it can be safely transmitted over the internet.
To map the wide range of characters used worldwide, a two-step process is used:
- First, the data is encoded as a sequence of bytes using the UTF-8 character encoding.
- Then, every byte that does not correspond to a character in the unreserved set is percent-encoded as %HH, where HH is the hexadecimal value of the byte.
For example, the string François is encoded as Fran%C3%A7ois, because ç (c-cedilla, a Latin script letter) is represented in UTF-8 by the two bytes C3 and A7.
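The two-step process can be sketched in Python. This is a minimal illustration, not a reference implementation: the hypothetical `percent_encode` helper encodes the text as UTF-8 and then escapes every byte outside the RFC 3986 unreserved set. Its result is compared with the standard library's `urllib.parse.quote`, called with `safe=""` so that it escapes `/` as well.

```python
from urllib.parse import quote

# RFC 3986 unreserved characters: letters, digits, "-", ".", "_" and "~".
# Iterating over a bytes literal yields ints, so this is a set of byte values.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Hypothetical helper: percent-encode every byte outside the unreserved set."""
    parts = []
    for byte in text.encode("utf-8"):      # step 1: UTF-8 bytes
        if byte in UNRESERVED:
            parts.append(chr(byte))        # unreserved bytes pass through unchanged
        else:
            parts.append(f"%{byte:02X}")   # step 2: %HH with uppercase hex digits
    return "".join(parts)


print(percent_encode("François"))          # Fran%C3%A7ois
print(quote("François", safe=""))          # Fran%C3%A7ois
```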
Reserved Characters
Certain characters are reserved or restricted from use in a URL because they may (or may not) be defined as delimiters by the generic syntax or by a particular URL scheme. For example, forward slash (/) characters are used to separate different parts of a URL.
If the data for a URL component contains a character that conflicts with a reserved character defined as a delimiter in the URL scheme, the conflicting character must be percent-encoded before the URL is formed. The reserved characters in a URL are:
! | # | $ | & | ' | ( | ) | * | + | , | / | : | ; | = | ? | @ | [ | ] |
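To see why a conflicting reserved character must be encoded, consider a query value that itself contains `&` and `=`. In this sketch (the value `rock&roll=fun` is made up for illustration), Python's `urllib.parse.parse_qs` splits the unencoded query into two parameters, while the percent-encoded version survives as a single value.

```python
from urllib.parse import parse_qs, quote

value = "rock&roll=fun"  # illustrative data that contains reserved characters

raw_query = "q=" + value                       # reserved characters left as-is
encoded_query = "q=" + quote(value, safe="")   # "&" -> %26, "=" -> %3D

print(parse_qs(raw_query))       # {'q': ['rock'], 'roll': ['fun']}
print(encoded_query)             # q=rock%26roll%3Dfun
print(parse_qs(encoded_query))   # {'q': ['rock&roll=fun']}
```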
Unreserved Characters
Characters that are allowed in a URL but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde. The following table lists all the unreserved characters in a URL:
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | - | _ | . | ~ |
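Because unreserved characters carry no special meaning, an RFC 3986 encoder leaves them untouched, and decoding reverses the process for everything else. A short sketch with Python's `urllib.parse.quote` and `unquote` (the sample strings are illustrative):

```python
from urllib.parse import quote, unquote

sample = "Report_v1.0-final~draft"      # only unreserved characters
print(quote(sample, safe=""))           # Report_v1.0-final~draft (unchanged)

encoded = quote("François & co", safe="")
print(encoded)                          # Fran%C3%A7ois%20%26%20co
print(unquote(encoded))                 # François & co (UTF-8 decoding by default)
```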
HTML — URL Encoding
URL encoding is the practice of translating unprintable characters, or characters with special meaning within URLs, into a representation that is unambiguous and universally accepted by web browsers and servers. These characters include:
- ASCII control characters: unprintable characters typically used for output control, in the ranges 00-1F hex (0-31 decimal) and 7F (127 decimal). A complete encoding table is given below.
- Non-ASCII characters: characters beyond the 128-character ASCII set. In the ISO-Latin (ISO-8859-1) character set, this range is the entire "top half" of the set, 80-FF hex (128-255 decimal). A complete encoding table is given below.
- Reserved characters: special characters such as the dollar sign, ampersand, plus sign, comma, forward slash, colon, semicolon, equals sign, question mark, and "at" symbol. All of these can have special meanings inside a URL, so they must be encoded when they appear as data. A complete encoding table is given below.
- Unsafe characters: space, quotation mark, less-than sign, greater-than sign, pound (hash) sign, percent sign, left curly brace, right curly brace, pipe, backslash, caret, tilde, left square bracket, right square bracket, and grave accent. These characters can be misunderstood within URLs for various reasons, so they should always be encoded. (Tilde was listed as unsafe in the older RFC 1738, but RFC 3986 classifies it as unreserved.) A complete encoding table is given below.
The encoding notation replaces the desired character with three characters: a percent sign and two hexadecimal digits that correspond to the position of the character in the ASCII character set.
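For a single ASCII character, the notation is just the character code written as two hexadecimal digits after a percent sign. A minimal sketch; the helper name `ascii_escape` is hypothetical:

```python
def ascii_escape(char: str) -> str:
    """Hypothetical helper: return the %HH form of one ASCII character."""
    code = ord(char)
    if code > 0x7F:
        raise ValueError("not an ASCII character")
    return f"%{code:02X}"   # two hex digits, zero-padded


print(ascii_escape(" "))    # %20
print(ascii_escape("\t"))   # %09
print(ascii_escape("/"))    # %2F
```

The tables below use lowercase hex digits. RFC 3986 treats uppercase and lowercase hex digits in percent-encodings as equivalent, but recommends that encoders produce uppercase.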
Example
One of the most common special characters is the space. You can’t type a space in a URL directly. The space character has code 20 hexadecimal (32 decimal) in the ASCII character set, so you can use %20 in place of a space when passing your request to the server.
http://www.example.com/new%20pricing.htm
This URL actually retrieves a document named "new pricing.htm" from the www.example.com server.
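The example URL can be built and taken apart again with Python's standard library. A sketch that reuses the example host and file name:

```python
from urllib.parse import quote, unquote, urlsplit

filename = "new pricing.htm"
url = "http://www.example.com/" + quote(filename)   # space -> %20
print(url)                                          # http://www.example.com/new%20pricing.htm

path = urlsplit(url).path                           # '/new%20pricing.htm'
print(unquote(path))                                # /new pricing.htm
```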
ASCII Control Characters Encoding
This table lists the encodings for the character range 00-1F hex (0-31 decimal) and for 7F (127 decimal).
Decimal | Hex Value | Character | URL Encode |
---|---|---|---|
0 | 00 | | %00 |
1 | 01 | | %01 |
2 | 02 | | %02 |
3 | 03 | | %03 |
4 | 04 | | %04 |
5 | 05 | | %05 |
6 | 06 | | %06 |
7 | 07 | | %07 |
8 | 08 | backspace | %08 |
9 | 09 | tab | %09 |
10 | 0a | linefeed | %0a |
11 | 0b | | %0b |
12 | 0c | | %0c |
13 | 0d | carriage return | %0d |
14 | 0e | | %0e |
15 | 0f | | %0f |
16 | 10 | | %10 |
17 | 11 | | %11 |
18 | 12 | | %12 |
19 | 13 | | %13 |
20 | 14 | | %14 |
21 | 15 | | %15 |
22 | 16 | | %16 |
23 | 17 | | %17 |
24 | 18 | | %18 |
25 | 19 | | %19 |
26 | 1a | | %1a |
27 | 1b | | %1b |
28 | 1c | | %1c |
29 | 1d | | %1d |
30 | 1e | | %1e |
31 | 1f | | %1f |
127 | 7f | | %7f |
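The entries in this table follow mechanically from the character codes, so they can be generated rather than typed. A short sketch that builds rows in the same Decimal | Hex Value | URL Encode order (character names omitted) and checks them against Python's `urllib.parse.quote`, ignoring hex-digit case:

```python
from urllib.parse import quote

# Control characters: 0x00-0x1F plus 0x7F (DEL).
codes = list(range(0x00, 0x20)) + [0x7F]

rows = [f"{c} | {c:02x} | %{c:02x}" for c in codes]
print(len(rows))     # 33
print(rows[0])       # 0 | 00 | %00
print(rows[-1])      # 127 | 7f | %7f

# quote() emits uppercase hex, so compare case-insensitively.
matches_stdlib = all(quote(chr(c), safe="").lower() == f"%{c:02x}" for c in codes)
print(matches_stdlib)  # True
```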
Non-ASCII characters encoding
This table lists the encodings for the entire "top half" of the ISO-Latin (ISO-8859-1) set, 80-FF hex (128-255 decimal).
Decimal | Hex Value | Character | URL Encode |
---|---|---|---|
128 | 80 | | %80 |
129 | 81 | | %81 |
130 | 82 | | %82 |
131 | 83 | | %83 |
132 | 84 | | %84 |
133 | 85 | | %85 |
134 | 86 | | %86 |
135 | 87 | | %87 |
136 | 88 | | %88 |
137 | 89 | | %89 |
138 | 8a | | %8a |
139 | 8b | | %8b |
140 | 8c | | %8c |
141 | 8d | | %8d |
142 | 8e | | %8e |
143 | 8f | | %8f |
144 | 90 | | %90 |
145 | 91 | | %91 |
146 | 92 | | %92 |
147 | 93 | | %93 |
148 | 94 | | %94 |
149 | 95 | | %95 |
150 | 96 | | %96 |
151 | 97 | | %97 |
152 | 98 | | %98 |
153 | 99 | | %99 |
154 | 9a | | %9a |
155 | 9b | | %9b |
156 | 9c | | %9c |
157 | 9d | | %9d |
158 | 9e | | %9e |
159 | 9f | | %9f |
160 | a0 | | %a0 |
161 | a1 | ¡ | %a1 |
162 | a2 | ¢ | %a2 |
163 | a3 | £ | %a3 |
164 | a4 | ¤ | %a4 |
165 | a5 | ¥ | %a5 |
166 | a6 | ¦ | %a6 |
167 | a7 | § | %a7 |
168 | a8 | ¨ | %a8 |
169 | a9 | © | %a9 |
170 | aa | ª | %aa |
171 | ab | « | %ab |
172 | ac | ¬ | %ac |
173 | ad | | %ad |
174 | ae | ® | %ae |
175 | af | ¯ | %af |
176 | b0 | ° | %b0 |
177 | b1 | ± | %b1 |
178 | b2 | ² | %b2 |
179 | b3 | ³ | %b3 |
180 | b4 | ´ | %b4 |
181 | b5 | µ | %b5 |
182 | b6 | ¶ | %b6 |
183 | b7 | · | %b7 |
184 | b8 | ¸ | %b8 |
185 | b9 | ¹ | %b9 |
186 | ba | º | %ba |
187 | bb | » | %bb |
188 | bc | ¼ | %bc |
189 | bd | ½ | %bd |
190 | be | ¾ | %be |
191 | bf | ¿ | %bf |
192 | c0 | À | %c0 |
193 | c1 | Á | %c1 |
194 | c2 | Â | %c2 |
195 | c3 | Ã | %c3 |
196 | c4 | Ä | %c4 |
197 | c5 | Å | %c5 |
198 | c6 | Æ | %c6 |
199 | c7 | Ç | %c7 |
200 | c8 | È | %c8 |
201 | c9 | É | %c9 |
202 | ca | Ê | %ca |
203 | cb | Ë | %cb |
204 | cc | Ì | %cc |
205 | cd | Í | %cd |
206 | ce | Î | %ce |
207 | cf | Ï | %cf |
208 | d0 | Ð | %d0 |
209 | d1 | Ñ | %d1 |
210 | d2 | Ò | %d2 |
211 | d3 | Ó | %d3 |
212 | d4 | Ô | %d4 |
213 | d5 | Õ | %d5 |
214 | d6 | Ö | %d6 |
215 | d7 | × | %d7 |
216 | d8 | Ø | %d8 |
217 | d9 | Ù | %d9 |
218 | da | Ú | %da |
219 | db | Û | %db |
220 | dc | Ü | %dc |
221 | dd | Ý | %dd |
222 | de | Þ | %de |
223 | df | ß | %df |
224 | e0 | à | %e0 |
225 | e1 | á | %e1 |
226 | e2 | â | %e2 |
227 | e3 | ã | %e3 |
228 | e4 | ä | %e4 |
229 | e5 | å | %e5 |
230 | e6 | æ | %e6 |
231 | e7 | ç | %e7 |
232 | e8 | è | %e8 |
233 | e9 | é | %e9 |
234 | ea | ê | %ea |
235 | eb | ë | %eb |
236 | ec | ì | %ec |
237 | ed | í | %ed |
238 | ee | î | %ee |
239 | ef | ï | %ef |
240 | f0 | ð | %f0 |
241 | f1 | ñ | %f1 |
242 | f2 | ò | %f2 |
243 | f3 | ó | %f3 |
244 | f4 | ô | %f4 |
245 | f5 | õ | %f5 |
246 | f6 | ö | %f6 |
247 | f7 | ÷ | %f7 |
248 | f8 | ø | %f8 |
249 | f9 | ù | %f9 |
250 | fa | ú | %fa |
251 | fb | û | %fb |
252 | fc | ü | %fc |
253 | fd | ý | %fd |
254 | fe | þ | %fe |
255 | ff | ÿ | %ff |
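These single-byte codes assume the text is encoded as ISO-8859-1 (Latin-1), where each character in this range occupies exactly one byte. Under the UTF-8 process described at the start of this tutorial, the same characters take two bytes each. Python's `quote` accepts an `encoding` argument, so the difference can be sketched like this:

```python
from urllib.parse import quote

for ch in ["é", "ÿ", "©"]:
    latin1 = quote(ch, safe="", encoding="latin-1")  # one byte, as in the table
    utf8 = quote(ch, safe="")                        # default: UTF-8, two bytes
    print(ch, latin1, utf8)
# é %E9 %C3%A9
# ÿ %FF %C3%BF
# © %A9 %C2%A9
```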
Reserved Characters Encoding
The following table lists the encodings for reserved characters.
Decimal | Hex Value | Char | URL Encode |
---|---|---|---|
36 | 24 | $ | %24 |
38 | 26 | & | %26 |
43 | 2b | + | %2b |
44 | 2c | , | %2c |
47 | 2f | / | %2f |
58 | 3a | : | %3a |
59 | 3b | ; | %3b |
61 | 3d | = | %3d |
63 | 3f | ? | %3f |
64 | 40 | @ | %40 |
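Whether a reserved character needs encoding depends on whether it acts as a delimiter or as data. Python's `urllib.parse.quote` reflects this with its `safe` parameter, which defaults to `"/"` so that path separators survive; passing `safe=""` encodes them too, which is what a single path segment or query value containing `/` needs. The sample strings are illustrative:

```python
from urllib.parse import quote

path = "docs/2024 report.pdf"
print(quote(path))               # docs/2024%20report.pdf   ("/" kept as a delimiter)
print(quote(path, safe=""))      # docs%2F2024%20report.pdf ("/" encoded as data)

reserved = "$&+,/:;=?@"          # the characters from the table, in order
print(quote(reserved, safe=""))  # %24%26%2B%2C%2F%3A%3B%3D%3F%40
```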
Unsafe Characters Encoding
The following table lists the encodings for unsafe characters.
Decimal | Hex Value | Char | URL Encode |
---|---|---|---|
32 | 20 | space | %20 |
34 | 22 | " | %22 |
60 | 3c | < | %3c |
62 | 3e | > | %3e |
35 | 23 | # | %23 |
37 | 25 | % | %25 |
123 | 7b | { | %7b |
125 | 7d | } | %7d |
124 | 7c | pipe | %7c |
92 | 5c | \ | %5c |
94 | 5e | ^ | %5e |
126 | 7e | ~ | %7e |
91 | 5b | [ | %5b |
93 | 5d | ] | %5d |
96 | 60 | ` | %60 |
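Running the characters from this table through an RFC 3986 encoder shows one difference from the older list: tilde is unreserved in RFC 3986, so Python's `quote` (Python 3.7 and later) leaves it as-is, while every other character in the table is encoded.

```python
from urllib.parse import quote

# Characters from the unsafe table, in the same order (space first).
unsafe = ' "<>#%{}|\\^~[]`'
encoded = quote(unsafe, safe="")
print(encoded)  # %20%22%3C%3E%23%25%7B%7D%7C%5C%5E~%5B%5D%60
```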
HTML Uniform Resource Locators
A URL can be composed of words (e.g. w3schools.com), or an Internet Protocol (IP) address (e.g. 192.68.20.50).
Most people enter the name when surfing, because names are easier to remember than numbers.
URL — Uniform Resource Locator
Web browsers request pages from web servers by using a URL.
A Uniform Resource Locator (URL) is used to address a document (or other data) on the web.
A web address like https://www.w3schools.com/html/default.asp follows these syntax rules:
- scheme: defines the type of Internet service (most commonly http or https)
- prefix: defines a domain prefix (www is a common convention, not a default)
- domain: defines the Internet domain name (like w3schools.com)
- port: defines the port number at the host (the defaults are 80 for http and 443 for https)
- path: defines a path at the server (if omitted, the root directory of the site)
- filename: defines the name of a document or resource
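Python's `urllib.parse.urlsplit` can pull these parts out of the example address. In this sketch, note that it reports the host as a single `netloc`, so the www prefix and the domain are not split apart, and `port` is `None` when the URL does not state one:

```python
from urllib.parse import urlsplit

parts = urlsplit("https://www.w3schools.com/html/default.asp")
print(parts.scheme)                    # https
print(parts.netloc)                    # www.w3schools.com
print(parts.port)                      # None (no explicit port in the URL)
print(parts.path)                      # /html/default.asp
print(parts.path.rsplit("/", 1)[-1])   # default.asp (the filename part)
```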
Common URL Schemes
The table below lists some common schemes:
Scheme | Short for | Used for |
---|---|---|
http | HyperText Transfer Protocol | Common web pages. Not encrypted |
https | HyperText Transfer Protocol Secure | Secure web pages. Encrypted |
ftp | File Transfer Protocol | Downloading or uploading files |
file | | A file on your computer |
URL Encoding
URLs can only be sent over the Internet using the ASCII character-set. If a URL contains characters outside the ASCII set, the URL has to be converted.
URL encoding converts non-ASCII characters into a format that can be transmitted over the Internet.
URL encoding replaces non-ASCII characters with a "%" followed by two hexadecimal digits for each byte of the character's encoding.
URLs cannot contain spaces. URL encoding replaces a space with either %20 or, in HTML form data, a plus (+) sign.
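Python's standard library exposes both conventions for spaces: `quote` produces `%20` (RFC 3986 percent-encoding), while `quote_plus` and `urlencode` follow the `application/x-www-form-urlencoded` form convention and produce `+`. A sketch with an illustrative string:

```python
from urllib.parse import quote, quote_plus, unquote_plus, urlencode

text = "new pricing"
print(quote(text))                  # new%20pricing
print(quote_plus(text))             # new+pricing
print(urlencode({"q": text}))       # q=new+pricing (form-style)
print(unquote_plus("new+pricing"))  # new pricing
```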
When an HTML form is submitted, the browser URL-encodes the input before sending it to the server.
ASCII Encoding Examples
Your browser encodes input according to the character set used in your page.
The default character-set in HTML5 is UTF-8.
Character | From Windows-1252 | From UTF-8 |
---|---|---|
€ | %80 | %E2%82%AC |
£ | %A3 | %C2%A3 |
© | %A9 | %C2%A9 |
® | %AE | %C2%AE |
À | %C0 | %C3%80 |
Á | %C1 | %C3%81 |
Â | %C2 | %C3%82 |
Ã | %C3 | %C3%83 |
Ä | %C4 | %C3%84 |
Å | %C5 | %C3%85 |
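The table can be reproduced with Python's `urllib.parse.urlencode`, which accepts an `encoding` argument. This only approximates what a browser does when it submits a form from a page in each character set; `cp1252` is Python's codec name for Windows-1252, and the parameter name `q` is illustrative:

```python
from urllib.parse import urlencode

for ch in ["€", "£", "Ã"]:
    cp1252 = urlencode({"q": ch}, encoding="cp1252")
    utf8 = urlencode({"q": ch})          # UTF-8 is the default
    print(ch, cp1252, utf8)
# € q=%80 q=%E2%82%AC
# £ q=%A3 q=%C2%A3
# Ã q=%C3 q=%C3%83
```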