Python join url params

Parsing URL in Python

Urllib module in Python is used to access, and interact with, websites using URL (Uniform Resource Locator). A URL (colloquially termed a web address) is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it.

Urilib has below modules for working with URL

  • urllib.request for opening and reading URLs
  • urllib.error containing the exceptions raised by urllib.request
  • urllib.parse for parsing URLs
  • urllib.robotparser for parsing robots.txt files

Parsing Urls

urllib.parse module defines interface to break URL strings up in components (addressing scheme, network location, path etc.), to combine the components back into a URL string.

urlparse() parse a URL into six components. urlsplit() is similar to urlparse(), but does not split the params from the URL. For example:

import urllib.parse sample_url = "http://example.com:8080/example.html?val1=1&val2=Hello" # Parse URL with urlparse() result = urllib.parse.urlparse(sample_url) print(result) # Output # ParseResult(scheme='http', netloc='example.com:8080', path='/example.html', params='', query='val1=1&val2=Hello', fragment='') print("Scheme : " + result.scheme) print("HostName : " + result.hostname) print("Path : " + result.path) # Output # Scheme : http # HostName : example.com # Path : /example.html print(result.geturl()) # Output # http://example.com:8080/example.html?val1=1&val2=Hello result = urllib.parse.urlsplit(sample_url) print(result) # Output # SplitResult(scheme='http', netloc='example.com:8080', path='/example.html', query='val1=1&val2=Hello', fragment='')

urljoin() construct a absolute URL by combining a base URL with another URL. It uses addressing scheme, the network location to provide missing components in the relative URL. For example:

import urllib.parse # Join URL with urljoin() print(urllib.parse.urljoin('http://example.com:8080/example.html', 'FAQ.html')) # Output # http://example.com:8080/FAQ.html

Quoting URL

URL quoting module provides functions to make program data safe for use as URL components by quoting special characters and appropriately encoding non-ASCII text. It also support reversing these operations to recreate the original data from the contents of a URL component.

Читайте также:  Set for dictionary python

quote() function replace special characters in string using the %xx escape. It is intended for quoting the path section of URL. quote_plus() is similar to quote(), but it replace spaces by plus signs. Plus signs in the original string are escaped unless they are included in safe. To get back the URL from the quoted url, use unquote(). It %xx escapes by their single-character equivalent. unquote_plus() is similar to unquote(), but also replace plus signs by spaces, as required for unquoting HTML form values. Syntax of above functions

quote(string, safe='/', encoding=None, errors=None) quote_plus(string, safe='', encoding=None, errors=None) unquote(string, encoding='utf-8', errors='replace') unquote_plus(string, encoding='utf-8', errors='replace')

Following example demonstrate quoting of program data, so that they can be used as component of URL.

import urllib.parse sample_string = "Hello El Niño" # Replaces special characters for use in URLs quoteStr = urllib.parse.quote(sample_string) print(quoteStr) # Output # Hello%20El%20Ni%C3%B1o # Replaces special characters for use in URLs quotePlusStr = urllib.parse.quote_plus(sample_string) print(quotePlusStr) # Output # Hello+El+Ni%C3%B1o # Get back actual string from quoted string print(urllib.parse.unquote(quoteStr)) # Output # Hello El Niño print(urllib.parse.unquote_plus(quotePlusStr)) # Output # Hello El Niño

Manipulating Query Parameter

urlencode() convert a mapping object or a sequence of two-element tuples to a percent-encoded ASCII text string. It returns string containing series of key=value pairs separated by ‘&’ characters, where both key and value are quoted. The order of parameters in the encoded string will match the order of parameter tuples in the sequence. To reverse encoding process, use parse_qs() and parse_qsl() to parse query strings into Python data structures. Syntax of the function are

urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus) urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None) urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)

parse_qs() parse a query string given as a string argument and returns a dictionary. Dictionary keys are the unique query variable names and the values are lists of values for each name. parse_qsl() parse a query string given as a string argument and returs as a list of name, value pairs.

import urllib.parse # Use urlencode() to convert maps to parameter strings query_data = < 'name': "Mango", "type": "Fruit", "price": 37 >result = urllib.parse.urlencode(query_data) print(result) # Output # name=Mango&type=Fruit&price=37 print(urllib.parse.parse_qs(result)) # Output # print(urllib.parse.parse_qsl(result)) # Output # [('name', 'Mango'), ('type', 'Fruit'), ('price', '37')]

Источник

Python join url params

Last updated: Feb 18, 2023
Reading time · 2 min

banner

# Join a base URL with another URL in Python

Use the urljoin method from the urllib.parse module to join a base URL with another URL.

The urljoin method constructs a full (absolute) URL by combining a base URL with another URL.

Copied!
from urllib.parse import urljoin base_url = 'https://bobbyhadz.com' path = 'images/static/cat.jpg' # ✅ Join a base URL with another URL result = urljoin(base_url, path) # 👇️ https://bobbyhadz.com/images/static/cat.jpg print(result) # --------------------------------------- # ✅ Join URL path components when constructing a URL # 👇️ /global/images/static/dog.png print(urljoin('/global/images/', 'static/dog.png'))

join base url with another url

If you have multiple URL components, use the posixpath module to join them before passing them to the urljoin() method.

Copied!
import posixpath from urllib.parse import urljoin base_url = 'https://bobbyhadz.com' path_1 = 'images' path_2 = 'static' path_3 = 'cat.jpg' path = posixpath.join(path_1, path_2, path_3) print(path) # 👉️ 'images/static/cat.jpg' result = urljoin(base_url, path) # 👇️ https://bobbyhadz.com/images/static/cat.jpg print(result)

using posixpath to join path components

The urllib.parse.urljoin method takes a base URL and another URL as parameters and constructs a full (absolute) URL by combining them.

# Joining URL path components when constructing a URL

You can also use the urljoin method to join URL path components when constructing a URL.

Copied!
from urllib.parse import urljoin # ✅ Join URL path components # 👇️ /global/images/static/dog.png print(urljoin('/global/images/', 'static/dog.png'))

joining url path components when constructing url

Make sure the output you get is what you expect because the urljoin() method can be a bit confusing when working with URL components that don’t end in a forward slash / .

Copied!
from urllib.parse import urljoin # 👇️ /global/static/dog.png print(urljoin('/global/images', 'static/dog.png'))

Notice that the method stripped images from the first component before joining the second component.

The method behaves as expected when the first component ends with a forward slash.

Copied!
from urllib.parse import urljoin # 👇️ /global/images/static/dog.png print(urljoin('/global/images/', 'static/dog.png'))

You might also notice confusing behavior if the second component starts with a forward slash.

Copied!
from urllib.parse import urljoin # 👇️ /static/dog.png print(urljoin('/global/images', '/static/dog.png'))

When the second component starts with a forward slash, it is assumed to start at the root.

# Joining URL path components with posixpath.join()

The posixpath.join() method is a bit more predictable and could also be used to join URL path components.

Copied!
import posixpath # 👇️ /global/images/static/dog.png print(posixpath.join('/global/images', 'static/dog.png')) # 👇️ /global/images/static/dog.png print(posixpath.join('/global/images/', 'static/dog.png')) # 👇️ /static/dog.png print(posixpath.join('/global/images', '/static/dog.png'))

joining url path components with posixpath join

The posixpath.join method can also be passed more than 2 paths.

Copied!
import posixpath # 👇️ /global/images/static/dog.png print(posixpath.join('/global', 'images', 'static', 'dog.png'))

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

I wrote a book in which I share everything I know about how to become a better, more efficient programmer.

Источник

Оцените статью