Create URL in Python

Building URLs in Python

Building URLs is a common task in applications and APIs because most applications are interconnected. But how should we do it in Python? Here’s my take on the subject.

Let’s see how the different options compare.

The standard way

Python has a built-in module made specifically for parsing URLs, called urllib.parse.

You can use the urllib.parse.urlsplit function to break a URL string into a five-item named tuple. The items correspond to the general URL structure:

scheme://netloc/path?query#fragment

The opposite of breaking a URL into parts is building one with the urllib.parse.urlunsplit function.
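As a minimal sketch of the round trip, the snippet below splits a URL and puts it back together. The example URL is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL, chosen only to populate every field except the port.
parts = urlsplit("https://example.com/api/v1/book/12?format=mp3#details")

# The result is a named tuple, so fields are available by name or by index.
print(parts.scheme)    # https
print(parts.netloc)    # example.com
print(parts.path)      # /api/v1/book/12
print(parts.query)     # format=mp3
print(parts.fragment)  # details

# urlunsplit accepts any five-item iterable and reverses the split.
rebuilt = urlunsplit(parts)
print(rebuilt)  # https://example.com/api/v1/book/12?format=mp3#details
```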

If you check the library documentation you’ll notice that there is also a urlparse function. The difference between it and the urlsplit function is an additional item in the parse result for path parameters.

https://www.example.com/some/path;parameter=12?q=query

Path parameters are separated with a semicolon from the path and located before the query arguments that start with a question mark. Most of the time you don’t need them but it is good to know that they exist.
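To make the difference concrete, here is a small sketch that runs both functions on the URL above and compares where the path parameter ends up.

```python
from urllib.parse import urlparse, urlsplit

url = "https://www.example.com/some/path;parameter=12?q=query"

parsed = urlparse(url)  # six items, including params
split = urlsplit(url)   # five items, params stay inside the path

print(parsed.path)    # /some/path
print(parsed.params)  # parameter=12
print(split.path)     # /some/path;parameter=12
print(len(parsed), len(split))  # 6 5
```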

So how would you build a URL with urllib.parse?

Let’s assume that you want to call some API and need a function for building the API URL. The required URL could be for example:

https://example.com/api/v1/book/12?format=mp3&token=abbadabba

Here is how we could build the URL:

import os
from urllib.parse import urlunsplit, urlencode

# Scheme and host are configurable through the environment.
SCHEME = os.environ.get("API_SCHEME", "https")
NETLOC = os.environ.get("API_NETLOC", "example.com")


def build_api_url(book_id, format, token):
    path = f"/api/v1/book/{book_id}"
    query = urlencode(dict(format=format, token=token))
    # urlunsplit takes (scheme, netloc, path, query, fragment).
    return urlunsplit((SCHEME, NETLOC, path, query, ""))

Calling the function works as expected:

>>> build_api_url(12, "mp3", "abbadabba")
'https://example.com/api/v1/book/12?format=mp3&token=abbadabba'

I used environment variables for the scheme and netloc because typically your program is calling a specific API endpoint that you might want to configure via the environment.

I also introduced the urlencode function which transforms a dictionary to a series of key=value pairs separated with & characters. This can be handy if you have lots of query arguments as a dictionary of values can be easier to manipulate.
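Besides joining the pairs, urlencode also escapes characters that would otherwise break the query string. A short sketch with made-up parameter names and values:

```python
from urllib.parse import urlencode

# Hypothetical query arguments; the "&" and spaces must be escaped.
params = {"q": "war & peace", "lang": "en"}
query = urlencode(params)
print(query)  # q=war+%26+peace&lang=en

# With doseq=True a list value is expanded into repeated keys.
multi = urlencode({"tag": ["python", "urls"]}, doseq=True)
print(multi)  # tag=python&tag=urls
```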

The urllib.parse library also contains urljoin, which is similar to os.path.join. It can be used to build URLs by combining a base URL with a path. Let’s modify the example code a bit.

import os
from urllib.parse import urljoin, urlencode

BASE_URL = os.environ.get("BASE_URL", "https://example.com/")


def build_api_url(book_id, format, token):
    path = f"/api/v1/book/{book_id}"
    # urljoin does not add the "?" separator, so prepend it here.
    query = "?" + urlencode(dict(format=format, token=token))
    return urljoin(BASE_URL, path + query)

This time the whole base URL comes from the environment. The path and query are combined with the base URL using the urljoin function. Notice that this time the question mark at the beginning of the query needs to be set manually.
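One thing to watch for is that urljoin resolves its second argument the way a browser resolves a relative link, so slashes matter. The sketch below uses a hypothetical base URL with a path prefix to show the three common cases.

```python
from urllib.parse import urljoin

# Base without a trailing slash: the last segment ("v1") is replaced.
no_slash = urljoin("https://example.com/api/v1", "book/12")
print(no_slash)  # https://example.com/api/book/12

# Base with a trailing slash: the relative path is appended.
with_slash = urljoin("https://example.com/api/v1/", "book/12")
print(with_slash)  # https://example.com/api/v1/book/12

# A path starting with "/" discards the base path entirely.
absolute = urljoin("https://example.com/api/v1/", "/book/12")
print(absolute)  # https://example.com/book/12
```

If BASE_URL may carry a path prefix, a leading slash in the joined path will silently drop that prefix.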

The manual way

Libraries can be nice but sometimes you just want to get things done without thinking that much. Here’s a straightforward way to build a URL manually.

import os

# Strip the trailing slash so it is never doubled in the result.
BASE_URL = os.environ.get("BASE_URL", "https://example.com").rstrip("/")


def build_api_url(book_id, format, token):
    return f"{BASE_URL}/api/v1/book/{book_id}?format={format}&token={token}"

The f-strings in Python make this quite clean, especially with URLs that always have the same structure and not that many parameters. The BASE_URL initialization strips the trailing forward slash from the environment variable. This way the user doesn’t have to remember if it should be included or not.

Note that I haven’t added any validation for the input parameters in these examples, so you may need to take that into consideration.
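As one possible way to add that validation, the sketch below rejects unexpected book IDs and lets urlencode escape the query values. The rule that a book ID must be a positive integer is an assumption made for illustration, and BASE_URL is hard-coded here to keep the snippet self-contained.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.com"  # assumed value; read it from the environment in practice


def build_api_url(book_id, format, token):
    # Assumed rule: IDs are positive integers (bool is excluded explicitly
    # because it is a subclass of int).
    if isinstance(book_id, bool) or not isinstance(book_id, int) or book_id <= 0:
        raise ValueError(f"book_id must be a positive integer, got {book_id!r}")
    # urlencode escapes characters such as "&", "=" and spaces in the values.
    query = urlencode({"format": format, "token": token})
    return f"{BASE_URL}/api/v1/book/{book_id}?{query}"


print(build_api_url(12, "mp3", "abba dabba&x=1"))
# https://example.com/api/v1/book/12?format=mp3&token=abba+dabba%26x%3D1
```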

The Furl way

Then there is a library called furl which aims to make URL parsing and manipulation easy. It can be installed with pip:

$ python3 -m pip install furl

Here is the same function written with furl:

import os
from furl import furl

BASE_URL = os.environ.get("BASE_URL", "https://example.com")


def build_api_url(book_id, format, token):
    f = furl(BASE_URL)
    f /= f"/api/v1/book/{book_id}"
    f.args["format"] = format
    f.args["token"] = token
    return f.url

There are a few more lines here compared to the previous example. First we need to initialize a furl object from the base URL. The path can be appended using the /= operator, which is custom defined by the library.

The query arguments can be set with the args property dictionary. Finally, the URL is built by accessing the url property.

Here’s an alternative implementation using the set() method to change the path and query arguments of an existing URL.

def build_api_url(book_id, format, token):
    return (
        furl(BASE_URL)
        .set(
            path=f"/api/v1/book/{book_id}",
            args={"format": format, "token": token},
        )
        .url
    )

In addition to building URLs, furl lets you modify existing URLs and parse parts of them. You can find many more examples in the API documentation.

Conclusion

These are just some examples of how to create URLs. Which one do you prefer?


Python — Building URLs

The requests module re-exports the standard library URL helpers through requests.compat, so we can build URLs and manipulate their values dynamically. Any sub-directory of the URL can be extracted programmatically, and parts of it can be substituted with new values to build new URLs.

Build_URL

The example below uses urljoin to fetch the different subfolders in the URL path and to add new values to the base URL.

from requests.compat import urljoin

base = 'https://stackoverflow.com/questions/3764291'

# Relative references are resolved against the base URL.
print(urljoin(base, '.'))
print(urljoin(base, '..'))
print(urljoin(base, '. '))
print(urljoin(base, '/3764299/'))

# A query string or fragment is appended to the existing URL.
url_query = urljoin(base, '?vers=1.0')
print(url_query)
url_sec = urljoin(url_query, '#section-5.4')
print(url_sec)

When we run the above program, we get the following output:

https://stackoverflow.com/questions/
https://stackoverflow.com/
https://stackoverflow.com/questions/.
https://stackoverflow.com/3764299/
https://stackoverflow.com/questions/3764291?vers=1.0
https://stackoverflow.com/questions/3764291?vers=1.0#section-5.4

Split the URLs

URLs can also be split into their component parts beyond the main address. The urlparse method separates out the query parameters and the fragment attached to the URL, as shown below.

from requests.compat import urlparse

url1 = 'https://docs.python.org/2/py-modindex.html#cap-f'
url2 = 'https://docs.python.org/2/search.html?q=urlparse'

print(urlparse(url1))
print(urlparse(url2))

When we run the above program, we get the following output:

ParseResult(scheme='https', netloc='docs.python.org', path='/2/py-modindex.html', params='', query='', fragment='cap-f')
ParseResult(scheme='https', netloc='docs.python.org', path='/2/search.html', params='', query='q=urlparse', fragment='')
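Since the parse result is a named tuple, one way to substitute parts of a URL is its _replace method followed by geturl(). The sketch below imports from urllib.parse directly, which provides the same urlparse function, and the replacement path and query are made up for illustration.

```python
from urllib.parse import urlencode, urlparse

parsed = urlparse("https://docs.python.org/2/search.html?q=urlparse")

# _replace returns a new result with the named fields swapped out;
# the original tuple is left unchanged.
updated = parsed._replace(
    path="/3/search.html",
    query=urlencode({"q": "urlsplit"}),
)

print(updated.geturl())  # https://docs.python.org/3/search.html?q=urlsplit
```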


Generate a URL dynamically in Python

You should dig into Python string formatting. It is one of the basic concepts of the language and you’ll benefit a bunch.


A URL is really just a string, so any of the usual string manipulation techniques will do here. Your component parts don’t contain any characters that would require URL-encoding either, which makes the whole process simpler.

If you do have component parts that use characters that are not in the list of unreserved characters, then use the urllib.parse.quote() function to convert those characters to URL-safe components.
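A brief sketch of how quote() behaves, using made-up component values. Note that "/" is left alone by default, so pass safe="" when a single path segment may itself contain a slash.

```python
from urllib.parse import quote

# Hypothetical component values containing reserved or non-ASCII characters.
default_safe = quote("rates/2019 02")
print(default_safe)  # rates/2019%2002

strict = quote("rates/2019 02", safe="")
print(strict)  # rates%2F2019%2002

non_ascii = quote("café")
print(non_ascii)  # caf%C3%A9
```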

You could use str.join() with / to join string parts:

outbound_route = '1040'
outbound_date = '19-02-2019'
inbound_route = '1042'
inbound_date = '20-02-2019'

url = "https://booking.snav.it/#/booking/rates"  # no trailing /
final_url = '/'.join([url, outbound_route, outbound_date, inbound_route, inbound_date])

Or you could use a formatted string literal:

url = "https://booking.snav.it/#/booking/rates/"
final_url = f'{url}{outbound_route}/{outbound_date}/{inbound_route}/{inbound_date}'

This approach has the advantage that the components don’t have to be strings; if outbound_route and inbound_route are integers, you don’t have to explicitly convert them to strings first.
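To illustrate the difference, here is a small sketch with the routes given as integers. str.join() refuses non-string items, while an f-string converts them implicitly.

```python
base = "https://booking.snav.it/#/booking/rates"
outbound_route, outbound_date = 1040, "19-02-2019"
inbound_route, inbound_date = 1042, "20-02-2019"
parts = [outbound_route, outbound_date, inbound_route, inbound_date]

# str.join() raises TypeError when an item is not a string.
try:
    "/".join([base, *parts])
except TypeError as exc:
    print(f"join failed: {exc}")

# Converting explicitly makes join work again.
joined = "/".join([base, *map(str, parts)])

# The f-string formats the integers without any extra step.
formatted = f"{base}/{outbound_route}/{outbound_date}/{inbound_route}/{inbound_date}"

print(joined == formatted)  # True
```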

Or, since URL paths work a lot like POSIX filesystem paths, you could use the pathlib.PurePosixPath() class to construct the path:

from pathlib import PurePosixPath

path = PurePosixPath('/booking/rates') / outbound_route / outbound_date / inbound_route / inbound_date
final_url = f"https://booking.snav.it/#{path}"

In all cases, you end up with a final URL to use in requests:

res = requests.get(final_url, params=params) 
