Parse http header python

Contents

Parse an HTTP request Authorization header with Python
dmeranda/httpheader

Parse an HTTP request Authorization header with Python

Is there a library to do this, or something I could look at for inspiration? I’m doing this on Google App Engine, and I’m not sure if the Pyparsing library is available, but maybe I could include it with my app if it is the best solution. Currently I’m creating my own MyHeaderParser object and using it with reduce() on the header string. It’s working, but very fragile.

Brilliant solution by nadia below:

import re

reg = re.compile(r'(\w+)[=] ?"?(\w+)"?')
s = """Digest realm="stackoverflow.com", username="kixx" """
print(str(dict(reg.findall(s))))

So far this solution has proven not only to be super clean, but also very robust. While it is not the most "by the book" implementation of the RFC, I’ve yet to build a test case that returns invalid values. However, I am only using this to parse the Authorization header; none of the other headers I’m interested in need parsing, so this may not be a good solution as a general HTTP header parser.

I came here looking for a full-fledged RFC-ified parser. Your question and the answer by @PaulMcG got me on the right path (see my answer below). Thank you both!

10 Answers

import re

reg = re.compile(r'(\w+)[:=] ?"?(\w+)"?')
# headers is the raw header string
dict(reg.findall(headers))

Wow, I love Python. "Authorization:" is not actually part of the header string, so I did this instead:

#!/usr/bin/env python
import re

def mymain():
    reg = re.compile(r'(\w+)[=] ?"?(\w+)"?')
    s = """Digest realm="fireworksproject.com", username="kristoffer" """
    print(str(dict(reg.findall(s))))

if __name__ == '__main__':
    mymain()

I’m not getting the "Digest" protocol declaration, but I don’t need it anyway. Essentially 3 lines of code. Brilliant.


If you find this and use it, make sure to add another question mark inside "?(\w+)" so that it becomes "?(\w+)?". This way, if you pass along an empty value such as "", it still returns the parameter, with an empty value. And if you really want Digest as well, use /(\w+)(?:([:=]) ?"?(\w+)?"?)?/ and check whether = exists in the match: if so, it’s a key:value pair; otherwise it’s something else.

Actually the " aren’t mandatory (algorithm, for example, usually doesn’t delimit its value with ") and a value itself can contain an escaped ". The "? in the pattern is a bit risky =) (I asked the same question for PHP.)
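The points raised in these comments (optional quotes, escaped quotes inside values, empty values) can be covered by a somewhat longer pattern. What follows is only a minimal sketch using the standard re module; the names TOKEN, QUOTED, PARAM_RE and parse_auth_params are made up for this illustration and do not come from any of the answers. It assumes the input is the header value, and it simply skips over a leading scheme name such as Digest.

```python
import re

# Characters allowed in an RFC "token" (hypothetical helper constant).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# A quoted-string whose content may contain backslash escapes such as \".
QUOTED = r'"((?:[^"\\]|\\.)*)"'
# name = quoted-string | token; group 1 is the name,
# group 2 the quoted content, group 3 the bare token value.
PARAM_RE = re.compile(r"(%s)\s*=\s*(?:%s|(%s))" % (TOKEN, QUOTED, TOKEN))


def parse_auth_params(text):
    """Collect name/value pairs; quoted values are unescaped."""
    params = {}
    for name, quoted, bare in PARAM_RE.findall(text):
        if bare:
            params[name] = bare
        else:
            # Turn \" into " and \\ into \ (an empty "" stays empty).
            params[name] = re.sub(r"\\(.)", r"\1", quoted)
    return params


if __name__ == "__main__":
    header = 'Digest realm="stackoverflow.com", algorithm=MD5, nonce=""'
    print(parse_auth_params(header))
```

Unlike the \w+ pattern, this keeps the full realm (including the dot) and accepts the unquoted algorithm value. It is still not a complete RFC parser: malformed input is silently skipped rather than reported.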

You can also use urllib2, as CherryPy does.

# Python 2: parse_http_list and parse_keqv_list live in urllib2
import urllib2

# renamed from "input" to avoid shadowing the built-in
text = """
    Authorization: Digest qop="chap",
        realm="testrealm@host.com",
        username="Foobear",
        response="6629fae49393a05397450978507c4ef1",
        cnonce="5ccc069c403ebaf9f0171e9517f40e41"
    """

field, sep, value = text.partition("Authorization: Digest ")
if value:
    # split on commas outside quotes, then split each item on "="
    items = urllib2.parse_http_list(value)
    opts = urllib2.parse_keqv_list(items)
    opts['protocol'] = 'Digest'
    print(opts)

In Python 3, these functions still exist (though they aren’t documented), but they’re in urllib.request instead of urllib2.

Warning: urllib.request is one of the heaviest imports in the Python standard library. If you’re just using these two functions it might not be worth it.
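For Python 3, the same approach could look roughly like the sketch below. The function name parse_digest_header is invented for this example; parse_http_list and parse_keqv_list are the two undocumented helpers mentioned above, so their behaviour is not guaranteed to stay stable between Python versions. The import cost mentioned in the warning applies here as well.

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_digest_header(header_line):
    """Return the Digest parameters as a dict, or None if the line
    is not an 'Authorization: Digest ...' header."""
    _, sep, value = header_line.strip().partition("Authorization: Digest ")
    if not sep:
        return None
    # parse_http_list splits on commas outside quoted strings;
    # parse_keqv_list splits each item on '=' and strips the quotes.
    opts = parse_keqv_list(parse_http_list(value))
    opts["protocol"] = "Digest"
    return opts


if __name__ == "__main__":
    line = ('Authorization: Digest qop="chap", realm="testrealm@host.com", '
            'username="Foobear"')
    print(parse_digest_header(line))
```

This only suits schemes whose credentials are a list of key=value pairs. A Basic or Bearer token would be split on its first = sign, which is not what you want.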

Here’s my pyparsing attempt:

text = """Authorization: Digest qop="chap",
    realm="testrealm@host.com",
    username="Foobear",
    response="6629fae49393a05397450978507c4ef1",
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" """

from pyparsing import *

AUTH = Keyword("Authorization")
ident = Word(alphas, alphanums)
EQ = Suppress("=")
# strip the surrounding quotes from every quoted value
quotedString.setParseAction(removeQuotes)

# comma-separated name="value" pairs, accessible by name
valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict
print(authentry.parseString(text).dump())

['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', 'testrealm@host.com'], ['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'], ['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: testrealm@host.com
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear

I’m not familiar with the RFC, but I hope this gets you rolling.

This solution is the use of pyparsing that I was originally thinking of, and, as far as I can tell, it produces nice results.

An older question but one I found very helpful.

(edit after recent upvote) I’ve created a package that builds on this answer (link to tests to see how to use the class in the package).

I needed a parser to handle any properly formed Authorization header, as defined by RFC7235 (raise your hand if you enjoy reading ABNF).

Authorization = credentials

BWS = <BWS, see [RFC7230], Section 3.2.3>

OWS = <OWS, see [RFC7230], Section 3.2.3>

Proxy-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge ] )
Proxy-Authorization = credentials

WWW-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge ] )

auth-param = token BWS "=" BWS ( token / quoted-string )
auth-scheme = token

challenge = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *( OWS "," [ OWS auth-param ] ) ] ) ]
credentials = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *( OWS "," [ OWS auth-param ] ) ] ) ]

quoted-string = <quoted-string, see [RFC7230], Section 3.2.6>

token = <token, see [RFC7230], Section 3.2.6>
token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" ) *"="
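To make the credentials rule a little more concrete, here is a rough standard-library sketch of its top-level shape: an auth-scheme, optionally followed by either a single token68 or a list of auth-params. The names TOKEN_RE, TOKEN68_RE and split_credentials are hypothetical and exist only for this illustration. The sketch does not validate the auth-params themselves, and it treats a lone word after the scheme as a token68, which the grammar also allows.

```python
import re

# token: one or more "tchar" characters
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
# token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" ) *"="
TOKEN68_RE = re.compile(r"[A-Za-z0-9\-._~+/]+=*")


def split_credentials(value):
    """Split a header value into (scheme, token68, params_text).

    Exactly one of token68 / params_text is set when something
    follows the scheme; both are None for a bare scheme.
    """
    scheme, _, rest = value.strip().partition(" ")
    rest = rest.strip()
    if not TOKEN_RE.fullmatch(scheme):
        raise ValueError("invalid auth-scheme: %r" % scheme)
    if not rest:
        return scheme, None, None
    if TOKEN68_RE.fullmatch(rest):
        return scheme, rest, None
    # Anything else is assumed to be a comma-separated auth-param list.
    return scheme, None, rest


if __name__ == "__main__":
    print(split_credentials("Basic Zm9vOmJhcg=="))
    print(split_credentials('Digest realm="x", qop=auth'))
```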

Starting with PaulMcG’s answer, I came up with this:

import pyparsing as pp

# characters allowed in a token and in a token68
tchar = '!#$%&\'*+-.^_`|~' + pp.nums + pp.alphas
t68char = '-._~+/' + pp.nums + pp.alphas

token = pp.Word(tchar)
token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))

scheme = token('scheme')
header = pp.Keyword('Authorization')
name = pp.Word(pp.alphas, pp.alphanums)
value = pp.quotedString.setParseAction(pp.removeQuotes)
name_value_pair = name + pp.Suppress('=') + value
params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))

# credentials are either a single token68 or a list of name="value" pairs
credentials = scheme + (token68('token') ^ params('params'))

auth_parser = header + pp.Suppress(':') + credentials

This allows for parsing any Authorization header:

parsed = auth_parser.parseString('Authorization: Basic Zm9vOmJhcg==')
print('Authenticating with {0} scheme, token: {1}'.format(parsed['scheme'], parsed['token']))

Authenticating with Basic scheme, token: Zm9vOmJhcg==

Bringing it all together into an Authenticator class:

import pyparsing as pp
from base64 import b64decode
import re


class DecodeError(Exception):
    """Raised when a Basic token cannot be decoded (missing from the original snippet)."""


class Authenticator:
    def __init__(self):
        """
        Use pyparsing to create a parser for Authentication headers
        """
        tchar = "!#$%&'*+-.^_`|~" + pp.nums + pp.alphas
        t68char = '-._~+/' + pp.nums + pp.alphas

        token = pp.Word(tchar)
        token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))

        scheme = token('scheme')
        auth_header = pp.Keyword('Authorization')
        name = pp.Word(pp.alphas, pp.alphanums)
        value = pp.quotedString.setParseAction(pp.removeQuotes)
        name_value_pair = name + pp.Suppress('=') + value
        params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))

        credentials = scheme + (token68('token') ^ params('params'))

        # the moment of truth.
        self.auth_parser = auth_header + pp.Suppress(':') + credentials

    def authenticate(self, auth_header):
        """
        Parse auth_header and call the correct authentication handler
        """
        authenticated = False
        try:
            parsed = self.auth_parser.parseString(auth_header)
            scheme = parsed['scheme']
            details = parsed['token'] if 'token' in parsed.keys() else parsed['params']

            print('Authenticating using {0} scheme'.format(scheme))
            try:
                # map the scheme to a valid method name, e.g. AWS4-HMAC-SHA256 -> aws4_hmac_sha256
                safe_scheme = re.sub(r"[!#$%&'*+\-.^_`|~]", '_', scheme.lower())
                handler = getattr(self, 'auth_handle_' + safe_scheme)
                authenticated = handler(details)
            except AttributeError:
                print('This is a valid Authorization header, but we do not handle this scheme yet.')

        except pp.ParseException as ex:
            print('Not a valid Authorization header')
            print(ex)

        return authenticated

    # The following methods are fake, of course. They should use what's passed
    # to them to actually authenticate, and return True/False if successful.
    # For this demo I'll just print some of the values used to authenticate.
    @staticmethod
    def auth_handle_basic(token):
        print('- token is {0}'.format(token))
        try:
            username, password = b64decode(token).decode().split(':', 1)
        except Exception:
            raise DecodeError
        print('- username is {0}'.format(username))
        print('- password is {0}'.format(password))
        return True

    @staticmethod
    def auth_handle_bearer(token):
        print('- token is {0}'.format(token))
        return True

    @staticmethod
    def auth_handle_digest(params):
        print('- username is {0}'.format(params['username']))
        print('- cnonce is {0}'.format(params['cnonce']))
        return True

    @staticmethod
    def auth_handle_aws4_hmac_sha256(params):
        print('- Signature is {0}'.format(params['Signature']))
        return True

tests = [
    'Authorization: Digest qop="chap", realm="testrealm@example.com", username="Foobar", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"',
    'Authorization: Bearer cn389ncoiwuencr',
    'Authorization: Basic Zm9vOmJhcg==',
    'Authorization: AWS4-HMAC-SHA256 Credential="AKIAIOSFODNN7EXAMPLE/20130524/us-east-1/s3/aws4_request", SignedHeaders="host;range;x-amz-date", Signature="fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024"',
    'Authorization: CrazyCustom foo="bar", fizz="buzz"',
]

authenticator = Authenticator()
for test in tests:
    authenticator.authenticate(test)
    print()

Authenticating using Digest scheme
- username is Foobar
- cnonce is 5ccc069c403ebaf9f0171e9517f40e41

Authenticating using Bearer scheme
- token is cn389ncoiwuencr

Authenticating using Basic scheme
- token is Zm9vOmJhcg==
- username is foo
- password is bar

Authenticating using AWS4-HMAC-SHA256 scheme
- Signature is fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024

Authenticating using CrazyCustom scheme
This is a valid Authorization header, but we do not handle this scheme yet.

In the future, if we wish to handle CrazyCustom, we’ll just add:

def auth_handle_crazycustom(params):
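The reason a new method is all that is needed is the getattr lookup on the scheme name. The sketch below isolates that dispatch idea without pyparsing. The classes SchemeDispatcher and MyDispatcher and the rule inside the handler are invented for illustration and are not part of the answer's package.

```python
import re


class SchemeDispatcher:
    """Find an auth_handle_<scheme> method by name and call it."""

    def dispatch(self, scheme, details):
        # Map the scheme to a valid method-name suffix,
        # e.g. "AWS4-HMAC-SHA256" becomes "aws4_hmac_sha256".
        safe_scheme = re.sub(r"[^0-9a-z]", "_", scheme.lower())
        handler = getattr(self, "auth_handle_" + safe_scheme, None)
        if handler is None:
            return False  # no handler defined for this scheme
        return handler(details)


class MyDispatcher(SchemeDispatcher):
    @staticmethod
    def auth_handle_crazycustom(params):
        # Hypothetical check, purely for demonstration.
        return params.get("foo") == "bar"


if __name__ == "__main__":
    dispatcher = MyDispatcher()
    print(dispatcher.dispatch("CrazyCustom", {"foo": "bar", "fizz": "buzz"}))
    print(dispatcher.dispatch("Unknown", {}))
```

Passing a default of None to getattr avoids relying on AttributeError, which could otherwise also hide an AttributeError raised inside a handler.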


Python module for parsing HTTP headers: Accept with qvalues, byte ranges, etc.

dmeranda/httpheader

Httpheader is a Python module for dealing with HTTP headers and content negotiation. It provides a set of utility functions and classes which properly implement all the details and edge cases of the HTTP 1.1 protocol headers. Httpheader is intended to be used as part of a larger web framework or any application that must deal with HTTP.

In particular, httpheader can handle:

Byte range requests (multipart/byteranges)
Content negotiation (content type, language, all the Accept-* style headers, including full support for priority/qvalue handling)
Content/media type parameters
Conversion to and from HTTP date and time formats
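As a rough illustration of what qvalue handling involves, the sketch below parses an Accept header with the standard library only. It deliberately does not use httpheader's own API: the function parse_accept is a made-up name, and the sketch ignores details that a full implementation has to deal with, such as quoted parameter values containing commas or the rule that more specific media ranges take precedence.

```python
def parse_accept(header_value):
    """Return [(media_range, q)] sorted by descending q.

    Entries with equal q keep their order from the header.
    """
    entries = []
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        media_range, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable qvalue as "not acceptable"
        entries.append((media_range, q))
    # sorted() is stable, so ties preserve header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


if __name__ == "__main__":
    print(parse_accept("text/html, application/json;q=0.8, */*;q=0.1"))
```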

There are a few classes defined by this module:

class content_type — media types such as ‘text/plain’
class language_tag — language tags such as ‘en-US’
class range_set — a collection of (byte) range specifiers
class range_spec — a single (byte) range specifier

The primary functions in this module may be categorized as follows:

Content negotiation functions.
- acceptable_content_type()
- acceptable_language()
- acceptable_charset()
- acceptable_encoding()
- parse_accept_header()
- parse_accept_language_header()
- parse_range_header()
- http_datetime()
- parse_http_datetime()
- quote_string()
- remove_comments()
- canonical_charset()
- parse_comma_list()
- parse_comment()
- parse_qvalue_accept_list()
- parse_media_type()
- parse_number()
- parse_parameter_list()
- parse_quoted_string()
- parse_range_set()
- parse_range_spec()
- parse_token()
- parse_token_or_quoted_string()
And there are some specialized exception classes:

References:
- RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", June 1999. http://www.ietf.org/rfc/rfc2616.txt Errata at http://purl.org/NET/http-errata
- RFC 2046, "(MIME) Part Two: Media Types", November 1996. http://www.ietf.org/rfc/rfc2046.txt
- RFC 3066, "Tags for the Identification of Languages", January 2001. http://www.ietf.org/rfc/rfc3066.txt

Complete documentation and additional information is available on the httpheader project homepage.

This module is also registered on the Python Package Index (PyPI) as package "httpheader". This should make it easy to install into most Python environments.
