Parsing HTTP headers in Python

headerparser 0.4.0

headerparser parses key-value pairs in the style of RFC 822 (e-mail) headers and converts them into case-insensitive dictionaries with the trailing message body (if any) attached. Fields can be converted to other types, marked required, or given default values using an API based on the standard library’s argparse module. (Everyone loves argparse, right?) Low-level functions that only scan header fields (breaking them into sequences of key-value pairs without any further processing) are also included.

The Format

RFC 822-style headers are header fields that follow the general format of e-mail headers as specified by RFC 822 and friends: each field is a line of the form “Name: Value”, with long values continued onto multiple lines (“folded”) by indenting the extra lines. A blank line marks the end of the header section and the beginning of the message body.

This basic grammar has been used by numerous textual formats besides e-mail, including but not limited to:

  • HTTP request & response headers
  • Usenet messages
  • most Python packaging metadata files
  • Debian packaging control files
  • META-INF/MANIFEST.MF files in Java JARs
  • a subset of the YAML serialization format

— all of which this package can parse.
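The grammar described above is small enough to sketch by hand. The following is a minimal illustration using only the standard library, not headerparser's own implementation; the function name `scan_headers` and its return shape are assumptions made for this example. It splits each field at the first colon, joins folded lines onto the previous value, and treats everything after the first blank line as the body.

```python
def scan_headers(text):
    """Split RFC 822-style text into a list of (name, value) pairs and a body.

    Hypothetical helper for illustration; it is not part of headerparser.
    Returns (fields, body), where body is None if no blank line was found.
    """
    fields = []
    body = None
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        stripped = line.rstrip("\r\n")
        if stripped == "":
            # A blank line ends the header section; the rest is the body.
            body = "".join(lines[i + 1:])
            break
        if stripped[0] in " \t":
            # An indented line continues ("folds") the previous field's value.
            if not fields:
                raise ValueError("continuation line before any header field")
            name, value = fields[-1]
            fields[-1] = (name, value + "\n" + stripped.rstrip())
        else:
            name, sep, value = stripped.partition(":")
            if not sep:
                raise ValueError(f"not a header field: {stripped!r}")
            fields.append((name.strip(), value.strip()))
    return fields, body


fields, body = scan_headers("Name: Sample\nTag: a,\n  b\n\nBody text\n")
print(fields)
print(repr(body))
```

This sketch keeps field names exactly as written and does no type conversion or validation; those are the parts a real parser layers on top.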

Installation

Just use pip (you have pip, right?) to install headerparser and its dependencies:

pip install headerparser

Examples

Define a parser:

>>> import headerparser
>>> parser = headerparser.HeaderParser()
>>> parser.add_field('Name', required=True)
>>> parser.add_field('Type', choices=['example', 'demonstration', 'prototype'], default='example')
>>> parser.add_field('Public', type=headerparser.BOOL, default=False)
>>> parser.add_field('Tag', multiple=True)
>>> parser.add_field('Data')

Parse some headers and inspect the results:

>>> msg = parser.parse_string('''\
... Name: Sample Input
... Public: yes
... tag: doctest, examples,
...  whatever
... TAG: README
...
... Wait, why I am using a body instead of the "Data" field?
... ''')
>>> sorted(msg.keys())
['Name', 'Public', 'Tag', 'Type']
>>> msg['Name']
'Sample Input'
>>> msg['Public']
True
>>> msg['Tag']
['doctest, examples,\n whatever', 'README']
>>> msg['TYPE']
'example'
>>> msg['Data']
Traceback (most recent call last):
    ...
KeyError: 'data'
>>> msg.body
'Wait, why I am using a body instead of the "Data" field?\n'

Fail to parse headers that don’t meet your requirements:

>>> parser.parse_string('Type: demonstration')
Traceback (most recent call last):
    ...
headerparser.errors.MissingFieldError: Required header field 'Name' is not present
>>> parser.parse_string('Name: Bad type\nType: other')
Traceback (most recent call last):
    ...
headerparser.errors.InvalidChoiceError: 'other' is not a valid choice for 'Type'
>>> parser.parse_string('Name: unknown field\nField: Value')
Traceback (most recent call last):
    ...
headerparser.errors.UnknownFieldError: Unknown header field 'Field'

Allow fields you didn’t even think of:

>>> parser.add_additional()
>>> msg = parser.parse_string('Name: unknown field\nField: Value')
>>> msg['Field']
'Value'

Just split some headers into names & values and worry about validity later:

>>> for field in headerparser.scan_string('''\
... Name: Scanner Sample
... Unknown headers: no problem
... Unparsed-Boolean: yes
... CaSe-SeNsItIvE-rEsUlTs: true
... Whitespace around colons:optional
... Whitespace around colons : I already said it's optional.
...  That means you have the _option_ to use as much as you want!
...
... And there's a body, too, I guess.
... '''): print(field)
('Name', 'Scanner Sample')
('Unknown headers', 'no problem')
('Unparsed-Boolean', 'yes')
('CaSe-SeNsItIvE-rEsUlTs', 'true')
('Whitespace around colons', 'optional')
('Whitespace around colons', "I already said it's optional.\n That means you have the _option_ to use as much as you want!")
(None, "And there's a body, too, I guess.\n")


Python module for parsing HTTP headers: Accept with qvalues, byte ranges, etc.

dmeranda/httpheader

Httpheader is a Python module for dealing with HTTP headers and content negotiation. It provides a set of utility functions and classes that implement the details and edge cases of the HTTP/1.1 protocol headers. Httpheader is intended to be used as part of a larger web framework or any application that must deal with HTTP.

In particular, httpheader can handle:

  • Byte range requests (multipart/byteranges)
  • Content negotiation (content type, language, and all the Accept-* style headers, including full support for priority/qvalue handling)
  • Content/media type parameters
  • Conversion to and from HTTP date and time formats

There are a few classes defined by this module:

  • class content_type — media types such as ‘text/plain’
  • class language_tag — language tags such as ‘en-US’
  • class range_set — a collection of (byte) range specifiers
  • class range_spec — a single (byte) range specifier
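As a rough picture of what a byte range specifier holds, the sketch below parses a `Range` header value into plain tuples. It is an illustration written against the standard library only and does not use or reproduce the `range_set` and `range_spec` classes; the name `parse_byte_ranges` and the tuple layout are assumptions for this example.

```python
def parse_byte_ranges(header):
    """Parse a value such as 'bytes=0-499, -500' into (first, last) tuples.

    Hypothetical helper for illustration:
      (0, 499)     bytes 0 through 499 inclusive
      (9500, None) from byte 9500 to the end
      (None, 500)  the final 500 bytes (a suffix range)
    """
    unit, sep, spec = header.partition("=")
    if not sep or unit.strip().lower() != "bytes":
        raise ValueError(f"unsupported range unit in {header!r}")
    ranges = []
    for part in spec.split(","):
        first, dash, last = part.strip().partition("-")
        first, last = first.strip(), last.strip()
        if not dash or (first == "" and last == ""):
            raise ValueError(f"malformed range spec: {part!r}")
        if first == "":
            ranges.append((None, int(last)))  # suffix range
        else:
            start = int(first)
            end = int(last) if last else None
            if end is not None and end < start:
                raise ValueError(f"range end before start: {part!r}")
            ranges.append((start, end))
    return ranges


print(parse_byte_ranges("bytes=0-499, -500, 9500-"))
```

Checking the ranges against an actual file size, and merging overlapping ranges, are left out here.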

The primary functions in this module may be categorized as follows:

  • Content negotiation functions
    • acceptable_content_type()
    • acceptable_language()
    • acceptable_charset()
    • acceptable_encoding()
  • Mid-level header parsing functions
    • parse_accept_header()
    • parse_accept_language_header()
    • parse_range_header()
  • Date and time functions
    • http_datetime()
    • parse_http_datetime()
  • Utility functions
    • quote_string()
    • remove_comments()
    • canonical_charset()
  • Low-level string parsing functions
    • parse_comma_list()
    • parse_comment()
    • parse_qvalue_accept_list()
    • parse_media_type()
    • parse_number()
    • parse_parameter_list()
    • parse_quoted_string()
    • parse_range_set()
    • parse_range_spec()
    • parse_token()
    • parse_token_or_quoted_string()
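For comparison with the date and time functions listed above, the HTTP date format can also be produced and parsed with the standard library's `email.utils` module. This sketch does not call httpheader at all; it only shows what the wire format looks like and that it round-trips.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# usegmt=True requires a datetime whose tzinfo is timezone.utc.
moment = datetime(1994, 11, 6, 8, 49, 37, tzinfo=timezone.utc)

http_date = format_datetime(moment, usegmt=True)
print(http_date)  # Sun, 06 Nov 1994 08:49:37 GMT

# Parsing the string back yields an equal, timezone-aware datetime.
parsed = parsedate_to_datetime(http_date)
print(parsed == moment)  # True
```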

There are also some specialized exception classes.

See also:

  • RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", June 1999. http://www.ietf.org/rfc/rfc2616.txt Errata at http://purl.org/NET/http-errata
  • RFC 2046, "(MIME) Part Two: Media Types", November 1996. http://www.ietf.org/rfc/rfc2046.txt
  • RFC 3066, "Tags for the Identification of Languages", January 2001. http://www.ietf.org/rfc/rfc3066.txt

Complete documentation and additional information are available on the httpheader project homepage.

This module is also registered on the Python Package Index (PyPI) as package "httpheader". This should make it easy to install into most Python environments.
