How to parse raw HTTP request in Python 3?

Question

I am looking for a native way to parse an HTTP request in Python 3.

This question shows a way to do it in Python 2, but it relies on modules that are now deprecated, and I am looking for a way to do it in Python 3.

I would mainly like to figure out which resource is requested and parse the headers from a simple request, e.g.:

GET /index.html HTTP/1.1
Host: localhost
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

Can someone show me a basic way to parse this request?

Answer 1

You could use the email.message.Message class from the email module in the standard library.

By modifying the answer from the question you linked, below is a Python 3 example of parsing HTTP headers.

Suppose you wanted to create a dictionary containing all of your header fields:

import email
import pprint
from io import StringIO

request_string = 'GET / HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate, sdch\r\nAccept-Language: en-US,en;q=0.8'

# pop the first line so we only process headers
_, headers = request_string.split('\r\n', 1)

# construct a message from the request string
message = email.message_from_file(StringIO(headers))

# construct a dictionary containing the headers
headers = dict(message.items())

# pretty-print the dictionary of headers
pprint.pprint(headers, width=160)

If you ran this at a Python prompt, the result would look like:

{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
 'Accept-Encoding': 'gzip, deflate, sdch',
 'Accept-Language': 'en-US,en;q=0.8',
 'Cache-Control': 'max-age=0',
 'Connection': 'keep-alive',
 'Host': 'localhost',
 'Upgrade-Insecure-Requests': '1',
 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
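The example above discards the request line, which is where the requested resource appears. Below is a minimal sketch that recovers it as well, assuming a well-formed request with CRLF line endings. The helper name parse_request is hypothetical and not part of the standard library.

```python
from email.parser import Parser


def parse_request(raw):
    """Split a raw HTTP request string into its request line parts and headers.

    Hypothetical helper for illustration; assumes a well-formed request.
    """
    # Everything before the first blank line is the request line plus headers.
    head, _, _body = raw.partition('\r\n\r\n')
    request_line, _, header_block = head.partition('\r\n')

    # The request line has three space-separated parts: method, target, version.
    method, target, version = request_line.split(' ', 2)

    # Let the email parser handle the header block.
    headers = Parser().parsestr(header_block, headersonly=True)
    return method, target, version, headers


if __name__ == '__main__':
    raw = ('GET /index.html HTTP/1.1\r\n'
           'Host: localhost\r\n'
           'Connection: keep-alive\r\n'
           '\r\n')
    method, target, version, headers = parse_request(raw)
    print(method, target, version)
    print(dict(headers.items()))
```

The returned headers object is an email.message.Message, so lookups such as headers['host'] are case-insensitive.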

Answer 2

Each header field should be terminated by a carriage return followed by a newline, and within each field the name and value are separated by a colon. So assuming you already have the raw request as a string (called resp here), it should be as easy as:

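fields = resp.split("\r\n")
fields = fields[1:]  # ignore the request line, e.g. GET / HTTP/1.1
output = {}
for field in fields:
    # split on the first colon only, since a value may contain colons (e.g. a host with a port)
    key, value = field.split(':', 1)
    output[key] = value.strip()  # drop the space that follows the colon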

Update 4/13

Using the example HTTP request from the linked post:

resp = 'GET /search?sourceid=chrome&ie=UTF-8&q=ergterst HTTP/1.1\r\nHost: www.google.com\r\nConnection: keep-alive\r\nAccept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\nUser-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.45 Safari/534.13\r\nAccept-Encoding: gzip,deflate,sdch\r\nAvail-Dictionary: GeNLY2f-\r\nAccept-Language: en-US,en;q=0.8\r\n'


fields = resp.split("\r\n")
fields = fields[1:]  # ignore the request line (GET ... HTTP/1.1)
output = {}
for field in fields:
    if not field:
        continue  # skip the empty string left by the trailing \r\n
    # split on the first colon only and trim the surrounding whitespace
    key, value = field.split(':', 1)
    output[key.strip()] = value.strip()
print(output)

An additional check to make sure field is not empty is needed, because the trailing \r\n leaves an empty string at the end of the list. Output:

{'Host': 'www.google.com', 'Connection': 'keep-alive', 'Accept': 'application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5', 'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.45 Safari/534.13', 'Accept-Encoding': 'gzip,deflate,sdch', 'Avail-Dictionary': 'GeNLY2f-', 'Accept-Language': 'en-US,en;q=0.8'}
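Two details are easy to trip over when splitting header lines by hand: a value may itself contain a colon (a Host header with a port, for instance), and header names are case-insensitive. The following is a small sketch of one way to handle both; the function name parse_header_lines is hypothetical, and lower-casing the names is a design choice made here, not something the protocol requires of a parser.

```python
def parse_header_lines(lines):
    """Turn an iterable of 'Name: value' strings into a dict.

    Hypothetical helper for illustration. Names are lower-cased so that
    lookups do not depend on how the client capitalised them.
    """
    headers = {}
    for line in lines:
        if not line:
            continue  # blank line, e.g. the one that ends the header block
        # partition splits on the first colon only, so colons in the value survive
        name, sep, value = line.partition(':')
        if not sep:
            continue  # no colon at all: skip the malformed line
        headers[name.strip().lower()] = value.strip()
    return headers


if __name__ == '__main__':
    print(parse_header_lines(['Host: localhost:8080', '', 'Accept-Language: en-US']))
```

Note that a plain dict keeps only the last value when a header is repeated, which may or may not be acceptable depending on the headers involved.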

Answer 3

There is another, simpler and safer way to handle headers that is also more object oriented. See "Parse raw HTTP Headers" (#61189692).
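As an illustration of what an object-oriented approach can look like with only the standard library, here is a sketch that reuses the parsing logic of http.server.BaseHTTPRequestHandler. This is not necessarily what the linked answer does; the class name HTTPRequest and the error_code / error_message attributes are assumptions made for this example.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO


class HTTPRequest(BaseHTTPRequestHandler):
    """Parse a raw request from bytes without opening a socket (illustrative)."""

    def __init__(self, request_bytes):
        # Deliberately skip the base __init__, which expects a live connection.
        self.rfile = BytesIO(request_bytes)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()

    def send_error(self, code, message=None, explain=None):
        # Record parse errors instead of writing a response to a client.
        self.error_code = code
        self.error_message = message


if __name__ == '__main__':
    raw = (b'GET /index.html HTTP/1.1\r\n'
           b'Host: localhost\r\n'
           b'Connection: keep-alive\r\n'
           b'\r\n')
    request = HTTPRequest(raw)
    print(request.command, request.path, request.request_version)
    print(request.headers['Host'])
```

After construction, command, path, request_version and headers are filled in by parse_request, and error_code stays None unless the request line or headers were rejected.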

Answer 4

Here are some Python packages aimed at proper HTTP protocol parsing:

https://dpkt.readthedocs.io/en/latest/api/api_auto.html#module-dpkt.http
https://h11.readthedocs.io/en/latest/
https://github.com/benoitc/http-parser/ (C backend)
https://github.com/MagicStack/httptools (based on NodeJS's C backend)
https://github.com/silentsignal/netlib-offline (shameless plug)

3 answers

solution1
4 ACCEPTED 2016-08-23 01:42:57

solution2
2 2016-08-23 01:13:08

solution3
0 2020-04-13 16:50:33

solution4
0 2022-06-02 12:04:36