
How to parse the value of Content-Type from an HTTP Header Response?

My application makes numerous HTTP requests. Without writing a regular expression, how do I parse Content-Type header values? For example:

text/html; charset=UTF-8

For context, here is the code I use to fetch resources from the internet:

from requests import head

foo = head("http://www.example.com")

The output I am expecting is similar to what the methods in mimetools provide. For example:

x = magic("text/html; charset=UTF-8")

would then give:

x.getparam('charset')  # UTF-8
x.getmaintype()  # text
x.getsubtype()  # html

requests doesn't give you an interface to parse the content type, unfortunately, and the standard library on this stuff is a bit of a mess. So I see two options:

Option 1: Use the python-mimeparse third-party library.

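Option 2: To separate the MIME type from options like charset, you can use cgi.parse_header, the same technique that requests has used internally to parse type/encoding. Note that the cgi module was deprecated in Python 3.11 and removed in Python 3.13, so this only works on older interpreters.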

import cgi
import requests
response = requests.head('http://example.com')
mimetype, options = cgi.parse_header(response.headers['Content-Type'])

The rest should be simple enough to handle with a split:

maintype, subtype = mimetype.split('/')
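As a standard-library alternative that does not depend on cgi, email.message.Message can parse the same header and exposes accessors close to the mimetools ones asked for in the question. This is only a minimal sketch: the helper name parse_content_type is hypothetical, and it assumes the header value is a well-formed Content-Type string.

```python
from email.message import Message


def parse_content_type(value):
    """Hypothetical helper: split a Content-Type value into its parts."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() yields (name, value) pairs; the first pair is the MIME type itself.
    params = dict(msg.get_params()[1:])
    return msg.get_content_maintype(), msg.get_content_subtype(), params


maintype, subtype, params = parse_content_type("text/html; charset=UTF-8")
print(maintype, subtype, params.get("charset"))
```

With requests, the value passed in would be something like foo.headers.get('content-type'). Message also offers get_param('charset') directly if only one parameter is needed.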

Your question is a bit unclear. I assume that you are using some sort of web application framework such as Django or Flask?

Here is an example of how to read Content-Type using Flask:

from flask import Flask, request
app = Flask(__name__)

@app.route("/")
def test():
  # Return the header value; a Flask view must not return None
  return request.headers.get('Content-Type', '')


if __name__ == "__main__":
  app.run()

Your response (foo) has a dictionary-like object holding the headers. Try something like:

foo.headers.get('content-type')

Or print foo.headers to see all the headers.
