
Stripping headers from an HTTP response - Python

A typical HTTP/1.0 response header block looks like this:

Server: nginx/1.6.2 (Ubuntu)
Date: Thu, 03 Mar 2016 07:00:00 GMT
Content-Type: text/html
Content-Length: 13471
Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
Connection: close
ETag: "5674c418-349f"
Cache-Control: no-store
Accept-Ranges: bytes

<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.

What's the easiest way to separate the beginning of the page (marked by <!doctype html> or <!DOCTYPE html>) from the headers of the HTTP response? For example:

def strip_headers():
    response = get_response()  # get_response() returns a string containing the headers and the page
    tokens = response.split("<!doctype html>")  # misses <!DOCTYPE html> and drops the marker itself
    return ''.join(tokens)

does not work well. I am looking for a way to split the response into its first half (the headers) and its second half (the body).

You could just use find() with a lowercase version of the response as follows:

response = """
Server: nginx/1.6.2 (Ubuntu)
Date: Thu, 03 Mar 2016 07:00:00 GMT
Content-Type: text/html
Content-Length: 13471
Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
Connection: close
ETag: "5674c418-349f"
Cache-Control: no-store
Accept-Ranges: bytes

<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.
"""

print(response[response.lower().find('<!doctype html>'):])

This would print:

<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.
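
One caveat with the one-liner above: find() returns -1 when the marker is absent, and slicing from -1 yields only the last character of the response. Below is a minimal sketch that guards against that case. The function name body_from_doctype and the sample string are hypothetical, chosen only for illustration.

```python
def body_from_doctype(response, marker='<!doctype html>'):
    """Return the response from the first case-insensitive marker onward.

    Returns None when the marker does not occur in the response.
    """
    index = response.lower().find(marker.lower())
    if index == -1:
        # No marker found: signal that explicitly instead of slicing from -1.
        return None
    return response[index:]


# Hypothetical sample: two headers, a blank line, then the page.
sample = "Content-Type: text/html\nConnection: close\n\n<!DOCTYPE html>\n<p>hello</p>\n"
print(body_from_doctype(sample))
```

Returning None leaves it to the caller to decide what to do with a response that has no doctype at all.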

Or perhaps just search for <!doctype
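
An alternative that does not depend on the doctype at all: in HTTP the header block ends at the first empty line, so the response can be split there. This is a sketch under the assumption that the whole response is already available as a single string. split_headers_body is a hypothetical helper name, and the regular expression accepts both CRLF and bare LF line endings.

```python
import re


def split_headers_body(response):
    """Split a raw HTTP response string at the first blank line.

    Returns a (headers, body) tuple; body is '' when no blank line exists.
    """
    # A blank line is two consecutive line breaks, each either \r\n or \n.
    parts = re.split(r'\r?\n\r?\n', response, maxsplit=1)
    headers = parts[0]
    body = parts[1] if len(parts) > 1 else ''
    return headers, body


# Hypothetical sample using CRLF line endings.
raw = (
    "Server: nginx/1.6.2 (Ubuntu)\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<!DOCTYPE html>\r\n"
    "<p>page content</p>\r\n"
)
headers, body = split_headers_body(raw)
print(body)
```

This also works for bodies that are not HTML, where no doctype marker would be present.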
