A typical HTTP 1.0 header looks like this:
Server: nginx/1.6.2 (Ubuntu)
Date: Thu, 03 Mar 2016 07:00:00 GMT
Content-Type: text/html
Content-Length: 13471
Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
Connection: close
ETag: "5674c418-349f"
Cache-Control: no-store
Accept-Ranges: bytes
<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.
What's the easiest way for me to separate the beginning of the page (marked by <!doctype html> or <!DOCTYPE html>) from the headers of the HTTP response? For example:
response = get_response()  # get_response() returns a string containing the headers and the page.
tokens = response.split("<!doctype html>") # won't work well.
return ''.join(tokens)
This won't work well. I am looking for a way to split the response into its first half (the headers) and its second half (the body).
You could just use find() with a lowercase version of the response as follows:
response = """
Server: nginx/1.6.2 (Ubuntu)
Date: Thu, 03 Mar 2016 07:00:00 GMT
Content-Type: text/html
Content-Length: 13471
Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
Connection: close
ETag: "5674c418-349f"
Cache-Control: no-store
Accept-Ranges: bytes
<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.
"""
# find() returns -1 when the marker is absent, so this assumes the doctype is present.
print(response[response.lower().find('<!doctype html>'):])
This would print:
<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.
Or perhaps just search for <!doctype, which also matches longer declarations such as the HTML 4.01 doctype.
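One caveat with the find() approach is that it returns -1 when the marker is missing, and response[-1:] then silently yields only the last character. A minimal sketch of a more defensive variant follows; the helper name split_at_doctype and the sample string are made up for illustration, and it assumes the body always starts at the doctype declaration:

```python
import re

# Hypothetical helper, not part of the original answer.
# Matches "<!doctype" in any letter case, whatever follows it.
_DOCTYPE = re.compile(r'<!doctype\b', re.IGNORECASE)


def split_at_doctype(response):
    """Return (headers, body), splitting where the doctype declaration begins."""
    match = _DOCTYPE.search(response)
    if match is None:
        # No doctype found: treat the whole string as headers.
        return response, ''
    return response[:match.start()], response[match.start():]


# Made-up sample data, shortened from the response in the question.
sample = (
    "Content-Type: text/html\n"
    "Connection: close\n"
    "<!DOCTYPE html>\n"
    "<html></html>\n"
)
headers, body = split_at_doctype(sample)
print(body)
```

Returning both halves as a tuple keeps the headers available for later parsing instead of discarding them.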