Python: extracting JSON from an HTTP response

Question

Say I have the following HTTP request:

GET /4 HTTP/1.1
Host: graph.facebook.com

And the server returns the following response:

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Cache-Control: private, no-cache, no-store, must-revalidate
Content-Type: text/javascript; charset=UTF-8
ETag: "539feb8aee5c3d20a2ebacd02db380b27243b255"
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache
X-FB-Rev: 1070755
X-FB-Debug: pC4b0ONpdhLwBn6jcabovcZf44bkfKSEguNsVKuSI1I=
Date: Wed, 08 Jan 2014 01:22:36 GMT
Connection: keep-alive
Content-Length: 172

{"id":"4","name":"Mark Zuckerberg","first_name":"Mark","last_name":"Zuckerberg","link":"http:\/\/www.facebook.com\/zuck","username":"zuck","gender":"male","locale":"en_US"}

Since the Content-Length header depends on the length of the content, I cannot simply split on the string "Content-Length: 172". How can I extract the JSON and the headers separately? Both are important to my program. I am using this code to get the response:

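import json
import socket

user_id = 4  # the user ID from the example request above

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("graph.facebook.com", 80))
# Python 3 sockets send bytes, so encode the request text
s.sendall("GET /{} HTTP/1.1\r\nHost: graph.facebook.com\r\n\r\n".format(user_id).encode("ascii"))
data = s.recv(1024)
s.close()
json_string = ...  # (somehow extract the JSON body from data)
userdata = json.loads(json_string)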

Answer 1

The easy way to do this is to use an HTTP library. For example:

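import json
import urllib.request

user_id = 4  # the user ID from the example request

with urllib.request.urlopen("http://graph.facebook.com/{}".format(user_id)) as r:
    headers = r.headers  # response headers, e.g. headers["Content-Type"]
    charset = headers.get_content_charset("utf-8")  # falls back to UTF-8
    json_string = r.read().decode(charset)
userdata = json.loads(json_string)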

If you really want to parse it yourself, the HTTP protocol guarantees that the headers and the body are separated by an empty line (the sequence \r\n\r\n), and that this will be the first empty line anywhere in the response, so it's not that hard:

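data = s.recv(1024)
# The first blank line (CRLF CRLF) separates the headers from the body
header, _, json_string = data.partition(b"\r\n\r\n")
userdata = json.loads(json_string)  # json.loads accepts bytes since Python 3.6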
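Since the question needs the headers as well as the JSON, here is a minimal sketch that goes one step further than partition and turns the header block into a status code and a dictionary. The helper name parse_http_response and the shortened sample response are illustrative assumptions, not part of any library. It assumes the complete response is already in memory and that the body is not chunked; it also keeps only the last value of a repeated header.

```python
import json

# A shortened version of the response from the question (hypothetical body)
RAW = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/javascript; charset=UTF-8\r\n"
    b"Content-Length: 28\r\n"
    b"\r\n"
    b'{"id":"4","username":"zuck"}'
)


def parse_http_response(raw):
    """Split raw response bytes into (status_code, headers, body).

    Hypothetical helper: assumes `raw` is the whole response and the body
    is sent as-is (no chunked transfer encoding).
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line found; the response is incomplete")
    # Header bytes are conventionally decoded as ISO-8859-1
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like "HTTP/1.1 200 OK"; the reason phrase may be empty
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


status, headers, body = parse_http_response(RAW)

# Decode the body using the charset from Content-Type (UTF-8 if absent)
charset = "utf-8"
ctype = headers.get("content-type", "")
if "charset=" in ctype:
    charset = ctype.split("charset=", 1)[1].split(";")[0].strip()
userdata = json.loads(body.decode(charset))

print(status, headers["content-type"], userdata["username"])
# prints: 200 text/javascript; charset=UTF-8 zuck
```

Lower-casing the header names means lookups such as headers["content-length"] work no matter how the server capitalized them.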

There are some obvious downsides to this. As written, your code won't work if the response is longer than 1 KB, if the kernel doesn't give you the whole response in a single recv (which it's never guaranteed to do), if the server redirects you or sends a 100 Continue before the real response, if the server decides to send back a chunked, MIME-multipart, or other response instead of a flat body, or…
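The first two limitations (the 1 KB buffer and the single recv) can be removed by calling recv in a loop until the header terminator appears, then continuing until Content-Length bytes of body have arrived. The sketch below does only that, using a hypothetical helper named recv_http_response. It deliberately gives up on chunked responses and does not handle redirects or 1xx interim responses, which is where an HTTP library earns its keep. The demo uses socket.socketpair to simulate a server that sends a body larger than 1 KB in small pieces.

```python
import json
import socket


def recv_http_response(sock, bufsize=4096):
    """Read one HTTP response from `sock` and return (head, body) as bytes.

    Minimal sketch: copes with bodies larger than one recv() and with data
    arriving in several pieces, but only when the server sends
    Content-Length. Chunked encoding, 1xx responses and redirects are
    NOT handled.
    """
    buf = b""
    # 1. Read until the blank line that ends the headers shows up
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("connection closed before headers were complete")
        buf += chunk
    head, _, body = buf.partition(b"\r\n\r\n")

    # 2. Look for Content-Length (header names are case-insensitive)
    length = None
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        name = name.strip().lower()
        if name == b"transfer-encoding" and b"chunked" in value.lower():
            raise NotImplementedError("chunked responses are not handled by this sketch")
        if name == b"content-length":
            length = int(value.strip())
    if length is None:
        raise NotImplementedError("no Content-Length header; this sketch needs one")

    # 3. Read until the whole body has arrived
    while len(body) < length:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("connection closed before the body was complete")
        body += chunk
    return head, body[:length]


# Demo: a fake server sends a >1 KB JSON body in 100-byte pieces
server, client = socket.socketpair()
body_out = b'{"id":"4","padding":"' + b"x" * 2000 + b'"}'
response = (
    b"HTTP/1.1 200 OK\r\nContent-Type: text/javascript; charset=UTF-8\r\n"
    b"Content-Length: " + str(len(body_out)).encode("ascii") + b"\r\n\r\n" + body_out
)
for i in range(0, len(response), 100):
    server.sendall(response[i:i + 100])

head, body = recv_http_response(client, bufsize=64)
server.close()
client.close()
userdata = json.loads(body)
```

Against the real server you would pass the connected socket from the question instead of the socketpair. Stopping at Content-Length also means you don't have to wait for the server to close the keep-alive connection before you have the full body.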

Answer 1: score 5, accepted, 2014-01-08 01:54:35

Python从HTTP响应中提取JSON

问题描述

1 个解决方案

解决方案1 5 已采纳 2014-01-08 01:54:35

解决方案1
5 已采纳 2014-01-08 01:54:35