I have the entire HTTP response as a string, but I want to extract just the body.
I would prefer not to use an external library or reimplement the header parsing. Here is a sample response:
Content-Type: text/xml
Content-Length: 129

<?xml version='1.0'?>
<methodResponse>
<params>
<param>
<value><boolean>0</boolean></value>
</param>
</params>
</methodResponse>
Update: in case it wasn't obvious, I get the data from a source other than a URL, so anything that requires a URL is useless.
Still, I do read the data from a stream-like object (data = stream.read()), so a solution that works on a stream is also acceptable.
2nd update: yes, this is an XML-RPC response, but I'm getting it over a different transport, so I cannot use httplib to parse it, mainly because httplib is broken and doesn't accept strings or streams for parsing.
3rd update: depending on the server, the double newline can be \r\n\r\n or \n\n.
So, to make it clear: the input is an HTTP response that is supposed to contain an XML-RPC response, and the output has to be that XML-RPC response (the body). It doesn't have to parse the XML, but it has to extract the XML from the response correctly.
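Because the first update says the data comes from a stream-like object, here is a minimal sketch of a stream-based approach. It assumes a binary stream that supports readline() and read() (io.BytesIO stands in for the real transport), and read_http_body is a hypothetical name. Reading line by line means all four separator variants (\n\n, \r\n\n, \n\r\n, \r\n\r\n) end the header block.

```python
import io


def read_http_body(stream):
    # Hypothetical helper: consume header lines from a binary stream until
    # the blank line that ends them, then return the rest (the body).
    while True:
        line = stream.readline()
        if not line:
            # End of stream without a blank line: treat as "no body".
            return b''
        if line in (b'\r\n', b'\n'):
            # Blank line (CRLF or bare LF) separating headers from body.
            break
    return stream.read()


raw = (b'Content-Type: text/xml\r\n'
       b'Content-Length: 129\r\n'
       b'\r\n'
       b"<?xml version='1.0'?>\r\n"
       b'<methodResponse></methodResponse>\r\n')
print(read_http_body(io.BytesIO(raw)))
```

Returning b'' when no blank line is found is a design choice of this sketch; returning the whole input instead would match the find()-based answers below.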
Based on Michal's solution, but this one includes an essential fix (it slices from p+4, so the separator itself is not returned):
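def strip_http_headers(http_reply):
    # Headers end at the first empty line, i.e. the first '\r\n\r\n'.
    p = http_reply.find('\r\n\r\n')
    if p >= 0:
        # Skip the 4 separator characters so only the body is returned.
        return http_reply[p+4:]
    # No separator found: return the input unchanged.
    return http_reply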
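The function above only recognizes '\r\n\r\n', while the 3rd update says some servers send '\n\n'. Below is a minimal sketch of a find()-based variant that accepts either, assuming those two are the only separators in play; strip_http_headers_any is a hypothetical name.

```python
def strip_http_headers_any(http_reply):
    # Hypothetical variant: try both separators from the 3rd update and
    # split at whichever occurs first in the text.
    best = None  # (position, separator length) of the earliest match
    for sep in ('\r\n\r\n', '\n\n'):
        p = http_reply.find(sep)
        if p >= 0 and (best is None or p < best[0]):
            best = (p, len(sep))
    if best is None:
        # No separator: same fallback as strip_http_headers.
        return http_reply
    p, n = best
    return http_reply[p + n:]
```

Taking the earliest match matters when CRLF headers are followed by a body that itself contains '\n\n'. Mixed forms such as '\n\r\n' are not covered by this sketch.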
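In an HTTP response, the headers are separated from the body by two CRLF sequences (an empty line). So you can use the string find() method like this:
def strip_http_headers(http_reply):
    p = http_reply.find('\r\n\r\n')
    if p >= 0:
        # p is where the separator starts, so this slice still
        # begins with '\r\n\r\n'.
        return http_reply[p:]
    return http_reply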
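The difference between slicing at p and at p+4 is easy to check on a tiny, made-up reply:

```python
reply = 'Content-Type: text/xml\r\n\r\n<methodResponse/>'
p = reply.find('\r\n\r\n')

# Slicing at p keeps the separator at the front of the result.
print(repr(reply[p:]))      # '\r\n\r\n<methodResponse/>'
# Skipping the 4 separator characters yields just the body.
print(repr(reply[p + 4:]))  # '<methodResponse/>'
```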
Short and sweet:
body = response.split('\r\n\r\n', 1)[-1]
(it uses the two-argument version of split(), and [-1] means the last element of the resulting list)
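The maxsplit argument of 1 is what keeps this safe when the body itself contains an empty line. A small sketch with a hypothetical plain-text response:

```python
response = ('Content-Type: text/plain\r\n'
            '\r\n'
            'first paragraph\r\n'
            '\r\n'
            'second paragraph')

# maxsplit=1: split only at the first empty line, so empty lines
# inside the body survive.
body = response.split('\r\n\r\n', 1)[-1]
print(repr(body))  # 'first paragraph\r\n\r\nsecond paragraph'

# Without maxsplit, [-1] returns only the last chunk of the body.
print(repr(response.split('\r\n\r\n')[-1]))  # 'second paragraph'
```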
resp = ('Content-Type: text/xml\r\n'
        'Content-Length: 129\r\n'
        '\r\n'
        "<?xml version='1.0'?>\r\n"
        '<methodResponse>\r\n'
        '<params>\r\n'
        '<param>\r\n'
        '<value><boolean>0</boolean></value>\r\n'
        '</param>\r\n'
        '</params>\r\n'
        '</methodResponse>\r\n')
print(resp.partition('\r\n\r\n')[2])
result
<?xml version='1.0'?>
<methodResponse>
<params>
<param>
<value><boolean>0</boolean></value>
</param>
</params>
</methodResponse>
On my display, the '\r' characters appear as squares at the end of each line.
The advantage of partition() is that it ALWAYS returns a tuple of 3 elements. So if the sequence '\r\n\r\n' is not in the text, resp.partition('\r\n\r\n')[2] is "", while split('\r\n\r\n')[1] raises an IndexError and split('\r\n\r\n')[-1] returns the entire text.
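Those three behaviours can be checked on a hypothetical response that has headers but no empty line:

```python
text = 'Content-Type: text/xml\r\n'  # headers only, no separator

print(repr(text.partition('\r\n\r\n')[2]))  # '' (always 3 elements)
print(repr(text.split('\r\n\r\n')[-1]))     # the whole text
try:
    text.split('\r\n\r\n')[1]  # only one element exists
except IndexError as exc:
    print('IndexError:', exc)
```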
If the double newline varies, a single fixed separator string is no longer enough, and a regex is the most convenient way to handle the variation.
To craft the pattern, you need to know which variants can occur.
Supposing that only "\n\n", "\r\n\n", "\n\r\n" and "\r\n\r\n" are possible, one solution is to capture the body with a regex based on the following pattern:
import re
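# Body = everything after the first double newline; if none is found,
# only \Z matches and group(1) is None.
regx = re.compile(r'(?:(?:\r?\n){2}|\Z)(.+)?', re.DOTALL)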
for ss in (('Content-Type: text/xml\r\n'
'Content-Length: 129\r\n'
"<?xml version='1.0'?>\n"
'\n'
'body1\r\n'
'<params>\r\n'
'<param>\r\n'
'</code>') ,
('Content-Type: text/xml\r\n'
'Content-Length: 129\r\n'
"<?xml version='1.0'?>\r\n"
'\n'
'body2\r\n'
'<params>\r\n'
'<param>\r\n'
'</code>') ,
('Content-Type: text/xml\r\n'
'Content-Length: 129\r\n'
"<?xml version='1.0'?>\n"
'\r\n'
'body3\r\n'
'<params>\r\n'
'<param>\r\n'
'</code>') ,
('Content-Type: text/xml\r\n'
'Content-Length: 129\r\n'
"<?xml version='1.0'?>\r\n"
'\r\n'
'body4\r\n'
'<params>\r\n'
'<param>\r\n'
'</code>') ,
('Content-Type: text/xml\r\n'
'Content-Length: 129\r\r'
"<?xml version='1.0'?>\r\r"
'\r\n'
'body4\r\n'
'<params>\r\n'
'<param>\r\n'
'</code>') ,):
    print('splitting on sequence : %r\n%r\n'
          % (re.search(r'(?:\r?\n)+(?=body)', ss).group(),
             regx.search(ss).group(1)))
result
splitting on sequence : '\n\n'
'body1\r\n<params>\r\n<param>\r\n</code>'
splitting on sequence : '\r\n\n'
'body2\r\n<params>\r\n<param>\r\n</code>'
splitting on sequence : '\n\r\n'
'body3\r\n<params>\r\n<param>\r\n</code>'
splitting on sequence : '\r\n\r\n'
'body4\r\n<params>\r\n<param>\r\n</code>'
splitting on sequence : '\r\n'
None
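The same pattern can also be used with re.split instead of a capturing group. This is a minimal sketch, not the answer's own code: http_body is a hypothetical name, and returning '' when no separator matches is a choice that mirrors partition() (the regx version above yields None).

```python
import re

# Two line breaks, each optionally preceded by '\r':
# matches '\n\n', '\r\n\n', '\n\r\n' and '\r\n\r\n'.
SEPARATOR = re.compile(r'(?:\r?\n){2}')


def http_body(response):
    # maxsplit=1: only the first empty line separates headers from body.
    parts = SEPARATOR.split(response, maxsplit=1)
    # A single part means no separator was found.
    return parts[1] if len(parts) == 2 else ''
```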
Besides what Tito said, there's also the requests package:
>>> import requests
>>> r = requests.get("http://yoururl")
>>> r
<Response [200]>
>>> r.content
...
And then parse it with minidom or whatever tool you choose for that.
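Once the body has been extracted, a sketch of that parsing step on the XML-RPC body from the question could look like this. minidom is the tool named above; xmlrpc.client.loads is a standard-library alternative swapped in here because it understands methodResponse directly (Python 3 name; the Python 2 module is xmlrpclib).

```python
import xml.dom.minidom
import xmlrpc.client

body = ("<?xml version='1.0'?>\r\n"
        "<methodResponse>\r\n"
        "<params>\r\n"
        "<param>\r\n"
        "<value><boolean>0</boolean></value>\r\n"
        "</param>\r\n"
        "</params>\r\n"
        "</methodResponse>\r\n")

# Generic XML parsing with minidom: read the text of the <boolean> element.
dom = xml.dom.minidom.parseString(body)
print(dom.getElementsByTagName('boolean')[0].firstChild.data)  # 0

# XML-RPC-aware parsing: loads() returns (params, methodname),
# and methodname is None for a methodResponse.
params, method = xmlrpc.client.loads(body)
print(params, method)  # (False,) None
```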