Parse a response in Python

When I send some data to a host:

r = urllib2.Request(url, data = data, headers = headers)
page = urllib2.urlopen(r)

soup = BeautifulSoup(page.read(), fromEncoding="cp-1251")
print page.read()

I get something like this:

[{"command":"settings","settings":{"basePath":"\/","ajaxPageState":{"theme":"spsr","theme_token":"kRHUhchUVpxAMYL8Y8IoyYIcX0cPrUstziAi8gSmMYk","css":[]},"ajax":{"edit-submit":{"callback":"spsr_calculator_form_ajax","wrapper":"calculator_form","method":"replaceWith","event":"mousedown","keypress":true,"url":"\/ru\/system\/ajax","submit":{"_triggering_element_name":"submit"}}}},"merge":true},{"command":"insert","method":null,"selector":null,"data":"\u003cdiv id=\"calculator_form\"\u003e\u003cform action=\"\/ru\/service\/calculator\" method=\"post\" id=\"spsr-calculator-form\" accept-charset=\"UTF-8\"\u003e\u003cdiv\u003e\u003cinput id=\"edit-from-ship-region-id\" type=\"hidden\" name=\"from_ship_region_id\" value=\"\" \/\u003e\n\u003cinput type=\"hidden\" name=\"form_build_id\" value=\"form-0RK_WFli4b2kUDTxpoqsGPp14B_0yf6Fz9x7UK-T3w8\" \/\u003e\n\u003cinput type=\"hidden\" name=\"form_id\" value=\"spsr_calculator_form\" \/\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"bg_p\"\u003e \n\u0421\u0435\u0439\u0447\u0430\u0441 \u0412\u044b... bla bla bla

but I want to get something like this:

<html><h1>bla bla bla</h1></html>

How can I do it?

The response you are getting is very likely encoded as JSON. If so, BeautifulSoup is the wrong tool for it, since it is an HTML/XML parser; JSON data needs a JSON parser. Calling page.read() twice doesn't make sense either: the first call consumes the response stream, so the second call returns an empty string.
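To illustrate the point about reading twice, here is a minimal sketch that needs no network access. It assumes io.BytesIO is an acceptable stand-in for the object returned by urlopen() (both are file-like streams), and the payload is a made-up placeholder:

```python
import io

# io.BytesIO stands in for the response object returned by urlopen();
# this is an assumption for illustration, not the real response class.
page = io.BytesIO(b'[{"command": "insert"}]')

first = page.read()   # consumes the whole stream
second = page.read()  # nothing is left, so this comes back empty

print(repr(first))
print(repr(second))
```

So the result of the first read() should be kept in a variable and reused.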

Rewriting the request part so that the response is read only once, we get:

r = urllib2.Request(url, data = data, headers = headers)
page = urllib2.urlopen(r)
data = page.read()

Now, instead of an HTML parser, we need a JSON parser. The json module does this (it has been in the standard library since Python 2.6):

import json
decoded_data = json.loads(data)

Now just locate the part of the decoded structure you want to extract. Going by your example, and assuming you want to print the section containing "bla bla bla", you can write:

result = unicode(decoded_data[1][u'data'])

For debugging, try:

print result
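If you would rather not depend on the "insert" command being at index 1, a possible variation is to search the decoded list for it. The sketch below is only an illustration: the payload is a shortened, made-up stand-in for the response in the question, and extract_insert_html is a hypothetical helper name.

```python
import json

# Shortened, made-up stand-in for the server response in the question.
# Raw strings keep the JSON escapes (\u003c, \/, \") exactly as sent.
raw = (
    r'[{"command":"settings","settings":{"basePath":"\/"},"merge":true},'
    r'{"command":"insert","method":null,"selector":null,'
    r'"data":"\u003cdiv id=\"calculator_form\"\u003ebla bla bla\u003c\/div\u003e"}]'
)

decoded_data = json.loads(raw)


def extract_insert_html(commands):
    """Return the "data" of the first "insert" command, or None if absent.

    Hypothetical helper, not part of any library.
    """
    for command in commands:
        if command.get("command") == "insert":
            return command.get("data")
    return None


fragment = extract_insert_html(decoded_data)
print(fragment)
```

The JSON parser turns escapes such as \u003c back into "<", so the extracted string is ordinary HTML. If you then need to walk that markup, that is the point where an HTML parser such as BeautifulSoup becomes useful.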
