How do I convert JavaScript postData arguments to a Python list?

I am using Scrapy to mimic a POST request made by a page, and I need the payload values from the following extract. Specifically, I need to get the values passed to postData in this JavaScript construct into a Python list.

<a style="color: red;font-size: 12px;font-weight: bolder" target="_self" title="Click here for processing" onclick="return postData('714','714','null','','','TADIKONDA','0713006','TADIKONDA','','1044','EXE DNO 1046 LAND','KARLAPUDI ROSAIAH, EEDA ANJI REDDY LAND','EXE BALANCE LAND','NANDIPATI VENKATESWARLU ETC LAND','0','0','01/01/1983','25/09/2018','t','16/02/2018','1')" href="#"> Next</a>

What kind of data type is this postData?

So, what I do is as follows:

s = response.xpath("//td[@class = 'formbg1']/a/@onclick").extract()[0].split('Data')[1][1:-1].replace("'","").split(',')

This returns a list. The problem, however, is that one of the values contains a comma, so it gets split into two separate list items. For example, the following should be a single value but ends up as two:

,'KARLAPUDI ROSAIAH, EEDA ANJI REDDY LAND',

So how do I convert this postData call into a Python list while keeping every value intact?

postData is not a data type. It is an arbitrary JavaScript function defined by the page you are working with, and the values here are the arguments passed to that function when the link is clicked. You could parse the call "by hand" so that it reads as JSON, for example like so:

import json
my_list = json.loads('[' + extracted_raw_string[16:-1].replace("'", '"') + ']')

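This is not very robust, but it does the trick. The slice [16:-1] strips the leading "return postData(" (16 characters) and the trailing ")". It breaks if any of the string values themselves contain single or double quotes. Otherwise, look at js2xml or slimit for parsing JavaScript properly.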
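If quotes inside the values are a concern, one possible alternative is to let Python read the argument list as a tuple literal with ast.literal_eval. This is only a sketch and assumes every argument is a quoted string literal, as in the question; an unquoted JavaScript value such as null or true would raise an error, and JavaScript escape sequences do not all match Python's. The helper name parse_post_data and the shortened sample string are made up for illustration.

```python
import ast
import re


def parse_post_data(onclick: str) -> list:
    """Return the arguments of a postData(...) call as a list of strings."""
    # Capture everything between "postData(" and the last ")".
    match = re.search(r"postData\((.*)\)", onclick, re.DOTALL)
    if match is None:
        raise ValueError("no postData(...) call found")
    args = match.group(1).strip()
    if not args:
        return []
    # Quoted JS strings are also valid Python string literals here, so the
    # argument list can be read as a tuple. Commas inside quotes are kept.
    # The trailing comma keeps a single argument a tuple as well.
    return list(ast.literal_eval("(" + args + ",)"))


# Shortened, illustrative version of the onclick value from the question.
onclick = ("return postData('714','null','',"
           "'KARLAPUDI ROSAIAH, EEDA ANJI REDDY LAND','0','01/01/1983','t')")
values = parse_post_data(onclick)
print(values[3])  # KARLAPUDI ROSAIAH, EEDA ANJI REDDY LAND
```

With Scrapy, the onclick string would come from the same XPath as in the question, taking the extracted attribute value without any further splitting.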
