Parsing JavaScript arrays into Python dictionaries

So I have a public webpage that contains something like the following code:

var arrayA = new Array();
arrayA[0] = new customItem("1","Name1","description1",1.000,2.000);arrayA[1] = new customItem("2","Name2","description2",4.000,8.000);

What I want to do is have Python read this page and convert the data into two dictionaries, with the name + description as the key.

i.e.,

dict1["Name1description1"] = 1.000

dict2["Name1description1"] = 2.000

dict1["Name2description2"] = 4.000

dict2["Name2description2"] = 8.000

Is there an easy way to do this, or do we pretty much have to parse it like any other string? The array could be of any length.

Thanks!

Yes, this is possible using regular expressions.

import re

st = '''
var arrayA = new Array();
arrayA[0] = new customItem("1","Name1","description1",1.000,2.000);arrayA[1] = new customItem("2","Name2","description2",4.000,8.000);
'''

dict1, dict2 = {}, {}
# Raw string avoids invalid-escape warnings; \. matches a literal decimal point
matches = re.findall(r'\"(\d+)\",\"(.*?)\",\"(.*?)\",(\d+\.\d+),(\d+\.\d+)', st, re.DOTALL)
for m in matches:
    key = m[1] + m[2]              # name + description
    dict1[key] = float(m[3])       # first number
    dict2[key] = float(m[4])       # second number

print(dict1)
print(dict2)

# {'Name1description1': 1.0, 'Name2description2': 4.0}
# {'Name1description1': 2.0, 'Name2description2': 8.0}

The logic of the regular expression is:

\" - Match a double quote
\"(\d+)\" - Match one or more digits enclosed in double quotes
\"(.*?)\" - Match as few characters as possible enclosed in double quotes
(\d+\.\d+) - Match one or more digits, a literal dot, then one or more digits
, - Match a comma
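
The breakdown above can also be written directly into the pattern. Here is a minimal sketch using re.VERBOSE and named groups; the names ITEM_RE, parse_items, and the group names are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern, spread over several lines with re.VERBOSE so each piece
# can carry its own comment. Group names are illustrative only.
ITEM_RE = re.compile(r'''
    "(?P<item_id>\d+)",      # quoted integer id
    "(?P<name>.*?)",         # quoted name, non-greedy
    "(?P<desc>.*?)",         # quoted description, non-greedy
    (?P<first>\d+\.\d+),     # first decimal number
    (?P<second>\d+\.\d+)     # second decimal number
''', re.VERBOSE)

def parse_items(js_text):
    """Return two dicts keyed by name + description (hypothetical helper)."""
    dict1, dict2 = {}, {}
    for m in ITEM_RE.finditer(js_text):
        key = m.group('name') + m.group('desc')
        dict1[key] = float(m.group('first'))
        dict2[key] = float(m.group('second'))
    return dict1, dict2

sample = ('arrayA[0] = new customItem("1","Name1","description1",1.000,2.000);'
          'arrayA[1] = new customItem("2","Name2","description2",4.000,8.000);')

dict1, dict2 = parse_items(sample)
print(dict1)
print(dict2)
```

Named groups make the loop body read a little more clearly than positional indexes such as m[3], at the cost of a longer pattern.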

So the regular expression will match the JavaScript input as long as it follows this expected pattern. Note that it assumes there are no spaces around the commas. If there might be, you could strip out the spaces first and then run it, or allow for them in the pattern with \s*.
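
One way to relax the whitespace assumption without touching the input is to allow optional whitespace around each comma. This is only a sketch: SPACED_RE and the sample string are invented for illustration, and the real page may be formatted differently.

```python
import re

# \s* on both sides of each comma tolerates optional whitespace
# between the constructor arguments.
SPACED_RE = re.compile(
    r'"(\d+)"\s*,\s*"(.*?)"\s*,\s*"(.*?)"\s*,\s*(\d+\.\d+)\s*,\s*(\d+\.\d+)'
)

# Hypothetical input with irregular spacing and a space inside the name.
spaced = 'arrayA[0] = new customItem("1", "Name 1" , "description1", 1.000 ,2.000);'

matches = SPACED_RE.findall(spaced)
print(matches)
```

Unlike stripping every space from the input beforehand, this leaves spaces inside the quoted name and description untouched.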
