How to get a string from a BS4 scrape
Recently I scraped a text/javascript script that contains the following code:
var spConfigDisabledProducts = [-1
, '294653', '294655', '294656', '294657', '294658', '294659', '294660', '294661', '294662', '294663', '294664', '294666', '294667', '294668', '294669', '294670', '294671', '294672', '294673' ];
{"attributes":{"959":{"id":"959","code":"aw_taglia","label":"Taglia","options":[{"id":"1717","label":"15","price":"0","oldPrice":"0"...
I just want to get all the numbers inside var spConfigDisabledProducts, excluding -1, so I tried this:
js = soup.find_all('script')[25].text.replace(',}', '}').replace(',]', ']').strip()
js = json.dumps(js)
obj = json.loads(js)
data_oos = obj.split('var spConfigDisabledProducts = [-1,')
data_oos = data_oos[1].split("];")
But it returns the entire JavaScript, not only var spConfigDisabledProducts.
How can I fix this? Thanks in advance.
You could use a regex to pull out the string representation of the list, convert it to an actual list, and then slice off the leading -1:
import ast
import re
s = '''var spConfigDisabledProducts = [-1
, '294653', '294655', '294656', '294657', '294658', '294659', '294660', '294661', '294662', '294663', '294664', '294666', '294667', '294668', '294669', '294670', '294671', '294672', '294673' ];
{"attributes":{"959":{"id":"959","code":"aw_taglia","label":"Taglia","options":[{"id":"1717","label":"15","price":"0","oldPrice":"0"'''
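# Capture the first bracketed list after the variable name (non-greedy match).
p = re.compile(r'spConfigDisabledProducts = (\[[\s\S]*?\])')
# Strip newlines and runs of whitespace, then parse the list literal safely.
data = ast.literal_eval(p.findall(re.sub(r'\n|\s{2,}', '', s))[0])
# Drop the leading -1 and keep only the product IDs.
print(data[1:])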
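The same idea can be wrapped in a small function, which also shows why the original attempt seemed to return the whole script. This is only a sketch: the name extract_disabled_products is hypothetical, and it assumes the variable is always assigned a flat list of numbers and quoted strings, as in the snippet from the question.

```python
import ast
import re

# Assumed pattern: the variable is assigned a flat list literal, as in the question.
LIST_PATTERN = re.compile(
    r"var\s+spConfigDisabledProducts\s*=\s*(\[.*?\])", re.DOTALL
)


def extract_disabled_products(script_text):
    """Return the IDs in spConfigDisabledProducts as strings, without -1."""
    match = LIST_PATTERN.search(script_text)
    if match is None:
        return []  # variable not present in this script
    # literal_eval parses the list without executing any code.
    values = ast.literal_eval(match.group(1))
    return [str(v) for v in values if v != -1]


# Shortened sample with the same layout as the scraped script.
sample = """var spConfigDisabledProducts = [-1
, '294653', '294655' ];
{"attributes":{"959":{"id":"959"}}}"""

print(extract_disabled_products(sample))  # ['294653', '294655']

# Why the original split() gave back everything: a newline sits between
# "-1" and the comma, so this separator never occurs and split() returns
# a one-element list holding the whole string.
parts = sample.split("var spConfigDisabledProducts = [-1,")
print(len(parts))  # 1
```

With the soup object from the question, the text passed in would presumably be that of whichever script tag contains the variable name. Looping over soup.find_all('script') and checking for 'spConfigDisabledProducts' in each tag's text is likely more robust than the hard-coded index 25. The json.dumps followed by json.loads round trip in the original attempt just returns the same string, so it can be dropped.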