
How to get a string from a BS4 scrape

Recently I scraped a text/javascript script that contains the following code:

var spConfigDisabledProducts = [-1
        , '294653', '294655', '294656', '294657', '294658', '294659', '294660', '294661', '294662', '294663', '294664', '294666', '294667', '294668', '294669', '294670', '294671', '294672', '294673'        ];
        {"attributes":{"959":{"id":"959","code":"aw_taglia","label":"Taglia","options":[{"id":"1717","label":"15","price":"0","oldPrice":"0"...

I just want to get all the numbers inside var spConfigDisabledProducts, excluding -1, so I tried this:

js = soup.find_all('script')[25].text.replace(',}', '}').replace(',]', ']').strip()

js = json.dumps(js)
obj = json.loads(js)

data_oos = obj.split('var spConfigDisabledProducts = [-1,')
data_oos = data_oos[1].split("];")

But it returns the entire JavaScript, not only var spConfigDisabledProducts.

How can I fix this? Thanks in advance.

You could use a regex to pull out the string representation of the list, convert it to an actual list, and then slice off the first element:

import re, ast

s = '''var spConfigDisabledProducts = [-1
        , '294653', '294655', '294656', '294657', '294658', '294659', '294660', '294661', '294662', '294663', '294664', '294666', '294667', '294668', '294669', '294670', '294671', '294672', '294673'        ];
        {"attributes":{"959":{"id":"959","code":"aw_taglia","label":"Taglia","options":[{"id":"1717","label":"15","price":"0","oldPrice":"0"'''

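# Lazily capture the first [...] that follows the variable name
p = re.compile(r'spConfigDisabledProducts = (\[[\s\S]*?\])')
# Strip newlines and runs of whitespace, then parse the list literal safely
data = ast.literal_eval(p.findall(re.sub(r'\n|\s{2,}', '', s))[0])
print(data[1:])  # skip the leading -1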
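
For context on why the original attempt did not work: calling json.dumps on a string and then json.loads on the result just gives back the same string, and the split separator 'var spConfigDisabledProducts = [-1,' never occurs in the script because a newline and spaces sit between -1 and the comma. The sketch below wraps the regex approach in a small function. The name disabled_product_ids and the shortened sample text are illustrative assumptions, not part of the original page, and the -1 is filtered by value instead of by position.

```python
import ast
import re

# Hypothetical helper; the pattern mirrors the one used in the answer above.
DISABLED_RE = re.compile(r"spConfigDisabledProducts\s*=\s*(\[.*?\])", re.DOTALL)


def disabled_product_ids(script_text):
    """Return the IDs listed in spConfigDisabledProducts, without the -1 entry."""
    match = DISABLED_RE.search(script_text)
    if match is None:
        return []  # the variable is not present in this script text
    # The captured text is also a valid Python list literal, so parse it safely.
    values = ast.literal_eval(match.group(1))
    # Drop the -1 placeholder by value rather than relying on its position.
    return [v for v in values if v != -1]


# Shortened sample in the same shape as the scraped script (assumed data).
sample = """var spConfigDisabledProducts = [-1
        , '294653', '294655'        ];
        {"attributes":{"959":{"id":"959"}}}"""

print(disabled_product_ids(sample))
```

With BeautifulSoup, it may be more robust to locate the script by its content than by a fixed index such as [25], for example with soup.find('script', string=re.compile('spConfigDisabledProducts')), and then pass that tag's .string to the function.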
