Extract data using bs4 from a javascript text span
I'm trying to extract some data from a span that comes after a text/javascript script. I tried regex, but it's too fragile. How can I get the span after the text/javascript script?
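from urllib.request import urlopen
from bs4 import BeautifulSoup

html_content = urlopen('https://www.icewarehouse.com/Bauer_Vapor_1X/descpage-V1XS7.html')
soup = BeautifulSoup(html_content, "lxml")
price = soup.find(class_='crossout')
span = price('span')  # calling a tag is shorthand for price.find_all('span')
print(span)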
Desired output:
649.99 949.99
I think you are trying to get the minimum and maximum of the array msrp. In that case you can't use BS for that; use plain re.
Try this:
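from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html_content = urlopen('https://www.icewarehouse.com/Bauer_Vapor_1X/descpage-V1XS7.html')
soup = BeautifulSoup(html_content, "lxml")
# Beautiful Soup 4.9+ leaves <script> contents out of soup.text, so gather them explicitly.
script_text = "\n".join(script.string or "" for script in soup.find_all("script"))
# Raw string so the escaped parentheses and dots reach the regex engine unchanged.
pattern = re.compile(r"msrp\.push\((.*?)\);.*msrp\.push\((.*?)\);")
m = pattern.search(script_text)
if m:
    print(m[1], m[2])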
This uses two capturing groups to get the minimum and maximum values from the line where values are pushed into the array msrp.
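The two-group pattern relies on the first and last push on a single line being the lowest and highest values. If that ordering is not guaranteed, a possible alternative is to collect every pushed value and take the minimum and maximum. The sketch below is only an illustration: SAMPLE_HTML is a made-up stand-in for the page's inline script (the real page content may differ), and extract_msrp_range is a hypothetical helper name.

```python
import re

# Hypothetical stand-in for the page's inline script; the real page may differ.
SAMPLE_HTML = """
<script type="text/javascript">
var msrp = []; msrp.push(649.99); msrp.push(799.99); msrp.push(949.99);
</script>
<span class="crossout"></span>
"""

# Matches a numeric argument such as 649.99 inside msrp.push(...).
PUSH_PATTERN = re.compile(r"msrp\.push\(\s*([0-9]+(?:\.[0-9]+)?)\s*\)")


def extract_msrp_range(html):
    """Return (min, max) of all msrp.push(...) values, or None if none are found."""
    values = [float(v) for v in PUSH_PATTERN.findall(html)]
    if not values:
        return None
    return min(values), max(values)


if __name__ == "__main__":
    print(extract_msrp_range(SAMPLE_HTML))
```

Because findall scans the whole string, this version does not depend on the pushes sharing a line, and it can be run on the raw HTML without parsing it first.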