How can I simplify this Python regex code?
I'm sure there is a better way of cleaning up a section of my web scrape. Can someone walk me through it?
#Query:[<div class="price">
<span class="price-currency">$</span>
<label for="low-price" hidden="">Low Price</label>
<input class="price-filter" data-val="true" data-val-number="The field LowPrice must be a number." data-val-required="The LowPrice field is required." id="low-price" name="SearchCriteria.LowPrice" placeholder="Min" type="text" value="0.00">
<span class="price-currency">$</span>
<label for="high-price" hidden="">Low Price</label>
<input class="price-filter" data-val="true" data-val-number="The field HighPrice must be a number." data-val-required="The HighPrice field is required." id="high-price" name="SearchCriteria.HighPrice" placeholder="Max" type="text" value="999999.00">
</input></input></div>, <div class="price">
$1,001.00 </div>]
prices = soup.find_all("div", {"class": "price"})
for price in prices:
    cleanPrice = price.text
    finalPrice = re.sub(r"\s\s+", " ", cleanPrice)
    finalPrice2 = re.sub(r"Low Price", "", finalPrice)
    finalPrice3 = re.sub(r"\n", "", finalPrice2)
    finalPrice4 = re.sub(r" ", "", finalPrice3)
    finalPrice5 = re.sub(r"\s\w", "", finalPrice4)
    finalPrice6 = re.sub(r"\s*$", "", finalPrice5)
    finalPrice7 = re.sub(r"\$\$", "", finalPrice6)
    pricevalues.append(finalPrice7)
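You can pass a text argument to find_all so that only the div whose text matches the price is returned (in Beautiful Soup 4.4.0 and later the same argument is also available as string):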
import re
from bs4 import BeautifulSoup
html_doc = """#Query:[<div class="price">
<span class="price-currency">$</span>
<label for="low-price" hidden="">Low Price</label>
<input class="price-filter" data-val="true" data-val-number="The field LowPrice must be a number." data-val-required="The LowPrice field is required." id="low-price" name="SearchCriteria.LowPrice" placeholder="Min" type="text" value="0.00">
<span class="price-currency">$</span>
<label for="high-price" hidden="">Low Price</label>
<input class="price-filter" data-val="true" data-val-number="The field HighPrice must be a number." data-val-required="The HighPrice field is required." id="high-price" name="SearchCriteria.HighPrice" placeholder="Max" type="text" value="999999.00">
</input></input></div>, <div class="price">
$1,001.00 </div>]"""
soup = BeautifulSoup(html_doc, 'html.parser')
prices = soup.find_all("div", {"class": "price"}, text=re.compile(r'1,001\.00'))
print(prices[0].text.strip())
Outputs:
$1,001.00
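If the price is not known in advance, one possible alternative is to replace the chain of re.sub calls with a single pattern that pulls the price out of each div's text. The sketch below is only an illustration: PRICE_RE, extract_price and the sample strings are hypothetical names and data, with the strings roughly approximating what price.text might return for the two divs in the question. It assumes prices look like a dollar sign, digits with optional commas, and two decimal places.

```python
import re

# Assumed price shape: "$", digits with optional thousands commas, two decimals.
PRICE_RE = re.compile(r"\$\d[\d,]*\.\d{2}")


def extract_price(text):
    """Return the first price-like substring in text, or None if there is none."""
    match = PRICE_RE.search(text)
    return match.group() if match else None


# Hypothetical stand-ins for price.text from the filter div and the real price div.
samples = [
    "\n$\nLow Price\n\n$\nLow Price\n\n",
    "\n$1,001.00 ",
]

# Keep only the divs that actually contain a price.
pricevalues = [p for p in map(extract_price, samples) if p is not None]
print(pricevalues)
```

In the scraping loop this would amount to calling extract_price(price.text) for each div and appending the result when it is not None, so the filter div with its "Low Price" labels is skipped without any extra cleanup steps.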