How can I simplify this Python regex code?

I'm sure there is a better way to clean up this part of my web scraping code. Can someone walk me through it?

#Query:[<div class="price">
<span class="price-currency">$</span>
<label for="low-price" hidden="">Low Price</label>
<input class="price-filter" data-val="true" data-val-number="The field LowPrice must be a number." data-val-required="The LowPrice field is required." id="low-price" name="SearchCriteria.LowPrice" placeholder="Min" type="text" value="0.00">
<span class="price-currency">$</span>
<label for="high-price" hidden="">Low Price</label>
<input class="price-filter" data-val="true" data-val-number="The field HighPrice must be a number." data-val-required="The HighPrice field is required." id="high-price" name="SearchCriteria.HighPrice" placeholder="Max" type="text" value="999999.00">
</input></input></div>, <div class="price">
$1,001.00                                    </div>]

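import re

# soup is the BeautifulSoup object for the scraped page
prices = soup.find_all("div", {"class": "price"})
pricevalues = []

for price in prices:
    cleanPrice = price.text
    finalPrice = re.sub(r"\s\s+", " ", cleanPrice)       # collapse runs of 2+ whitespace chars
    finalPrice2 = re.sub(r"Low Price", "", finalPrice)    # drop the label text
    finalPrice3 = re.sub(r"\n", "", finalPrice2)          # drop remaining newlines
    finalPrice4 = re.sub(r" ", "", finalPrice3)           # drop all spaces
    finalPrice5 = re.sub(r"\s\w", "", finalPrice4)        # drop whitespace followed by a word char
    finalPrice6 = re.sub(r"\s*$", "", finalPrice5)        # drop trailing whitespace
    finalPrice7 = re.sub(r"\$\$", "", finalPrice6)        # drop the "$$" left over from the filter div
    pricevalues.append(finalPrice7)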
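To see what the seven substitutions actually do, the sketch below replays them on plain strings. The two input strings are an assumption: they approximate what price.text returns for the two divs in the query output above, and the exact whitespace will depend on the page and the parser. The names FILTER_DIV_TEXT, PRICE_DIV_TEXT and original_chain are hypothetical.

```python
import re

# Assumed approximations of the .text of the two <div class="price"> elements.
FILTER_DIV_TEXT = "\n$\nLow Price\n\n$\nLow Price\n\n"
PRICE_DIV_TEXT = "\n$1,001.00" + " " * 36


def original_chain(text):
    """Apply the question's seven re.sub calls in the same order."""
    text = re.sub(r"\s\s+", " ", text)     # collapse runs of 2+ whitespace chars
    text = re.sub(r"Low Price", "", text)  # drop the label text
    text = re.sub(r"\n", "", text)         # drop remaining single newlines
    text = re.sub(r" ", "", text)          # drop all spaces
    text = re.sub(r"\s\w", "", text)       # no-op for these inputs: no whitespace is left
    text = re.sub(r"\s*$", "", text)       # no-op for these inputs: nothing trailing is left
    text = re.sub(r"\$\$", "", text)       # remove the two leftover currency symbols
    return text


print([original_chain(t) for t in (FILTER_DIV_TEXT, PRICE_DIV_TEXT)])
# prints ['', '$1,001.00']
```

On these inputs, the filter div only reduces to an empty string because the final step happens to strip "$$". A different label text or a change in the markup would leak into pricevalues.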

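You can pass a text argument to find_all() (newer BeautifulSoup versions call it string), so that only the div whose text matches the regex is returned: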

import re
from bs4 import BeautifulSoup

html_doc = """#Query:[<div class="price">
<span class="price-currency">$</span>
<label for="low-price" hidden="">Low Price</label>
<input class="price-filter" data-val="true" data-val-number="The field LowPrice must be a number." data-val-required="The LowPrice field is required." id="low-price" name="SearchCriteria.LowPrice" placeholder="Min" type="text" value="0.00">
<span class="price-currency">$</span>
<label for="high-price" hidden="">Low Price</label>
<input class="price-filter" data-val="true" data-val-number="The field HighPrice must be a number." data-val-required="The HighPrice field is required." id="high-price" name="SearchCriteria.HighPrice" placeholder="Max" type="text" value="999999.00">
</input></input></div>, <div class="price">
$1,001.00                                    </div>]"""

soup = BeautifulSoup(html_doc, 'html.parser')
prices = soup.find_all("div", {"class": "price"}, text=re.compile('1,001.00'))

print(prices[0].text.strip())

Output:

$1,001.00
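The regex passed to find_all() above hardcodes 1,001.00, so it only matches that one listing. Below is a minimal sketch of a more general alternative. It assumes every real price looks like "$" followed by digits with optional comma separators and two decimals. It searches each div's text with one pattern and skips divs that contain no match. The helper names extract_price and extract_prices are hypothetical.

```python
import re

# Assumed price format: "$", optional whitespace, digits (commas allowed), two decimals.
PRICE_RE = re.compile(r"\$\s*(\d[\d,]*\.\d{2})")


def extract_price(text):
    """Return the first price in text as '$<amount>', or None if there is none."""
    match = PRICE_RE.search(text)
    return f"${match.group(1)}" if match else None


def extract_prices(texts):
    """Keep only the texts that contain a price, normalized to '$<amount>'."""
    return [p for p in map(extract_price, texts) if p is not None]


# With BeautifulSoup this would be fed from the scraped divs, e.g.:
#     extract_prices(div.get_text() for div in soup.find_all("div", class_="price"))
sample = ["\n$\nLow Price\n\n$\nLow Price\n\n", "\n$1,001.00" + " " * 36]
print(extract_prices(sample))  # prints ['$1,001.00']
```

Unlike the text filter, this does not need to know the price in advance, and a single search replaces the seven re.sub calls.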

