
How to extract string that contains specific characters in Python

I am trying to extract only the string that contains the $ character. The input is the text output I extracted using BeautifulSoup.

Code

price = [m.split() for m in re.findall(r"\w+/$(?:\s+\w+/$)*", soup_content.find('blockquote', { "class": "postcontent restore" }).text)]

Input

For Sale is my Tag Heuer Carrera Calibre 6 with box and papers and extras.
39mm
47 ish lug to lug
19mm in between lugs
Pretty thin but not sure exact height. Likely around 12mm (maybe less)
I've owned it for about 2 years. I absolutely love the case on this watch. It fits my wrist and sits better than any other watch I've ever owned. I'm selling because I need cash and other pieces have more sentimental value
I am the second owner, but the first barely wore it.
It comes with barely worn blue leather strap, extra suede strap that matches just about perfectly and I'll include a blue Barton Band Elite Silicone.
I also purchased an OEM bracelet that I personally think takes the watch to a new level. This model never came with a bracelet and it was several hundred $ to purchase after the fact.
The watch was worn in rotation and never dropped or knocked around.
The watch does have hairlines, but they nearly all superficial. A bit of time with a cape cod cloth would take care of a lot it them. The pics show the imperfections in at "worst" possible angle to show the nature of scratches.
The bracelet has a few desk diving marks, but all in all, the watch and bracelet are in very good shape.
Asking $2000 obo. PayPal shipped. CONUS.
It's a big hard to compare with others for sale as this one includes the bracelet.

The output should look like this.

2000

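You don't need a regular expression. Instead, you can iterate over each line and each word, check whether the word starts with '$', and extract the rest of it: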

[word[1:] for line in s.split('\n') for word in line.split() if word.startswith('$') and len(word) > 1]

where s is your paragraph.

Output:

['2000']
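The word-by-word check above can be wrapped in a small helper. This is only a sketch: the function name `extract_dollar_words` and the trailing-punctuation stripping are assumptions added here (so that a price written as "$2000," still comes out clean), not part of the original answer.

```python
import string

def extract_dollar_words(text):
    """Return the text after '$' for each whitespace-separated word starting with '$'."""
    results = []
    for word in text.split():
        # Assumption: drop trailing punctuation, e.g. the comma in "$2000,".
        # A bare "$" becomes an empty string here and is skipped below.
        cleaned = word.rstrip(string.punctuation)
        if cleaned.startswith('$') and len(cleaned) > 1:
            results.append(cleaned[1:])
    return results

sample = "it was several hundred $ to purchase. Asking $2000, obo."
print(extract_dollar_words(sample))  # ['2000']
```

Since `split()` with no arguments splits on any whitespace, including newlines, there is no need to split into lines first.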

I would do something like this (assuming the input is the string you wrote above):

price_start = input.find('$')
price = input[price_start:].split(' ')[0]

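This works if, as you said, $ occurs only once. Note that the sample input above also contains a bare $ earlier on (in "several hundred $"), and find would locate that one first.

Alternatively, you can use a regular expression like this: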
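The sample input contains a standalone $ ("several hundred $") before the actual price, so a `find`-based approach may need to keep searching. A minimal sketch, where the helper name `first_price` is hypothetical:

```python
def first_price(text):
    """Return the text after the first '$' that is followed by non-space characters, else None."""
    start = text.find('$')
    while start != -1:
        # Take everything from this '$' up to the next whitespace.
        token = text[start:].split(None, 1)[0]
        if len(token) > 1:
            return token[1:]
        # A bare '$' was found; continue searching after it.
        start = text.find('$', start + 1)
    return None

sample = "several hundred $ to purchase. Asking $2000 obo."
print(first_price(sample))  # 2000
```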

import re
price = re.findall(r'\S*\$\S*\d', input)[0]  # raw string so \S, \$ and \d reach the regex engine unchanged
price = price.replace('$', '')
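A capture group can return the digits directly, which avoids the separate `replace` step. The pattern below is an assumption about what a price looks like (digits, optional thousands separators, optional decimals), and `find_prices` is a hypothetical name:

```python
import re

# Assumed price shape: '$', digits, optional ",ddd" groups, optional decimals.
PRICE_RE = re.compile(r'\$(\d+(?:,\d{3})*(?:\.\d+)?)')

def find_prices(text):
    """Return the numeric part of every '$<number>' occurrence in text."""
    # With exactly one capturing group, findall returns that group's matches.
    return PRICE_RE.findall(text)

sample = "several hundred $ to purchase. Asking $2000, or $1,950.50 local."
print(find_prices(sample))  # ['2000', '1,950.50']
```

A bare $ followed by a space is skipped because the pattern requires at least one digit immediately after the dollar sign.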

Since this is very simple, you don't need a regex solution; this should be enough:

words = text.split()
words_with_dollar = [word for word in words if '$' in word]
print(words_with_dollar)

>>> ['$', '$2000']

If you don't want the dollar sign on its own, you can add a filter like this:

words_with_dollar = [word for word in words if '$' in word and '$' != word]
print(words_with_dollar)

>>> ['$2000']

