Python, Beautiful Soup, <br> tag

Question

I've looked through Stack Overflow but can't find an answer that solves my problem. How do I get the text (specific text) after a <br> tag?

Here is my code:

product_review_container = container.findAll("span",{"class":"search_review_summary"})
for product_review in product_review_container:
    prr = product_review.get('data-tooltip-html')
    print(prr)

Here is the output:

Very Positive<br>86% of the 1,013 user reviews for this game are positive.

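From that string I only want the 86%, and separately the 1,013. So just the numbers. But the value isn't an integer, so I don't know what to do.

Here is where the text comes from: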

   [<span class="search_review_summary positive" data-tooltip-html="Very Positive&lt;br&gt;86% of the 1,013 user reviews for this game are positive.">
</span>]

Here is the link I'm getting the information from: https://store.steampowered.com/search/?specials=1&page=1

Thanks!

Answer 1

You need a regular expression here!

import re

string = 'Very Positive<br>86% of the 1,013 user reviews for this game are positive.'
a = re.findall(r'(\d+%)|(\d+,\d+)', string)  # raw string keeps \d from being read as a string escape
print(a)

# Output: [('86%', ''), ('', '1,013')]
# Then a[0][0] will be '86%' and a[1][1] will be '1,013'

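Here \d matches any digit character in the string, and + means one or more of the preceding item, so \d+ matches a run of one or more digits.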
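The tuples in that output come from the two capturing groups in the pattern: re.findall returns one tuple per match, with an empty string for the group that did not take part. As a small sketch (the variable name flat is only illustrative), dropping the parentheses gives a flat list instead:

```python
import re

string = 'Very Positive<br>86% of the 1,013 user reviews for this game are positive.'

# No capturing groups, so findall returns the matched text itself.
flat = re.findall(r'\d+%|\d+,\d+', string)
print(flat)  # ['86%', '1,013']
```

Note that \d+,\d+ only matches counts that contain a comma; a count below 1,000 would need a looser pattern.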

If you need a more specific regular expression, you can try it out at https://regex101.com
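Since the question mentions that the values are not integers, one possible follow-up is to convert them. This is a minimal sketch under assumptions: the group names pct and count are made up for illustration, and the pattern assumes the tooltip keeps the "N% of the M user reviews" wording shown in the question.

```python
import re

tooltip = 'Very Positive<br>86% of the 1,013 user reviews for this game are positive.'

# Assumed wording: "<percent>% of the <count> user reviews".
match = re.search(r'(?P<pct>\d+)% of the (?P<count>[\d,]+) user reviews', tooltip)
if match:
    percent = int(match.group('pct'))
    # Strip the thousands separators before converting.
    review_count = int(match.group('count').replace(',', ''))
    print(percent, review_count)  # 86 1013
```

If the wording differs for some entries, match will be None and the block is skipped, so the caller should decide how to handle that case.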

Answer 2

There is also a non-regex way; admittedly somewhat convoluted, but still fun:

First, we borrow (and modify) this nice function:

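def split_and_keep(s, sep):
    if not s:
        return ['']  # consistent with str.split()
    # p is one code point above the largest character in s, so it cannot occur in s
    p = chr(ord(max(s)) + 1)
    # mark the end of every separator with p, then split on p so sep stays attached
    return s.replace(sep, sep + p).split(p)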

Then we perform some standard steps:

html = """
  [<span class="search_review_summary positive" data-tooltip-html="Very Positive&lt;br&gt;86% of the 1,013 user reviews for this game are positive."></span>]
  """

from bs4 import BeautifulSoup as bs4
soup = bs4(html, 'html.parser')
info = soup.select('span')[0].get("data-tooltip-html")
print(info)

The output so far is:

Very Positive<br>86% of the 1,013 user reviews for this game are positive.

Next we do:

data = ''.join(c for c in info if (c.isdigit()) or c == '%')
print(data)

Now the output is a bit better:

86%1013

Almost there; now for:

split_and_keep(data, '%')

Final output:

['86%', '1013']
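Because the parser decodes &lt;br&gt; inside the attribute, the value is an ordinary string in which "<br>" is just text, not a tag. A simpler non-regex sketch along those lines, assuming the numbers of interest are the whitespace-separated tokens that start with a digit (the names summary, details and numbers are illustrative):

```python
info = 'Very Positive<br>86% of the 1,013 user reviews for this game are positive.'

# Split once on the literal "<br>" text inside the attribute value.
summary, _, details = info.partition('<br>')

# Keep the tokens that begin with a digit; the comma in '1,013' is preserved.
numbers = [tok for tok in details.split() if tok[0].isdigit()]
print(summary)  # Very Positive
print(numbers)  # ['86%', '1,013']
```

Unlike the character filter above, this keeps the thousands separator, so the count comes back as '1,013' rather than '1013'.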


Answer 1
2 Accepted 2019-03-03 21:39:33

Answer 2
1 2019-03-03 23:47:46
