How can I extract multiple values from the same string using regular expressions in Python?

Question

I am currently trying to scrape some data from a webpage. The data I need is inside a <meta> tag in the HTML source. Scraping the data and saving it to a string with BeautifulSoup is no problem.

The string contains two numbers I want to extract. Each of those numbers (review scores from 1 to 100) should be assigned to its own variable for further processing.

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

The first value is 79/100 and the second is 86/100, but I only need 79 and 86. So far I have written a regex search to find those values and then used .replace("/100", "") to clean things up.

But with my code, I only get the value of the first regex match, which is 79. I tried getting the second value with m.group(1), but it doesn't work.

What am I missing?

import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

m = re.search("../100", test_str)
if m:
    found = m.group(0).replace("/100", "")
    print(found)

    # output -> 79

Thanks for your help.

Best regards!

Answer 1

import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"
m = re.findall(r'\d+(?=/100)', test_str)
# m = ['79', '86']

I replaced .. with \d+, so the pattern matches one or more digits instead of exactly two arbitrary characters.

I also use a positive lookahead (?=...), so the .replace becomes unnecessary.

Example at Regex101
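
As a minimal sketch of how the two scores could end up in separate variables: re.search stops at the first match, and m.group(1) refers to the first capture group of that match rather than to a second match, whereas re.findall returns every non-overlapping match as a list. The variable names overall and score are assumptions made for illustration, and the unpacking assumes the string contains exactly two scores.

```python
import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

# \d+ matches one or more digits; the lookahead (?=/100) requires "/100"
# to follow without including it in the match.
matches = re.findall(r"\d+(?=/100)", test_str)

# Unpacking raises ValueError unless exactly two values were found.
overall, score = (int(value) for value in matches)

print(overall, score)
```

Converting to int at this point keeps the later processing free of string handling; if the number of scores can vary, keeping the list instead of unpacking may be the safer choice.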

Answer 2

I don't know why more people aren't suggesting named groups.

You can do something like the following; the syntax might not be perfect.

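import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

# Named groups capture only the digits in front of each "/100".
pattern = r"Overall Rating: (?P<rating>\d+)/100.*?Score: (?P<score>\d+)/100"

# re.search scans the whole string, so the pattern need not start at "<meta".
match = re.search(pattern, test_str)

if match:
    print(match.group('rating'))  # 79
    print(match.group('score'))   # 86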
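
The named-group idea could also be wrapped in a small helper that returns both values at once and handles a missing match. This is only a sketch: extract_scores is a hypothetical name, and the pattern assumes that "Overall Rating:" appears before "Score:" on the same line and that both values are followed by "/100".

```python
import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

# Compile once; each named group captures only the digits before "/100".
pattern = re.compile(r"Overall Rating: (?P<rating>\d+)/100.*?Score: (?P<score>\d+)/100")


def extract_scores(text):
    """Return {'rating': int, 'score': int}, or None if the pattern is absent."""
    match = pattern.search(text)
    if match is None:
        return None
    # groupdict() maps each group name to the text it captured.
    return {name: int(value) for name, value in match.groupdict().items()}


scores = extract_scores(test_str)
print(scores)
```

Returning None for a non-matching string lets the caller decide how to handle pages where the meta tag has a different layout.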

Answer 1 (score 2, accepted, 2017-05-21 10:35:58)

Answer 2 (score 1, 2020-11-23 02:31:37)
