How to extract multiple values from the same String with Regex in Python?

I am currently trying to scrape some data from a webpage. The data I need is within the <meta> tag of the HTML source. Scraping the data and saving it to a string with BeautifulSoup is no problem.

The string contains 2 numbers I want to extract. Each of those numbers (review scores from 1-100) should be assigned to a distinct variable for further processing.

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

The first value is 79/100 and the second is 86/100, but I only need 79 and 86. So far I have created a regex search to find those values and then used .replace("/100", "") to clean things up.

But with my code, I only get the value for the first regex search match, which is 79. I tried getting the second value with m.group(1) but it doesn't work.

What am I missing?

import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

m = re.search("../100", test_str)
if m:
    found = m.group(0).replace("/100", "")
    print(found)

    # output -> 79

Thanks for your help.

Best regards!

import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"
# \d+ matches the digits; (?=/100) requires "/100" to follow without consuming it
m = re.findall(r'\d+(?=/100)', test_str)
# m = ['79', '86']

I replaced .. with \d+, which matches one or more digits, so scores with 1, 2 or 3 digits are all found.
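To illustrate the difference, here is a small sketch comparing the two patterns on scores of different widths. The sample strings are made up for this comparison and do not come from the original page.

```python
import re

# Hypothetical sample strings with 1-, 2- and 3-digit scores.
samples = ["Score: 7/100", "Score: 79/100", "Score: 100/100"]

# ".." matches any two characters, whether they are digits or not.
two_chars = [re.findall(r"../100", s) for s in samples]

# "\d+" matches one or more digits, whatever the width of the score.
digits = [re.findall(r"\d+(?=/100)", s) for s in samples]

print(two_chars)  # [[' 7/100'], ['79/100'], ['00/100']]
print(digits)     # [['7'], ['79'], ['100']]
```

With the two-character pattern, the 1-digit score picks up a leading space and the 3-digit score loses its first digit.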

I also use a positive lookahead (?=...), so the .replace becomes unnecessary.

Example at Regex101
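Since the question asks for each score in its own variable, one possible way to finish the job is to unpack the list returned by re.findall. This is only a sketch: the variable names overall_rating and score are chosen for illustration, and it assumes the string always contains exactly two scores.

```python
import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

# findall returns every non-overlapping match as a list of strings.
scores = re.findall(r"\d+(?=/100)", test_str)

# Unpacking assumes exactly two scores; any other count raises ValueError.
overall_rating, score = (int(s) for s in scores)

print(overall_rating, score)  # 79 86
```

Converting to int at this point means the values can be compared or averaged later without further cleanup.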

I don't know why most people are not suggesting named capture groups.

You can do something like the code below; the syntax might not be perfect.

import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

# Named groups capture only the digits; .*? skips whatever lies between the two scores.
pattern = r"Overall Rating: (?P<rating>\d+)/100.*?Score: (?P<score>\d+)/100"

match = re.search(pattern, test_str)
if match:
    rating = match.group('rating')  # '79'
    score = match.group('score')    # '86'
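As a possible follow-up, the named groups can be collected in one step with groupdict(). The sketch below assumes the labels "Overall Rating:" and "Score:" always appear in that order, as they do in the sample string; the dictionary name values is illustrative.

```python
import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

# Labels are taken from the sample string; .*? skips the text between the scores.
pattern = re.compile(r"Overall Rating: (?P<rating>\d+)/100.*?Score: (?P<score>\d+)/100")

match = pattern.search(test_str)

# groupdict() maps each group name to its captured text; convert the text to int.
values = {name: int(text) for name, text in match.groupdict().items()} if match else {}

print(values)  # {'rating': 79, 'score': 86}
```

Compared with a plain findall, this ties each number to its label, so a reordered or missing score does not silently end up in the wrong variable.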
