Extracting a substring with a regular expression: re.match() always returns None

Question

I would like to extract some information from a string with a regex, but the result is always None. The source code is as follows:

import re

line = '<meta content=\"Allrecipes\" property=\"og:site_name\"/>'
x = re.match(r'property=".+?"',line)
print(x)  # prints None

I want to extract the content and property values as a tuple. How can I fix it?

Answer 1

I would suggest something more suitable for parsing HTML.

Using BeautifulSoup:

from bs4 import BeautifulSoup

line = '<meta content=\"Allrecipes\" property=\"og:site_name\"/>'
soup = BeautifulSoup(line, 'lxml')

print("Content: {}".format(soup.meta["content"]))
print("Property: {}".format(soup.meta["property"]))

Output:

Content: Allrecipes
Property: og:site_name
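
If installing bs4 and lxml is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is a different technique from the BeautifulSoup code above, not a drop-in replacement, and MetaCollector is a hypothetical helper name chosen for illustration:

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: collects the attributes of every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        # Self-closing tags such as <meta .../> also end up here by default.
        if tag == "meta":
            self.metas.append(dict(attrs))


line = '<meta content="Allrecipes" property="og:site_name"/>'
parser = MetaCollector()
parser.feed(line)
parser.close()

meta = parser.metas[0]
print((meta["content"], meta["property"]))
```

For real pages with many tags, BeautifulSoup is still likely to be the more convenient choice.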

Answer 2

The answer from @DirtyBit is better than using regex. But if you still want to use regex, this may help (RegexDemo):

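import re

line = '<meta content="Allrecipes" property="og:site_name"/>'
# Raw string avoids invalid escapes such as "\/"; non-greedy groups stop at the first closing quote
match = re.search(r'content="(?P<content>.*?)".*?property="(?P<prop>.*?)"\s*/>', line)
print(match.groups())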

Output:

('Allrecipes', 'og:site_name')
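
As for why the code in the question prints None: re.match() only tries the pattern at the very start of the string, and the line starts with "<meta", not with "property". re.search() scans the whole string instead. The sketch below shows the difference using the pattern from the question, plus one possible way to collect the attributes without depending on their order. The attribute pattern is an assumption that only covers simple double-quoted attributes, not HTML in general:

```python
import re

line = '<meta content="Allrecipes" property="og:site_name"/>'
pattern = r'property=".+?"'

# re.match anchors at position 0, where the string has "<meta", so it fails.
at_start = re.match(pattern, line)
# re.search looks for the first match anywhere in the string.
anywhere = re.search(pattern, line)

print(at_start)           # None
print(anywhere.group(0))  # property="og:site_name"

# With two groups, findall returns (name, value) pairs for each attribute.
attrs = dict(re.findall(r'([\w:-]+)="([^"]*)"', line))
pair = (attrs["content"], attrs["property"])
print(pair)               # ('Allrecipes', 'og:site_name')
```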

2 answers

Answer 1
0 2019-03-26 08:04:56

Answer 2
0 Accepted 2019-03-26 08:08:25
