Extract substring with regular expression: re.match() always returns None
I would like to extract some information from a string with a regex, but the result is always None. The source code is as follows:
import re

line = '<meta content=\"Allrecipes\" property=\"og:site_name\"/>'
x = re.match(r'property=".+?"',line)
print(x)  # prints None
I want to extract the content and property values as a tuple. How can I fix it?
I would suggest a tool better suited to the job. Using BeautifulSoup, which parses the HTML instead of pattern-matching it:
from bs4 import BeautifulSoup
line = '<meta content=\"Allrecipes\" property=\"og:site_name\"/>'
soup = BeautifulSoup(line, 'lxml')
print("Content: {}".format(soup.meta["content"]))
print("Property: {}".format(soup.meta["property"]))
Output:
Content: Allrecipes
Property: og:site_name
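If installing bs4 and lxml is not an option, the same parser-based idea can be sketched with only the standard library. This is a minimal illustration, not part of the original answer: the class name MetaCollector and its pairs attribute are made up for the example, and it assumes you only care about meta tags.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: collects (content, property) from each <meta> tag."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples.
        # Self-closing tags such as <meta .../> also end up here by default.
        if tag == "meta":
            attr_map = dict(attrs)
            self.pairs.append((attr_map.get("content"), attr_map.get("property")))


line = '<meta content="Allrecipes" property="og:site_name"/>'
collector = MetaCollector()
collector.feed(line)
collector.close()
print(collector.pairs)
```

A missing attribute shows up as None in the tuple rather than raising, which may or may not be what you want.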
The answer from @DirtyBit is better than using regex. But if you still want to use a regex, this may help ( RegexDemo ):
import re

line = '<meta content=\"Allrecipes\" property=\"og:site_name\"/>'
# A raw string avoids the backslash escaping; lazy .*? stops at the first closing quote
match = re.search(r'content="(?P<content>.*?)".*?property="(?P<prop>.*?)"\s*/>', line)
print(match.groups())
Output:
('Allrecipes', 'og:site_name')
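As for why the code in the question prints None: re.match only attempts a match at the very start of the string, and the string starts with '<meta', not 'property='. re.search scans the whole string instead. A small sketch of the difference follows. The variable names are only illustrative, and the re.findall pattern assumes every attribute value is double-quoted, which will not hold for arbitrary HTML.

```python
import re

line = '<meta content="Allrecipes" property="og:site_name"/>'

# re.match is anchored at position 0, so this pattern cannot match here.
anchored = re.match(r'property=".+?"', line)

# re.search looks for the first match anywhere in the string.
found = re.search(r'property="(.+?)"', line)

# With two groups, re.findall returns one (name, value) tuple per attribute.
attrs = re.findall(r'(\w+)="([^"]*)"', line)

print(anchored)        # None
print(found.group(1))  # og:site_name
print(attrs)           # [('content', 'Allrecipes'), ('property', 'og:site_name')]
```

Unlike the pattern above, the findall version does not depend on content appearing before property.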