How to match the following with regex in Python?
Assume I have the following string:
string = """** Hunger is the physical sensation of desiring food.
<br> Your Hunger Level: Very Hungery<br> Food You Crave: Tomato<br/><br/>"""
I want to be able to extract the values that follow the two labels, here "Very Hungery" and "Tomato". Assume that regardless of what special characters are inserted, I know for a fact that "Your Hunger Level:" and "Food You Crave:" will always be constant.
"Your Hunger Level:" could be: "Very Hungry", "Hungry", "Not So Hungry"
"Food You Crave:" could be: "Tomato", "Rice and Beans", "Corn Soup"
How do I use a regular expression to match this? I tried the following, but am not having any luck...
m = re.match('(.*)([ \t]+)?Your Hunger Level:([ \t]+)?(?P<hungerlevel>.*)(.*)Food You Crave:([ \t]+)?(?P<foodcraving>.*).*', string)
NOTE: The string appears to contain a lot of escape characters, as shown below:
string = "** Hunger is the physical sensation of desiring food. <br>\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\tYour Hunger Level:
Very Hungry \n\t\t\t\t\t\t\t\t<br>\n\t\t\t\t\t\t\t\tFood You Crave: Tomato \n\t\t\t\t\t\t</br>"
I'd go for:
print([[part.strip() for part in line.split(':')] for line in re.split('<.*?>', string) if ':' in line])
# [['Your Hunger Level', 'Very Hungery'], ['Food You Crave', 'Tomato']]
Or, you could make it a dict:
lookup = dict(map(str.strip, line.split(':')) for line in re.split('<.*?>', string) if ':' in line)
print(lookup['Your Hunger Level'])
# Very Hungery
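The same split-on-tags idea can be sketched against the whitespace-heavy version of the string from the question's NOTE. This is only an illustration: the function name extract_fields and the variable raw are made up here, and the string is assumed to have a single space between "Your Hunger Level:" and "Very Hungry". It uses partition() rather than split(':') so that a value containing a colon would not break the pairing.

```python
import re

# The whitespace-heavy variant of the string from the question's NOTE
# (assumed: one space after "Your Hunger Level:").
raw = (
    "** Hunger is the physical sensation of desiring food. <br>\n\t\t\t\t\t\t\t\n"
    "\t\t\t\t\t\t\tYour Hunger Level: Very Hungry \n\t\t\t\t\t\t\t\t<br>\n"
    "\t\t\t\t\t\t\t\tFood You Crave: Tomato \n\t\t\t\t\t\t</br>"
)

def extract_fields(html_text):
    """Return {label: value} for every 'label: value' chunk between tags.

    Hypothetical helper, not part of the original answer.
    """
    fields = {}
    for chunk in re.split(r'<.*?>', html_text):
        # partition() splits on the first colon only, so a value that
        # itself contains a colon stays in one piece.
        label, colon, value = chunk.partition(':')
        if colon:  # skip chunks that contain no colon at all
            fields[label.strip()] = value.strip()
    return fields

lookup = extract_fields(raw)
print(lookup['Your Hunger Level'])
print(lookup['Food You Crave'])
```

Because strip() removes tabs and newlines as well as spaces, the runs of \n and \t around each label and value do not need special handling.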
I definitely agree with using any sort of parser, but the following seems to work. It simply starts after your target word and goes until it hits a < (I do not endorse it, for the record, but hopefully it works :) ):
In [28]: import re
In [29]: s = """** Hunger is the physical sensation of desiring food.
<br> Your Hunger Level: Very Hungery<br> Food You Crave: Tomato<br/><br/>"""
In [31]: m = re.search(r'Your Hunger Level:([^<]*)<br>.*Food You Crave:([^<]*)', s)
In [32]: m.group(1).strip()
Out[32]: 'Very Hungery'
In [33]: m.group(2).strip()
Out[33]: 'Tomato'
The strip() call trims whitespace. I'm not sure what the setup of your string is, but this is conservative, so it also handles cases where there is no space between the colon and the text. Also, I would recommend not using names that shadow standard-library modules as variable names (string, in this case); it will make things easier for you in the long run :)
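One caveat worth illustrating: by default . does not match a newline, so a pattern that uses .* between the two labels will not match the version of the string in the question's NOTE, where the labels sit on different lines. A minimal sketch of one way around that, assuming the NOTE string has a single space after "Your Hunger Level:", is to pass re.DOTALL and make the gap lazy. The group names are borrowed from the question's own attempt; the variable names are illustrative.

```python
import re

# The string from the question's NOTE (assumed: one space after the first label).
s = (
    "** Hunger is the physical sensation of desiring food. <br>\n\t\t\t\t\t\t\t\n"
    "\t\t\t\t\t\t\tYour Hunger Level: Very Hungry \n\t\t\t\t\t\t\t\t<br>\n"
    "\t\t\t\t\t\t\t\tFood You Crave: Tomato \n\t\t\t\t\t\t</br>"
)

pattern = re.compile(
    r'Your Hunger Level:(?P<hungerlevel>[^<]*)'  # value runs up to the next tag
    r'.*?'                                       # lazily skip tags and whitespace, newlines included
    r'Food You Crave:(?P<foodcraving>[^<]*)',    # value runs up to the next tag
    re.DOTALL,                                   # let . match newline characters too
)

# search() scans the whole string; match() would only try at position 0.
m = pattern.search(s)
if m:
    hunger = m.group('hungerlevel').strip()
    food = m.group('foodcraving').strip()
    print(hunger)
    print(food)
```

The negated class [^<] already matches newlines, so only the gap between the two labels needs re.DOTALL.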