How do I match the following with a regular expression in Python?

Question

Assume I have the following string:

string = """** Hunger is the physical sensation of desiring food.
<br>         Your Hunger Level: Very Hungery<br> Food You Crave: Tomato<br/><br/>"""

I want to be able to extract the hunger level ("Very Hungery" in this sample) and the food ("Tomato"). Assume that, regardless of what special characters are inserted, I know for a fact that "Your Hunger Level:" and "Food You Crave:" will always be constant.

"Your Hunger Level:" could be: "Very Hungry", "Hungry", "Not So Hungry"
"Food You Crave:" could be: "Tomato", "Rice and Beans", "Corn Soup"

How do I use a regular expression to match this? I tried the following, but am not having any luck:

m = re.match('(.*)([ \t]+)?Your Hunger Level:([ \t]+)?(?P<hungerlevel>.*)(.*)Food You Crave:([ \t]+)?(?P<foodcraving>.*).*', string)

NOTE: The string actually appears to contain a lot of escape characters (newlines and tabs), as shown below:

string = "** Hunger is the physical sensation of desiring food. <br>\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\tYour Hunger Level: Very Hungry \n\t\t\t\t\t\t\t\t<br>\n\t\t\t\t\t\t\t\tFood You Crave: Tomato \n\t\t\t\t\t\t</br>"
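For what it's worth, the attempt above most likely fails for two reasons: re.match only tries a match at position 0, and . does not match \n unless re.DOTALL is set, so the leading (.*) cannot get past the first newline in the real string. A minimal sketch of a working pattern follows, assuming a value never contains < or a line break; it reuses the asker's group names, and the variable name text is illustrative.

```python
import re

# The string from the NOTE above (contains newlines and tabs)
text = ("** Hunger is the physical sensation of desiring food. <br>\n\t\t\t\t\t\t\t\n"
        "\t\t\t\t\t\t\tYour Hunger Level: Very Hungry \n\t\t\t\t\t\t\t\t<br>\n"
        "\t\t\t\t\t\t\t\tFood You Crave: Tomato \n\t\t\t\t\t\t</br>")

# re.search scans the whole string instead of anchoring at position 0.
# \s* skips spaces, tabs and newlines after each label; the lazy
# [^<\n]*? stops each value at the next tag or line break.
pattern = re.compile(
    r'Your Hunger Level:\s*(?P<hungerlevel>[^<\n]*?)\s*(?:<|\n)'
    r'.*?Food You Crave:\s*(?P<foodcraving>[^<\n]*?)\s*(?:<|\n|$)',
    re.DOTALL,  # lets .*? cross the newlines between the two labels
)

m = pattern.search(text)
print(m.group('hungerlevel'))  # Very Hungry
print(m.group('foodcraving'))  # Tomato
```

The same pattern also matches the first, single-line sample string, since there each value simply ends at a <br> tag.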

Answer 1

I'd go with:

import re
print([list(map(str.strip, line.split(':'))) for line in re.split('<.*?>', string) if ':' in line])
# [['Your Hunger Level', 'Very Hungery'], ['Food You Crave', 'Tomato']]

Or you could make it a dict:

lookup = dict(map(str.strip, line.split(':')) for line in re.split('<.*?>', string) if ':' in line)
print(lookup['Your Hunger Level'])
# Very Hungery
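A slightly more defensive sketch of the same split-on-tags idea, in Python 3. extract_fields is a hypothetical helper, not part of the answer: it splits each fragment on the first : only (so a value containing a colon stays intact) and returns just the two labels the question says are constant, with None for a missing one. Because strip() also removes \n and \t, it copes with the string from the question's NOTE.

```python
import re


def extract_fields(html, labels=("Your Hunger Level", "Food You Crave")):
    """Hypothetical helper: split on tags, then on the first ':' of each fragment."""
    pairs = {}
    for fragment in re.split(r'<.*?>', html):
        if ':' in fragment:
            key, value = (part.strip() for part in fragment.split(':', 1))
            pairs[key] = value
    # Keep only the labels known to be constant; a missing label maps to None
    return {label: pairs.get(label) for label in labels}


# The string from the question's NOTE
note = ("** Hunger is the physical sensation of desiring food. <br>\n\t\t\t\t\t\t\t\n"
        "\t\t\t\t\t\t\tYour Hunger Level: Very Hungry \n\t\t\t\t\t\t\t\t<br>\n"
        "\t\t\t\t\t\t\t\tFood You Crave: Tomato \n\t\t\t\t\t\t</br>")

print(extract_fields(note))
# {'Your Hunger Level': 'Very Hungry', 'Food You Crave': 'Tomato'}
```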

Answer 2

I definitely agree with using some sort of parser, but the following seems to work. It simply starts after your target label and matches until it hits a < character. (For the record, I don't endorse this approach, but hopefully it works :) )

In [28]: import re

In [29]: s = """** Hunger is the physical sensation of desiring food.
<br>         Your Hunger Level: Very Hungery<br> Food You Crave: Tomato<br/><br/>"""

In [31]: m = re.search(r'Your Hunger Level:([^<]*)<br>.*Food You Crave:([^<]*)', s)

In [32]: m.group(1).strip()
Out[32]: 'Very Hungery'

In [33]: m.group(2).strip()
Out[33]: 'Tomato'

The strip() calls trim whitespace. I'm not sure how your string is laid out, but this is conservative, so it also handles cases where there is no space between the colon and the text. Also, I'd recommend not reusing the names of standard library modules as variable names (string is a module, not a keyword, but the advice still applies); it will make things easier for you in the long run :)
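One caveat with this pattern, shown as a sketch against the string from the question's NOTE: without re.DOTALL, the .* after <br> cannot cross the newline that follows it, so the search returns None. [^<]* already matches newlines, so adding the flag is enough here, and strip() then removes the surrounding \n and \t.

```python
import re

# The string from the question's NOTE, which contains newlines and tabs
s = ("** Hunger is the physical sensation of desiring food. <br>\n\t\t\t\t\t\t\t\n"
     "\t\t\t\t\t\t\tYour Hunger Level: Very Hungry \n\t\t\t\t\t\t\t\t<br>\n"
     "\t\t\t\t\t\t\t\tFood You Crave: Tomato \n\t\t\t\t\t\t</br>")

pattern = r'Your Hunger Level:([^<]*)<br>.*Food You Crave:([^<]*)'

# Without re.DOTALL, '.' stops at '\n', so the gap after <br> cannot be crossed
print(re.search(pattern, s))  # None

# With re.DOTALL, '.*' spans the newline between <br> and the second label
m = re.search(pattern, s, re.DOTALL)
print(m.group(1).strip(), '|', m.group(2).strip())  # Very Hungry | Tomato
```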

Answer 3

First, parse the HTML with a parser. There are many at your disposal, e.g. Beautiful Soup and lxml.
Second, search the document for <br> tags.
Third, search the text of those tags for the text you want, and return the matching tag.
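The three steps can be sketched with the standard library's html.parser.HTMLParser, swapped in for Beautiful Soup or lxml only so the example has no third-party dependency. Since <br> is a void tag with no text of its own, the sketch treats "the text of the tags" as the text chunks between tags. TextCollector and find_value are hypothetical names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text chunks that sit between tags (e.g. between <br>s)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def find_value(html, label):
    """Return the stripped text after `label` in the first chunk containing it."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    for chunk in parser.chunks:
        if label in chunk:
            return chunk.split(label, 1)[1].strip()
    return None  # label not found


# The string from the question's NOTE
note = ("** Hunger is the physical sensation of desiring food. <br>\n\t\t\t\t\t\t\t\n"
        "\t\t\t\t\t\t\tYour Hunger Level: Very Hungry \n\t\t\t\t\t\t\t\t<br>\n"
        "\t\t\t\t\t\t\t\tFood You Crave: Tomato \n\t\t\t\t\t\t</br>")

print(find_value(note, "Your Hunger Level:"))  # Very Hungry
print(find_value(note, "Food You Crave:"))     # Tomato
```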

3 answers

Answer 1: 3 votes, accepted, 2012-10-29 20:46:12

Answer 2: 2 votes, 2012-10-29 20:38:54

Answer 3: 0 votes, 2012-10-29 20:33:44
