How best to extract the following from an HTML string in Python?

Question

Assume I have the following string, which contains line breaks:

<table>
<tr>
<td valign="top"><a href="ABext.html">House Exterior:</a></td><td>Round</td>
</tr>
<tr>
<td>EF</td><td><a href="AB.html">House AB</a></td></tr>
<tr>
<td valign="top">Settlement Date:</td>
<td valign="top">2/3/2013</td>
</tr>
</table>

What is the best way to create a simple Python dictionary with the following:

I want to extract the Settlement Date into a dict or some kind of regex match. What is the best way to do this?

NOTE: A sample using some utility is fine, but I am looking for a better way than holding this text in a variable and stepping through a long chain of .next.next.next.next.next until I finally reach the settlement date, which is why I posted this question in the first place.

Answer 1

If the data is highly regular, then a regex isn't a bad choice. Here's a straightforward approach:

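import re

# Capture the text of the <td> that follows the "Settlement Date:" cell.
regex = re.compile(r'>Settlement Date:</td>[^>]*>([^<]*)')
match = regex.search(data)  # data holds the HTML string from the question
print(match.group(1))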
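
Since the question asks for a dictionary, the same pattern can be wrapped in a small helper. The following is only a sketch under stated assumptions: `extract_field` is a hypothetical name chosen for illustration, `data` is the sample table copied from the question, and the label is assumed to sit directly inside its own `<td>` and be followed by a colon.

```python
import re

# Sample HTML copied from the question.
data = """<table>
<tr>
<td valign="top"><a href="ABext.html">House Exterior:</a></td><td>Round</td>
</tr>
<tr>
<td>EF</td><td><a href="AB.html">House AB</a></td></tr>
<tr>
<td valign="top">Settlement Date:</td>
<td valign="top">2/3/2013</td>
</tr>
</table>"""


def extract_field(html, label):
    """Return {label: value} for the cell after 'label:', or {} if absent.

    extract_field is a hypothetical helper name, not part of any library.
    """
    # Same pattern as above, with the label escaped and substituted in.
    pattern = re.compile(r'>' + re.escape(label) + r':</td>[^>]*>([^<]*)')
    match = pattern.search(html)
    if match is None:
        # No cell of the expected shape was found.
        return {}
    return {label: match.group(1).strip()}


result = extract_field(data, "Settlement Date")
print(result)
```

With the sample table this should give a one-entry dict mapping "Settlement Date" to "2/3/2013". Note that a label wrapped in a link, such as "House Exterior:" in the first row, does not match this pattern and yields an empty dict, so the approach only holds while the markup stays as regular as the sample.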

Answer 1: accepted, 2014-04-01 01:48:27
