re.findall only returns the last match

Question

I have the following HTML:

<tr>
<td style="text-align: left;" colspan="1">10:10</td>
<td style="text-align: left;" colspan="1">This is a description.</td>
</tr>
<tr>
<td colspan="1">10:30</td>
<td colspan="1">This is another description.</td>
</tr>

I want to return multiple matches, each consisting of two groups: group 1 is the timestamp and group 2 is the description.

When I run

re.findall(r'<td.*>(\d\d:\d\d)<\/td><td.*>(.*?)<\/td>', HTML)

I only get the last match:

[('10:30', 'This is another description.')]

Can anyone tell me what's wrong with my regex?

Answer 1

Your first .* is greedy: it matches as many characters as it can, so you get exactly one match that spans everything from the first <td to the last </td>. The captured groups come from the tail end of that one match, which is why you only see the 10:30 row. Using [^>]* instead of .* in both <td.*> parts makes each one match only what's inside a single tag.
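Here is a minimal sketch comparing the two patterns. It assumes the HTML arrives as a single line with no whitespace between tags, which is what the output in the question implies (. does not match a newline by default, so the original pattern would find nothing in the multi-line form shown). The \s* between the two cells is my own addition, not part of the suggestion above; it only makes the pattern tolerate line breaks between the cells. The names greedy and fixed are just for illustration.

```python
import re

# Assumption: single-line HTML, as implied by the output in the question.
HTML = (
    '<tr><td style="text-align: left;" colspan="1">10:10</td>'
    '<td style="text-align: left;" colspan="1">This is a description.</td></tr>'
    '<tr><td colspan="1">10:30</td>'
    '<td colspan="1">This is another description.</td></tr>'
)

# Original pattern: the greedy .* runs past the first row and backtracks
# only as far as the last timestamp cell.
greedy = re.findall(r'<td.*>(\d\d:\d\d)</td><td.*>(.*?)</td>', HTML)
print(greedy)
# [('10:30', 'This is another description.')]

# [^>]* cannot cross a '>', so each <td ...> stays inside one tag.
# \s* (an addition for illustration) allows whitespace between the cells.
fixed = re.findall(r'<td[^>]*>(\d\d:\d\d)</td>\s*<td[^>]*>(.*?)</td>', HTML)
print(fixed)
# [('10:10', 'This is a description.'), ('10:30', 'This is another description.')]
```

With the \s* in place, the second pattern should also work on the multi-line HTML from the question, since [^>] and \s both match newlines.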
