Matching elements across lines in Python using regex

Question

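I'm looking for a way to use regex to extract a quantity from a shopping website. In the example below, I want to get "12.5 kilograms". However, the quantity in the first span is not always in kilograms; it could be pounds, ounces, etc.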

        <td class="size-price last first" colspan="4">
                    <span>12.5 kilograms </span>
            <span> <span class="strike">$619.06</span> <span class="price">$523.91</span>
                    </span>
                </td>

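The code above is just a small part of what is actually extracted with BeautifulSoup. Whatever the page, the quantity is always inside a span and comes right after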

<td class="size-price last first" colspan="4">

I have used regex in the past, but I'm far from an expert. I'd like to know how to match elements across different lines. In this case, between

<td class="size-price last first" colspan="4">

and

<span> <span class="strike">

Answer 1

Avoid parsing HTML with regex. Use the right tool for the job: an HTML parser such as BeautifulSoup. It is powerful, easy to use, and handles your case perfectly:

from bs4 import BeautifulSoup


data = """
<td class="size-price last first" colspan="4">
                    <span>12.5 kilograms </span>
            <span> <span class="strike">$619.06</span> <span class="price">$523.91</span>
                    </span>
                </td>"""
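# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")

# Text of the first <span> inside the first <td>
print(soup.td.span.text)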

Prints:

12.5 kilograms
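Since the unit varies (kilograms, pounds, ounces) and the extracted text keeps the whitespace from the markup, it may help to split the text into a number and a unit once it has been pulled out of the span. The following is only a minimal sketch, assuming the text always has the form "number, then unit"; the helper name `split_quantity` is hypothetical and not part of BeautifulSoup:

```python
import re


def split_quantity(text):
    """Split text such as '12.5 kilograms ' into (12.5, 'kilograms').

    Assumes a decimal number followed by an alphabetic unit name.
    Returns None when the text does not match that shape.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)", text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


print(split_quantity("12.5 kilograms "))
print(split_quantity("8 oz"))
```

Here regex is applied to a short, already-extracted string rather than to the HTML itself, which is where it tends to be reliable.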

Or, if the td is part of a larger structure, find it by class and take the text of its first span:

print(soup.find('td', {'class': 'size-price'}).span.text)

UPD (handling multiple results):

print([td.span.text for td in soup.find_all('td', {'class': 'size-price'})])
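As for the literal question of matching across lines: in Python's `re` module, `\s` also matches newlines, so a pattern can step over the line break and indentation between the `<td>` tag and the first `<span>`. The sketch below is only an illustration under the assumption that the markup always looks exactly like the sample (class attribute starting with `size-price`, a plain `<span>` directly after the tag); it breaks as soon as the page deviates from that, which is why a parser remains the safer choice.

```python
import re

html = """
<td class="size-price last first" colspan="4">
                    <span>12.5 kilograms </span>
            <span> <span class="strike">$619.06</span> <span class="price">$523.91</span>
                    </span>
                </td>"""

# \s* absorbs the newline and indentation between the <td> tag and the first <span>.
pattern = re.compile(r'<td class="size-price[^"]*"[^>]*>\s*<span>([^<]*)</span>')

# findall returns the captured group for every matching <td>; strip the padding.
quantities = [q.strip() for q in pattern.findall(html)]
print(quantities)
```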

Hope that helps.
