Assigning text extracted from an HTML table to a variable for later use -- Beautiful Soup / Python 3.7

Question

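I have the following code, which works perfectly: it dynamically searches the HTML table source for a specific text and pulls out the nextSibling of the cell in which that text is found.

Current code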

r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
           
# Find xxxxxxx (row by row) and strip leading zeros
row = soup.find_all('td', string="xxxxxxx")
for r in row:
        LE = r.nextSibling
        while LE.name != 'td' and LE is not None:
                LE = LE.nextSibling

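The main problem I'm having (it is probably very simple, and I have been staring at this for too long) is that I need to assign the nextSibling to the LE variable.

LE has the format "001234", and I need to strip the leading zeros so that the variable holds "1234".

If I print the value with print(LE.text[2:6]), the result is correct. Implemented in the code as LE = LE.nextSibling.text[2:6], it produces no result.

I have tried the following statements, but neither worked, and I would appreciate some guidance.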

LE = LE.nextSibling.text[2:6]
&
LE = LE.text[2:6]

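I need to assign the value to a variable after extracting it so that I can use it later in my script. Thanks in advance for your help!

Edit --> source code included: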

<tr>
     <td class='label' nowrap title="xxxxxxx">TEXT TO FIND</td>
     <td class='attribute'>001234</td>
</tr>

Answer 1

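You can call next_sibling twice (the first sibling is the whitespace text node between the two cells) and then use strip() to remove the zeros. Note that strip("0") also removes trailing zeros, so lstrip("0") is the safer choice if a value can end in 0: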

from bs4 import BeautifulSoup

html = """<tr>
     <td class='label' nowrap title="xxxx">TEXT TO FIND</td>
     <td class='attribute'>001234</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

for tag in soup.select(".label"):
    le = ''.join([t.strip("0") for t in tag.next_sibling.next_sibling])
    print(tag.text)
    print(le)

Output:

TEXT TO FIND
1234
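
As a variation on the snippet above, here is a minimal sketch that uses find_next_sibling("td") so the whitespace text node is skipped without chaining next_sibling twice. The sample value 001230 is an assumption, chosen only to show how strip("0") and lstrip("0") differ; it is not from the original page.

```python
from bs4 import BeautifulSoup

# Sample row; the value "001230" is an illustrative assumption
html = """<tr>
     <td class='label' nowrap title="xxxx">TEXT TO FIND</td>
     <td class='attribute'>001230</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

label = soup.find("td", class_="label")

# The direct sibling is the whitespace between the two cells, not the next <td>
print(repr(label.next_sibling))

# find_next_sibling("td") skips text nodes and returns the next <td> tag
value_cell = label.find_next_sibling("td")
raw = value_cell.get_text(strip=True)

le_strip = raw.strip("0")    # removes zeros at both ends
le_lstrip = raw.lstrip("0")  # removes leading zeros only

print(le_strip, le_lstrip)
```

Unlike the slice [2:6], lstrip("0") does not depend on how many leading zeros there are or on how long the value is.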

Answer 2

Change != to ==:

row = soup.find_all('td', string="xxxxxx")
for r in row:
    LE = r.nextSibling
    LE = LE.text[2:6]
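
To keep the value available later in the script, one option is to wrap the lookup in a small function and assign its return value. The sketch below assumes the row markup from the question; extract_value is a hypothetical helper name, not part of Beautiful Soup. It also checks for None before touching the result, which the original while condition (LE.name != 'td' and LE is not None) does too late to be useful.

```python
from bs4 import BeautifulSoup


def extract_value(soup, label_text):
    """Hypothetical helper: return the text of the <td> that follows the
    label cell, with leading zeros removed, or None if nothing matches."""
    label = soup.find("td", string=label_text)
    if label is None:
        return None
    cell = label.find_next_sibling("td")
    if cell is None:
        return None
    return cell.get_text(strip=True).lstrip("0")


html = """<tr>
     <td class='label' nowrap title="xxxxxxx">TEXT TO FIND</td>
     <td class='attribute'>001234</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

# The result is an ordinary string that can be reused anywhere below
le = extract_value(soup, "TEXT TO FIND")
print(le)
```

If several rows can match, the same idea works with find_all and a list of results instead of a single return value.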

Answer 1 (votes: 1), 2020-11-18 21:06:28

Answer 2 (votes: 0, accepted), 2020-11-18 21:43:20
