Problem extracting td elements with Beautiful Soup

Question

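Another BeautifulSoup extraction question

I'm sorry, I know these questions get asked a lot, but I'm lost and there are some things I don't quite understand.

First, here is my basic code for extracting data from the website: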

import requests
import csv
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as BS

my_protein_list = ["ArthCp002"]
for protein in my_protein_list:
    text = requests.get('https://www.genome.jp/dbget-bin/www_bget?ath:' + protein).text
    soup = BS(text,'html.parser')
    AGI = soup.find("td", {"class":"td11"})
print(AGI)

I want to get the TAIR value from the website. My first question is: why does the code above output only the following?

<td class="td11" style="border-color:#000; border-width: 1px 1px 0px 1px; border-style: solid"><div style="width:555px;overflow-x:auto;overflow-y:hidden">psbA<br/>
</div></td>

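Why doesn't it give me everything in the td class?

Also, the TAIR value I need is found in element number 3. Yet when I add the table element to my code, it returns nothing. For example, I added this line before the print: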

AGI = AGI.table

Why doesn't it grab the data from the table element? Can someone help me understand? Cheers.

Answer 1

You are targeting the wrong element. The TAIR value is in an anchor (an `<a>` tag).

Here, try this:

import requests
from bs4 import BeautifulSoup

url = "https://www.genome.jp/dbget-bin/www_bget?ath:arthcp002"
anchors = BeautifulSoup(requests.get(url).content, "html.parser").find_all("a", href=True)

for anchor in anchors:
    if "Tair" in anchor["href"]:
        print(anchor["href"], anchor.getText())

Output:

http://arabidopsis.org/servlets/TairObject?type=locus&name=ATCG00020 ATCG00020
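
As for why the original code behaved the way it did, the small offline sketch below may help. `find` returns only the first matching element, while `find_all` returns every match, and `tag.table` returns the first `<table>` nested inside that tag, or `None` when there isn't one. The HTML string is invented for illustration and is much simpler than the real genome.jp page; the variable names are also just assumptions for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (NOT the real genome.jp page):
# three td.td11 cells, and only the third one contains a nested table.
html = """
<table>
  <tr><td class="td11"><div>psbA<br/></div></td></tr>
  <tr><td class="td11"><div>photosystem II protein D1</div></td></tr>
  <tr><td class="td11"><table><tr><td>
    <a href="http://arabidopsis.org/servlets/TairObject?type=locus&amp;name=ATCG00020">ATCG00020</a>
  </td></tr></table></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match; find_all() collects every match.
first_cell = soup.find("td", {"class": "td11"})
all_cells = soup.find_all("td", {"class": "td11"})

print(first_cell.get_text(strip=True))  # text of the first cell only
print(len(all_cells))                   # number of td.td11 cells in the snippet

# The first cell has no nested <table>, so .table is None there.
print(first_cell.table)

# The third cell does contain a table, so .table returns a Tag.
print(all_cells[2].table is not None)

# Filtering anchors by href, as in the answer above, skips the cell hunting.
tair_links = [a for a in soup.find_all("a", href=True) if "Tair" in a["href"]]
for a in tair_links:
    print(a["href"], a.get_text())
```

If the real page is laid out along these lines, that would explain both symptoms: `print(AGI)` shows just the first `td11` cell, and `AGI.table` is `None` because that first cell holds no table. Searching the anchors directly avoids depending on the cell order at all.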
