Web scraping a table using Python Beautiful Soup

Question

I am trying to get the information from a table on a webpage.

https://grs.icarda.org/accessions/?IG=46860

I want the 'Collecting information' section from the second table, but there are no distinguishing tags or ids that would make the data easy to parse. This is what I have tried:

table = soup.find('td', colspan='9')
table_data = soup.find('td', {'width':'150px', 'height':'26px'})

Here's the HTML:

<tbody>
  <tr><td colspan="9" style=" background-color: #FFFFFF; font-weight: bold; height:33px;">Collecting information</td></tr>
  <tr style="background-color:#d7d4d4; height:26px;"><td style="vertical-align:middle;">Mission code:</td><td style="width:5px;"></td><td style="vertical-align:middle;">TUR79-2</td> <td width="20"></td></tr>
  <tr><td style="width:250px; height:26px;">Mission title:</td><td width="5"></td><td>MJ Metzger, S.Jana (USDA report)</td></tr>
  <tr style="background-color:#d7d4d4; height:26px;"><td style="vertical-align:middle;">Country:</td><td style="width:5px;"></td><td colspan="2" style="vertical-align:middle;"><img style="vertical-align:middle; width:24px; height:24px;" src="../images/flags/TUR.png"><span style="vertical-align:middle;"> &nbsp;Turkey</span></td></tr>
  <tr> </tr>
  <tr><td style="width:150px; height:26px;">Site Code:</td><td width="5"></td><td colspan="2">TUR79-2:12</td></tr>
  <tr style="background-color:#d7d4d4; height:26px;"><td style="width:150px; height:26px;">Collectors:</td><td width="5"></td><td colspan="2">JA Hoffmann - M. Kanbertay - MJ Metzger - H. Sencer</td></tr>
  <tr style=" height:26px;"><td style="width:150px; height:26px;">Collect Date:</td><td width="5"></td><td colspan="2">1979/08/09</td></tr>
  <tr style="background-color:#d7d4d4;"><td style=" width:150px; height:26px;">Collector's number:</td><td width="5"></td><td colspan="2">79TK012-057</td></tr>
  <tr style=" height:26px;"><td style="width:150px; height:26px;">Admin 1:</td><td width="5"></td><td colspan="2">Malatya Province</td></tr>
  <tr style=" background-color:#d7d4d4;height:26px;"><td style="width:150px; height:26px;">Admin 2:</td><td width="5"></td><td colspan="2"></td></tr>
  <tr style="height:26px;"><td style="width:150px; height:26px;">Collecting site:</td><td width="5"></td><td colspan="2">5 km S of Darende</td></tr>
</tbody>

Answer 1

You could use pandas: read_html() returns a list of DataFrames, one for each table on the page. The table you want is the 3rd one (index 2), so this code could help you. After getting the table, I put the two columns into a dict for you:

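# read_html() needs an HTML parser installed (lxml, or bs4 + html5lib)
import pandas as pd

# read_html() returns one DataFrame per table on the page; index 2 is the 3rd table
df = pd.read_html('https://grs.icarda.org/accessions/?IG=46860')[2]

# Column 0 holds the labels and column 2 the values (column 1 is an empty spacer cell)
output = dict(zip(df[0], df[2]))

print(output)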
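
Since the question asks about Beautiful Soup specifically, here is a minimal sketch of the same idea without pandas. It assumes the table can be found through its visible heading text ('Collecting information') and that each data row is laid out as label, spacer cell, value, as in the HTML posted in the question. SAMPLE_HTML is a trimmed copy of that HTML so the sketch runs offline, and parse_collecting_info is a name made up for this example, not part of any library.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question, so the sketch runs offline.
SAMPLE_HTML = """
<table><tbody>
<tr><td colspan="9" style="font-weight: bold; height:33px;">Collecting information</td></tr>
<tr style="background-color:#d7d4d4; height:26px;"><td style="vertical-align:middle;">Mission code:</td><td style="width:5px;"></td><td style="vertical-align:middle;">TUR79-2</td> <td width="20"></td></tr>
<tr><td>Country:</td><td style="width:5px;"></td><td colspan="2"><img src="../images/flags/TUR.png"><span> &nbsp;Turkey</span></td></tr>
<tr> </tr>
<tr><td style="width:150px; height:26px;">Site Code:</td><td width="5"></td><td colspan="2">TUR79-2:12</td></tr>
<tr><td style="width:150px; height:26px;">Admin 2:</td><td width="5"></td><td colspan="2"></td></tr>
</tbody></table>
"""


def parse_collecting_info(html):
    """Return {label: value} for the table headed 'Collecting information'."""
    soup = BeautifulSoup(html, "html.parser")
    # Locate the table through its heading text instead of tag attributes,
    # because the cells carry no ids or classes.
    heading = soup.find(
        lambda tag: tag.name == "td"
        and tag.get_text(strip=True) == "Collecting information"
    )
    if heading is None:
        return {}
    table = heading.find_parent("table")
    info = {}
    for row in table.find_all("tr"):
        cells = [td.get_text(" ", strip=True) for td in row.find_all("td")]
        # Assumed row layout: label, narrow spacer cell, value.
        if len(cells) >= 3 and cells[0].endswith(":"):
            info[cells[0].rstrip(":")] = cells[2]
    return info


info = parse_collecting_info(SAMPLE_HTML)
print(info)
```

To try this on the live page, you would download the HTML first (for example with requests) and pass the response text to parse_collecting_info. The HTML the server sends may differ from what the browser inspector shows, so the row layout assumption should be checked against the actual response.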

Answer 1: accepted, 2022-01-16 21:48:58
