Scraping a table with BeautifulSoup in Jupyter Notebook

Question

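I'm trying to use BeautifulSoup to print a table of baby names in list format.

google-python-exercises/google-python-exercises/babynames/baby1990.html (HTML page; the original post showed a screenshot of the actual URL)

After fetching the page with urllib.request and parsing it with BeautifulSoup, I can print the data in each row of the table, but the output is wrong.

Here is my code: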

right_table = soup.find('table',attrs = {"summary" : "Popularity for top 1000"})
table_rows = right_table.find_all('tr') 

for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)

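It should print one list containing the data of a single row. Instead, I get many lists, and each new list is the previous one minus its first record.

The output looks something like this: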

['997', 'Eliezer', 'Asha', '998', 'Jory', 'Jada', '999', 'Misael', 'Leila', '1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['998', 'Jory', 'Jada', '999', 'Misael', 'Leila', '1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['999', 'Misael', 'Leila', '1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']

How can I print just one list per row?

Answer 1

I would try pandas instead, and index into the resulting list of tables to get the one you want:

import pandas as pd

tables = pd.read_html('yourURL')

print(tables[1]) # for example; change index as required

Answer 2

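Your loop creates the row list, prints it, then moves on to the next iteration, where it creates a new row list (overwriting the previous one), prints that, and so on.

I'm not sure why you want all the rows in a single list, but to get one final list you need to add each row list to that final list on every iteration.

Or do you actually mean you want a list of row lists?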

right_table = soup.find('table',attrs = {"summary" : "Popularity for top 1000"})
table_rows = right_table.find_all('tr') 


result_list = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    result_list = result_list + row


print(result_list)

If you really do want a list of your row lists, use this:

right_table = soup.find('table',attrs = {"summary" : "Popularity for top 1000"})
table_rows = right_table.find_all('tr') 


result_list = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    result_list.append(row)


print(result_list)
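To make the difference between the two loops concrete, here is a small sketch that runs both accumulation styles on hypothetical row lists. The values are copied from the question's output and no HTML parsing is involved; the only change between the variants is `result_list + row` versus `result_list.append(row)`.

```python
# Hypothetical row lists standing in for the per-row `row` values the loops build.
# Values are taken from the output shown in the question.
rows = [
    ["998", "Jory", "Jada"],
    ["999", "Misael", "Leila"],
    ["1000", "Tate", "Peggy"],
]

# Variant 1: concatenate each row onto one flat list (like result_list = result_list + row).
flat = []
for row in rows:
    flat = flat + row

# Variant 2: append each row as its own element, giving a list of row lists.
nested = []
for row in rows:
    nested.append(row)

print(flat)
print(nested)
print(nested[1])  # ['999', 'Misael', 'Leila']
```

With the flat version the row boundaries are lost; with the nested version each row stays addressable by index. `flat.extend(row)` would produce the same flat result while modifying the list in place instead of building a new list on every iteration.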

But honestly, I'd use pandas and .read_html(), as QHarr suggested.

Alternatively, to print each cell's text one at a time:

right_table = soup.find('table',attrs = {"summary" : "Popularity for top 1000"})
table_rows = right_table.find_all('tr')

for tr in table_rows:
    td = tr.find_all('td')
    for data in td:
        # print the individual cell, not the whole ResultSet
        print(data.text)
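For comparison only (neither answer uses this), the per-row grouping can also be sketched with the standard library's `html.parser.HTMLParser`. `TableRowParser` and the sample markup are hypothetical: the markup is modeled on rows from the question's output, not copied from the real baby1990.html. The sketch starts a new row at every `<tr>` start tag, so rows stay separate even when `</tr>` is omitted, and it assumes every `<td>` is explicitly closed.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Hypothetical helper: collect <td> text, grouped by <tr> start tags."""

    def __init__(self):
        super().__init__()
        self.rows = []       # list of row lists
        self._in_td = False  # True while inside a <td> element
        self._cell = []      # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Every <tr> opens a new row, whether or not the previous one was closed.
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_td = True
            self._cell = []

    def handle_endtag(self, tag):
        # Assumes each <td> is closed explicitly; the cell is stored on </td>.
        if tag == "td" and self._in_td:
            self.rows[-1].append("".join(self._cell))
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._cell.append(data)


# Hypothetical sample markup in which the <tr> tags are never closed.
sample_html = """
<table summary="Popularity for top 1000">
<tr align="right"><td>998</td><td>Jory</td><td>Jada</td>
<tr align="right"><td>999</td><td>Misael</td><td>Leila</td>
<tr align="right"><td>1000</td><td>Tate</td><td>Peggy</td>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
for row in parser.rows:
    print(row)
```

This ignores the `summary` attribute and would merge cells from nested tables, so it is only a sketch of the grouping idea, not a replacement for BeautifulSoup or pandas.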

2 solutions

Solution 1: 2, 2019-03-02 19:25:08

Solution 2: 0, 2019-03-02 19:20:52