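How to iterate through a list in a web-scraped table column and return one result for each item?
I have Python code that scrapes the correct data from the web, but the Guests column contains multiple strings and currently only one string comes through. How can I iterate through the list in that column's cell and return the three guests as separate columns, ideally Guest1, Guest2 and Guest3? Thanks.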
import requests
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np

df = pd.DataFrame(columns=(['NoInSeason', 'Guests', 'Winner', 'OriginalAirDate']))
page = requests.get("https://en.wikipedia.org/wiki/List_of_QI_episodes")
soup = BeautifulSoup(page.content, "lxml")
my_tables = soup.find_all("table",{"class":"wikitable plainrowheaders wikiepisodetable"})
for table in my_tables:
    table_rows = table.find_all("tr")
    for tr in table_rows:
        td = tr.find_all("td")
        if len(td) == 5:
            NoInSeason = td[0].find(text=True)
            Guests = td[2].find_all(text=True)
            Winner = td[3].find(text=True)
            OriginalAirDate = td[4].find(text=True)
            if len(Guests) == 3:
                Guest1 = Guests[0]
                Guest2 = Guests[1]
                Guest3 = Guests[2]
                df = df.append({'NoInSeason': NoInSeason, 'Guest1' : Guest1, 'Guest2' : Guest2, 'Guest3' : Guest3, 'Winner': Winner, 'OriginalAirDate' : OriginalAirDate}, ignore_index=True)
df.to_csv("output.csv")
print(df)
Is this what you are looking for?
import requests
import pandas as pd
from bs4 import BeautifulSoup

# One column per guest instead of a single 'Guests' column
df = pd.DataFrame(columns=(['NoInSeason', 'Guest 1',
                            'Guest 2', 'Guest 3', 'Winner', 'OriginalAirDate']))
page = requests.get("https://en.wikipedia.org/wiki/List_of_QI_episodes")
soup = BeautifulSoup(page.content, "lxml")
my_tables = soup.find_all("table",{"class":"wikitable plainrowheaders wikiepisodetable"})
for table in my_tables:
    table_rows = table.find_all("tr")
    for tr in table_rows:
        td = tr.find_all("td")
        if len(td) == 5:
            NoInSeason = td[0].find(text=True)
            Guests = td[2].find_all(text=True)  # list of all text nodes in the cell
            Winner = td[3].find(text=True)
            OriginalAirDate = td[4].find(text=True)
            print(Guests)
            try:
                df = df.append({'NoInSeason': NoInSeason, 'Guest 1' : Guests[0], 'Guest 2' : Guests[1], 'Guest 3' : Guests[2], 'Winner': Winner, 'OriginalAirDate' : OriginalAirDate}, ignore_index=True)
            except IndexError:
                # Skip rows whose guest cell has fewer than three entries
                continue
print(df)
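One thing to watch for: `find_all(text=True)` returns every text node in the cell, so separators such as ", " can end up as list items next to the names. Below is a minimal sketch of an alternative, assuming the guests in a cell are comma-separated. The HTML snippet, the names in it, and the helpers `split_guests` and `parse_rows` are made up for illustration and are not the real Wikipedia markup. Rows are collected in a plain list because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped table; the real page markup may differ.
HTML = """
<table class="wikitable">
<tr><th>No.</th><th>Title</th><th>Guests</th><th>Winner</th><th>Date</th></tr>
<tr><td>1</td><td>"Adam"</td><td><a>Ann A</a>, <a>Bob B</a>, <a>Cy C</a></td><td>Ann A</td><td>1 Jan 2000</td></tr>
<tr><td>2</td><td>"Eve"</td><td><a>Dee D</a>, <a>Ed E</a></td><td>Ed E</td><td>8 Jan 2000</td></tr>
</table>
"""

COLUMNS = ["NoInSeason", "Guest1", "Guest2", "Guest3", "Winner", "OriginalAirDate"]


def split_guests(cell, n=3):
    """Return exactly n guest names from a <td>, padding with None."""
    # Assumption: names are separated by commas in the cell text.
    names = [part.strip() for part in cell.get_text().split(",") if part.strip()]
    return (names + [None] * n)[:n]


def parse_rows(html):
    """Collect one dict per data row (rows with exactly five <td> cells)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        td = tr.find_all("td")
        if len(td) != 5:
            continue  # header rows and malformed rows are skipped
        guest1, guest2, guest3 = split_guests(td[2])
        rows.append({
            "NoInSeason": td[0].get_text(strip=True),
            "Guest1": guest1,
            "Guest2": guest2,
            "Guest3": guest3,
            "Winner": td[3].get_text(strip=True),
            "OriginalAirDate": td[4].get_text(strip=True),
        })
    return rows


rows = parse_rows(HTML)
df = pd.DataFrame(rows, columns=COLUMNS)  # build the frame once, at the end
print(df)
```

Padding with `None` keeps rows that list fewer than three guests instead of dropping them, which is a different choice from the `try`/`except IndexError` approach above.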
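Edit: I see you changed your code. Does it work now? Also, wouldn't it be better to declare Guest1, Guest2 and Guest3 columns in the DataFrame, so that you don't end up with a "Guests" column full of NaN?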
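If the guests are instead kept as a single string per row, pandas can split that column into separate ones afterwards. A rough sketch, assuming a comma-separated `Guests` string; the sample data is invented and only stands in for the scraped values.

```python
import pandas as pd

# Invented sample data in place of the scraped rows.
df = pd.DataFrame({
    "NoInSeason": ["1", "2"],
    "Guests": ["Ann A, Bob B, Cy C", "Dee D, Ed E"],
})

# n=2 caps the number of splits, so each row yields at most three parts;
# any extra names would stay together in the last part.
parts = df["Guests"].str.split(", ", n=2, expand=True)

# Guarantee exactly three columns even if no row has three guests.
parts = parts.reindex(columns=range(3))
parts.columns = ["Guest1", "Guest2", "Guest3"]

# Replace the combined column with the three separate ones.
df = pd.concat([df.drop(columns="Guests"), parts], axis=1)
print(df)
```

Rows with fewer than three guests get a missing value in the trailing columns rather than being dropped.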