

How to iterate through a list in a web-scraped table column and return one result for each item?

I have some Python code that scrapes the correct data from the web, but each cell in the Guests column contains more than one string, and currently only one of them is pulled through. How do I iterate through the list within that cell and return the three guests as separate columns, ideally Guest1, Guest2 and Guest3? Thanks

import requests
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np

df = pd.DataFrame(columns=(['NoInSeason', 'Guests', 'Winner', 'OriginalAirDate']))
page = requests.get("https://en.wikipedia.org/wiki/List_of_QI_episodes")
soup = BeautifulSoup(page.content, "lxml")
my_tables = soup.find_all("table",{"class":"wikitable plainrowheaders wikiepisodetable"})
for table in my_tables:
    table_rows = table.find_all("tr")
    for tr in table_rows:
        td = tr.find_all("td")
        if len(td) == 5:
            NoInSeason = td[0].find(text=True)
            Guests = td[2].find_all(text=True)
            Winner  = td[3].find(text=True)
            OriginalAirDate = td[4].find(text=True) 
            if len(Guests) == 3:
                Guest1 = Guests[0]
                Guest2 = Guests[1]
                Guest3 = Guests[2]
                df = df.append({'NoInSeason': NoInSeason, 'Guest1' : Guest1, 'Guest2' : Guest2, 'Guest3' : Guest3, 'Winner': Winner, 'OriginalAirDate' : OriginalAirDate}, ignore_index=True)
df.to_csv("output.csv")
print(df)

Is this what you were looking for?

import pandas as pd
import requests
from bs4 import BeautifulSoup

page = requests.get("https://en.wikipedia.org/wiki/List_of_QI_episodes")
soup = BeautifulSoup(page.content, "lxml")
my_tables = soup.find_all("table", {"class": "wikitable plainrowheaders wikiepisodetable"})

# Collect one dict per episode and build the DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0)
rows = []
for table in my_tables:
    for tr in table.find_all("tr"):
        td = tr.find_all("td")
        if len(td) == 5:
            NoInSeason = td[0].find(string=True)
            Guests = td[2].find_all(string=True)  # every text node in the guests cell
            Winner = td[3].find(string=True)
            OriginalAirDate = td[4].find(string=True)
            try:
                rows.append({'NoInSeason': NoInSeason, 'Guest 1': Guests[0], 'Guest 2': Guests[1], 'Guest 3': Guests[2], 'Winner': Winner, 'OriginalAirDate': OriginalAirDate})
            except IndexError:
                # Fewer than three guests in this cell: skip the row
                continue

df = pd.DataFrame(rows, columns=['NoInSeason', 'Guest 1', 'Guest 2', 'Guest 3', 'Winner', 'OriginalAirDate'])
print(df)
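To see what `find_all(string=True)` returns for a guests cell without hitting Wikipedia, here is a small sketch on a hand-written table row. The HTML, the names and the `<br/>` separators are assumptions made up for illustration, not copied from the live page; it uses the built-in `html.parser` so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical row shaped like the episode tables: 5 <td> cells, guests in the third
html = """
<table><tr>
<td>1</td><td>"Pilot"</td>
<td>Alice Example<br/>Bob Example<br/>Carol Example</td>
<td>Bob Example</td><td>1 January 2000</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("tr").find_all("td")

# Every text node inside the cell; whitespace-only nodes are dropped
guests = [s.strip() for s in td[2].find_all(string=True) if s.strip()]

# Turn the list into numbered keys: Guest 1, Guest 2, Guest 3
row = {f"Guest {i}": name for i, name in enumerate(guests, start=1)}
print(row)
# {'Guest 1': 'Alice Example', 'Guest 2': 'Bob Example', 'Guest 3': 'Carol Example'}
```

Using `enumerate` means the loop does not care how many guests a cell has, so the same line works for cells with two or four names.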

Edit: I see you changed your code; does it work now? And wouldn't it work better to include the Guest1, Guest2 and Guest3 columns in the DataFrame, so that you don't get a 'Guests' column full of NaN?
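The point about declaring the guest columns can be checked without scraping anything. A minimal sketch, using made-up rows in place of the Wikipedia data: build one dict per episode with numbered guest keys, then pass the full column list to `pd.DataFrame`. Unlike the `IndexError` skip, this keeps an episode with only two guests and leaves its third guest as NaN; which behaviour you want is a judgment call.

```python
import pandas as pd

COLUMNS = ["NoInSeason", "Guest 1", "Guest 2", "Guest 3", "Winner", "OriginalAirDate"]

# Hypothetical scraped values: the second episode has only two guests
scraped = [
    ("1", ["Alice", "Bob", "Carol"], "Bob", "1 January 2000"),
    ("2", ["Dave", "Erin"], "Erin", "8 January 2000"),
]

rows = []
for no, guests, winner, aired in scraped:
    row = {"NoInSeason": no, "Winner": winner, "OriginalAirDate": aired}
    # One key per guest (at most three); a missing guest simply has no key
    row.update({f"Guest {i}": name for i, name in enumerate(guests[:3], start=1)})
    rows.append(row)

# Passing the column list fixes the order and leaves no stray 'Guests' column;
# keys absent from a row become NaN in that column
df = pd.DataFrame(rows, columns=COLUMNS)
print(df)
```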



 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM