
Having trouble putting data into a pandas dataframe

I am new to coding, so take it easy on me. I recently started a pet project that scrapes data from a table and writes it to a CSV for me. I believe I have pulled the data successfully, but putting it into a dataframe raises the error "Shape of passed values is (31719, 1), indices imply (31719, 23)". I have checked the lengths of my headers and my rows, and those numbers look correct, but when I build the dataframe it appears that only one column is being passed in. Again, I am very new to all of this and would appreciate any help! Code below.

from bs4 import BeautifulSoup
from pandas.core.frame import DataFrame
import requests
import pandas as pd
url = 'https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2018&month=0&season1=2018&ind=0&page=1_1500'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
#pulling table from HTML
Table1 = soup.find('table', id = 'LeaderBoard1_dg1_ctl00')
#finding and filling table columns
headers = []
for i in Table1.find_all('th'):
    title = i.text
    headers.append(title)
#finding and filling table rows
rows = []
for j in Table1.find_all('td'):
    data = j.text
    rows.append(data)
#filling dataframe
df = pd.DataFrame(rows, columns = headers)
#show dataframe
print(df)

You are asking pandas for a dataframe with 23 columns, one per entry in headers. However, rows is a flat, one-dimensional list with one entry per cell, so pandas treats it as a single column, and the shape of the passed values does not match the shape implied by the column labels. In other words, you are passing 692 x 1 data (31719 x 1 in your error message) to a dataframe declared as 692 x 23, which won't work.
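To make the mismatch concrete, here is a minimal sketch with made-up data (the column names and cell values are hypothetical, not taken from the FanGraphs table). It reproduces the shape error with a flat list, then groups the same list into rows of len(headers) cells each.

```python
import pandas as pd

headers = ["Name", "Team", "HR"]               # hypothetical column names
cells = ["A", "X", "1", "B", "Y", "2"]         # flat list, one entry per <td>

# A flat list is read as ONE column, so three column labels do not fit.
try:
    pd.DataFrame(cells, columns=headers)
except ValueError as err:
    print(err)                                 # pandas reports the shape mismatch

# Group the flat list into rows of len(headers) cells each.
width = len(headers)
rows = [cells[i:i + width] for i in range(0, len(cells), width)]

df = pd.DataFrame(rows, columns=headers)
print(df)
```

Note that this chunking only works when the number of cells is an exact multiple of the number of columns. In your error, 31719 is not a multiple of 23, so the real table evidently contains some extra cells (my guess is the pager or footer), and a plain reshape would misalign the rows.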

If you only want to build a dataframe from the flat list you already have, give it a single column name (this yields one long column, not the 23-column table):

df=pd.DataFrame(rows, columns=headers[1:2])
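If you would rather keep the BeautifulSoup approach and still get the full table, one option is to collect the cells one tr at a time, so that each table row becomes one list. The sketch below runs on a small inline HTML snippet instead of the live page; the table id "demo" and its contents are invented for illustration, and the length check is my assumption about how to skip header and pager rows.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Tiny stand-in for the real page (hypothetical id and data).
html = """
<table id="demo">
  <thead><tr><th>Name</th><th>Team</th><th>HR</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>AAA</td><td>10</td></tr>
    <tr><td>Player B</td><td>BBB</td><td>7</td></tr>
    <tr><td>Page 1 of 1</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="demo")

headers = [th.get_text(strip=True) for th in table.find_all("th")]

rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # Keep only rows that have exactly one cell per header; this skips
    # the header row (no <td>) and ragged rows such as a pager.
    if len(cells) == len(headers):
        rows.append(cells)

df = pd.DataFrame(rows, columns=headers)
print(df)
```

Every value comes back as text, so numeric columns would still need converting (for example with pd.to_numeric) before doing any arithmetic on them.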

Alternatively, you can get there directly with pandas.read_html, which does the HTML parsing for you (via lxml or BeautifulSoup) and returns a list of dataframes:

pd.read_html(url, attrs={'id':'LeaderBoard1_dg1_ctl00'}, header=[1])[0].iloc[:-1]
  • attrs={'id':'LeaderBoard1_dg1_ctl00'} selects the table by its id

  • header=[1] takes the column names from the second header row, because the table has more than one header row

  • .iloc[:-1] drops the last row, which is the table footer holding the pagination

Example

import pandas as pd

pd.read_html('https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2018&month=0&season1=2018&ind=0&page=1_1500',
            attrs={'id':'LeaderBoard1_dg1_ctl00'},
            header=[1])[0]\
            .iloc[:-1]
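Since the goal was a CSV, here is a small offline sketch of the same read_html pattern followed by to_csv. It parses an inline HTML string rather than the live page, so the table id "demo", the two header rows and the footer row are all invented to mimic the situation described above. It assumes lxml (or bs4 plus html5lib) is installed, and it uses an integer header=1 to pick the second header row.

```python
from io import StringIO

import pandas as pd

# Hypothetical table: two header rows, two data rows, one pager footer.
html = """
<table id="demo">
  <thead>
    <tr><th colspan="3">Batting leaders</th></tr>
    <tr><th>Name</th><th>Team</th><th>HR</th></tr>
  </thead>
  <tbody>
    <tr><td>Player A</td><td>AAA</td><td>10</td></tr>
    <tr><td>Player B</td><td>BBB</td><td>7</td></tr>
  </tbody>
  <tfoot>
    <tr><td colspan="3">Page 1 of 1</td></tr>
  </tfoot>
</table>
"""

# read_html returns a list of dataframes, one per matching table.
tables = pd.read_html(StringIO(html), attrs={"id": "demo"}, header=1)

# Take the first match and drop the trailing footer row.
df = tables[0].iloc[:-1]

# With no path, to_csv returns the CSV as a string; pass a filename
# (for example "leaders_2018.csv", a made-up name) to write a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

For the real page you would pass the URL and the LeaderBoard1_dg1_ctl00 id exactly as in the example above, then call to_csv on the result.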
