I am new to coding, so take it easy on me. I recently started a pet project which scrapes data from a table and will create a csv of the data for me, I believe I have successfully pulled the data, but trying to put it into a dataframe returns the error "Shape of passed values is (31719, 1), indices imply (31719. 23)", I have tried looking at the length of my headers and my rows and those numbers are correct. but when I try to put it into a dataframe it appears that it is only pulling one column into the dataframe, Again, I am very new to all of this but would appreciate any help! Code below
from bs4 import BeautifulSoup
from pandas.core.frame import DataFrame
import requests
import pandas as pd
url = 'https://www.fangraphs.com/leaders.aspx? pos=all&stats=bat&lg=all&qual=0&type=8&season=2018&month=0&season1=2018&ind=0&page=1_1500'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
#pulling table from HTML
Table1 = soup.find('table', id = 'LeaderBoard1_dg1_ctl00')
#finding and filling table columns
headers = []
for i in Table1.find_all('th'):
title = i.text
headers.append(title)
#finding and filling table rows
rows = []
for j in Table1.find_all('td'):
data = j.text
rows.append(data)
#filling dataframe
df = pd.DataFrame(rows, columns = headers)
#show dataframe
print(df)
You are creating a dataframe with 692 rows with 23 columns as a new dataframe. However looking at the rows array, you only have 1 dimensional array so shape of passed values is not matching with indices. You are passing 692 x 1 to a dataframe with 692 x 23 which won't work.
If you want to create with the data you have, you should just use:
df=pd.DataFrame(rows, columns=headers[1:2])
Alternativly you can achieve your goal directly by using pandas.read_html
that processe the data by BeautifulSoup for you:
pd.read_html(url, attrs={'id':'LeaderBoard1_dg1_ctl00'}, header=[1])[0].iloc[:-1]
attrs={'id':'LeaderBoard1_dg1_ctl00'}
selects table by id
header=[1]
adjusts the header cause there are multiple headers
.iloc[:-1]
removes the table footer with pagination
import pandas as pd
pd.read_html('https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2018&month=0&season1=2018&ind=0&page=1_1500',
attrs={'id':'LeaderBoard1_dg1_ctl00'},
header=[1])[0]\
.iloc[:-1]
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.