
pandas merging 300 dataframes

The purpose of this code is

  1. Scrape 300 tables via Pandas and Beautiful Soup
  2. Concatenate these tables into a single data frame

The code works fine for the first step, but it fails on the second.

Here is the code:

import pandas as pd
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup


header = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 " "Safari/537.36", "X-Requested-With": "XMLHttpRequest"}
url = open(r"C:\Users\Sayed\Desktop\script\links.txt").readlines()

for site in url:
    req = Request(site, headers=header)
    page = urlopen(req)
    soup = BeautifulSoup(page, 'lxml')

    table = soup.find('table')
    df = pd.read_html(str(table), parse_dates={'DateTime': ['Release Date', 'Time']}, index_col=[0])[0]
    df = pd.concat(df, axis=1, join='outer').sort_index(ascending=False)
    print(df)

Here is the error:

Traceback (most recent call last):
  File "D:/Projects/Tutorial/try.py", line 18, in <module>
    df = pd.concat(df, axis=1, join='outer').sort_index(ascending=False)
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 225, in concat
    copy=copy, sort=sort)
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 241, in __init__
    '"{name}"'.format(name=type(objs).__name__))
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

The Pandas concat function takes a sequence or mapping of Series, DataFrame, or Panel objects as its first argument. Your code is currently passing a single DataFrame, which is why the constructor rejects it.
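A minimal sketch of the distinction, using a small hypothetical DataFrame:

import pandas as pd

df = pd.DataFrame({'a': [1, 2]})

# Passing the DataFrame itself reproduces the TypeError in the traceback above:
# pd.concat(df)  # TypeError: first argument must be an iterable of pandas objects ...

# Wrapping it in a list (an iterable of pandas objects) is accepted,
# even with only one element:
print(pd.concat([df]))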

I suspect the following will fix your issue:

import pandas as pd
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup


header = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 " "Safari/537.36", "X-Requested-With": "XMLHttpRequest"}
url = open(r"C:\Users\Sayed\Desktop\script\links.txt").readlines()

dfs = []

for site in url:
    req = Request(site, headers=header)
    page = urlopen(req)
    soup = BeautifulSoup(page, 'lxml')

    table = soup.find('table')
    df = pd.read_html(str(table), parse_dates={'DateTime': ['Release Date', 'Time']}, index_col=[0])[0]
    dfs.append(df)

concat_df = pd.concat(dfs, axis=1, join='outer').sort_index(ascending=False)
print(concat_df)

All I have done is create a list called dfs as a place to append your DataFrames as you iterate through the sites. Then dfs is passed as the first argument to concat, outside the loop, so all 300 frames are combined in one call.
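One design choice worth double-checking, depending on how your 300 tables relate to each other: axis=1 aligns the frames side by side on their shared DateTime index, while axis=0 stacks their rows. A minimal sketch with two hypothetical single-row frames:

import pandas as pd

a = pd.DataFrame({'Actual': [1.0]}, index=pd.to_datetime(['2018-01-01']))
b = pd.DataFrame({'Actual': [2.0]}, index=pd.to_datetime(['2018-02-01']))

# axis=0 stacks rows: 2 rows, one 'Actual' column
print(pd.concat([a, b], axis=0))

# axis=1 joins on the index: 2 rows, two 'Actual' columns,
# with NaN wherever a date appears in only one frame
print(pd.concat([a, b], axis=1, join='outer'))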
