The purpose of this code is
Here is the code:
import pandas as pd
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
header = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 " "Safari/537.36", "X-Requested-With": "XMLHttpRequest"}
url = open(r"C:\Users\Sayed\Desktop\script\links.txt").readlines()
for site in url:
    req = Request(site, headers=header)
    page = urlopen(req)
    soup = BeautifulSoup(page, 'lxml')
    table = soup.find('table')
    df = pd.read_html(str(table), parse_dates={'DateTime': ['Release Date', 'Time']}, index_col=[0])[0]
    df = pd.concat(df, axis=1, join='outer').sort_index(ascending=False)
    print(df)
Here is the error:
Traceback (most recent call last):
  File "D:/Projects/Tutorial/try.py", line 18, in <module>
    df = pd.concat(df, axis=1, join='outer').sort_index(ascending=False)
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 225, in concat
    copy=copy, sort=sort)
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 241, in __init__
    '"{name}"'.format(name=type(objs).__name__))
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
The pandas concat function takes a sequence or mapping of Series, DataFrame, or Panel objects as its first argument. Your code is currently passing a single DataFrame.
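To see the difference in isolation, here is a minimal sketch that does not touch the network. The small DataFrame is made up for illustration and stands in for one scraped table.

```python
import pandas as pd

# Hypothetical stand-in for a single scraped table
df = pd.DataFrame({"a": [1, 2]})

# Passing a bare DataFrame reproduces the TypeError from the question
try:
    pd.concat(df)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# Wrapping the DataFrame(s) in a list gives concat the iterable it expects
result = pd.concat([df, df])
print(result)
```

The only change between the failing and the working call is that the second one passes a list.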
I suspect the following will fix your issue:
import pandas as pd
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
header = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 " "Safari/537.36", "X-Requested-With": "XMLHttpRequest"}
url = open(r"C:\Users\Sayed\Desktop\script\links.txt").readlines()
dfs = []
for site in url:
    req = Request(site, headers=header)
    page = urlopen(req)
    soup = BeautifulSoup(page, 'lxml')
    table = soup.find('table')
    df = pd.read_html(str(table), parse_dates={'DateTime': ['Release Date', 'Time']}, index_col=[0])[0]
    # Collect each site's DataFrame instead of concatenating inside the loop
    dfs.append(df)
# Concatenate once, after every site has been read
concat_df = pd.concat(dfs, axis=1, join='outer').sort_index(ascending=False)
print(concat_df)
All I have done is create a list called dfs as a place to append your DataFrames as you iterate through the sites. Then dfs is passed as the argument to concat.
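The collect-then-concat pattern can be tried offline with a rough sketch like the one below. The two frames, their column names (site_a, site_b), and their dates are invented for illustration. They only mimic tables indexed by a DateTime column, so the real scraped data will look different.

```python
import pandas as pd

# Hypothetical stand-ins for the tables read from two sites
frames = [
    pd.DataFrame({"site_a": [1.0, 2.0]},
                 index=pd.to_datetime(["2019-01-01", "2019-01-02"])),
    pd.DataFrame({"site_b": [3.0, 4.0]},
                 index=pd.to_datetime(["2019-01-02", "2019-01-03"])),
]

dfs = []
for frame in frames:
    # In the real script this is where each site's table is appended
    dfs.append(frame)

# axis=1 places the frames side by side; join='outer' keeps every date
# that appears in any frame and fills the gaps with NaN
concat_df = pd.concat(dfs, axis=1, join="outer").sort_index(ascending=False)
print(concat_df)
```

With join='outer', a date present in only one frame still gets a row, and the other frame's column is NaN for that date. Use join='inner' if you only want dates common to every site.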