I am trying to get tables from a website. The website's URL contains the date, so I have to iterate over dates to get historical data. I generate the dates as follows:
import datetime
start = datetime.datetime.strptime("26-09-2016", "%d-%m-%Y")
end = datetime.datetime.strptime("30-09-2016", "%d-%m-%Y")
date_generated = [start + datetime.timedelta(days=x) for x in range(0, (end-start).days)]
dates_list = []
for date in date_generated:
    txt = str(str(date.day) + '.' + str(date.month) + '.' + str(date.year))
    dates_list.append(txt)
dates_list
After this, I am running the code below to concatenate all the tables:
for i in range(0, 3):
    allURL = 'https://www.uzse.uz/trade_results?date=' + dates_list[i] + '&locale=en&mkt_id=ALL&page=%d'
    ndf_list = []
    for i in range(1, 100):
        url = allURL %i
        if pd.read_html(url)[0].empty:
            break
        else :
            ndf_list.append(pd.read_html(url)[0])
    ndf = pd.concat(ndf_list)
    ndf.insert(0, 'Date', dates_list[i])
mdf = pd.concat(ndf, ignore_index = True)
mdf
However, this does not work and I get:
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
I do not understand what I am doing wrong. I expect to get a single table that combines the data from 26, 27, and 28 September.
Please help.
I'm not sure about the last line(s), but I'd approach it this way:
import datetime
import pandas as pd
start = datetime.datetime.strptime("26-09-2016", "%d-%m-%Y")
end = datetime.datetime.strptime("30-09-2016", "%d-%m-%Y")
date_generated = [
    start + datetime.timedelta(days=x) for x in range(0, (end-start).days)]
dates_list = []
for date in date_generated:
    txt = str(str(date.day) + '.' + str(date.month) + '.' + str(date.year))
    dates_list.append(txt)
dates_list
ndf = pd.DataFrame() # create empty ndf
for i in range(0, 3):
    allURL = 'https://www.uzse.uz/trade_results?date=' + \
        dates_list[i] + '&locale=en&mkt_id=ALL&page=%d'
    # ndf_list = []
    for j in range(1, 100):  # j, not i, so the date index is not overwritten
        url = allURL % j
        chunk = pd.read_html(url)[0]  # fetch each page only once
        if chunk.empty:
            break
        # ndf_list.append(pd.read_html(url)[0])
        chunk['Date'] = dates_list[i]  # Date is added as the last column, let's fix that
        # get a list of all the columns
        cols = chunk.columns.tolist()
        # rearrange the columns, move the last element (Date) to the first position
        cols = cols[-1:] + cols[:-1]
        # reorder the dataframe
        chunk = chunk[cols]
        ndf = pd.concat([ndf, chunk])
    # ndf = pd.concat(ndf_list)
    # ndf.insert(0, 'Date', dates_list[i])
print(ndf)
# mdf = pd.concat(ndf, ignore_index=True)
# mdf
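For what it's worth, the TypeError in the question comes from its last pd.concat call: pd.concat expects a list (or other iterable) of DataFrames, but ndf is already a single DataFrame at that point. The question's inner loop also reuses i, so dates_list[i] no longer refers to the date being scraped. Below is a minimal sketch of an alternative that collects every page in one list and concatenates once at the end. The names fetch_page and collect_tables are made up for illustration, and fetch_page returns fake rows in place of pd.read_html(url)[0] so the snippet runs without network access.

```python
import pandas as pd


def fetch_page(date_str, page):
    """Hypothetical stand-in for pd.read_html(url)[0].

    Returns fake rows for pages 1 and 2 and an empty frame afterwards,
    which is how the real site is assumed to behave past the last page.
    """
    if page > 2:
        return pd.DataFrame()
    return pd.DataFrame({"Ticker": ["AAA", "BBB"], "Page": [page, page]})


def collect_tables(dates, fetch=fetch_page, max_pages=100):
    """Gather all pages for all dates into one DataFrame."""
    frames = []
    for date_str in dates:
        # A separate loop variable keeps the date from being overwritten.
        for page in range(1, max_pages):
            chunk = fetch(date_str, page)  # read each page once
            if chunk.empty:
                break
            chunk.insert(0, "Date", date_str)  # Date becomes the first column
            frames.append(chunk)
    if not frames:
        return pd.DataFrame()  # pd.concat raises on an empty list
    # pd.concat takes an iterable of DataFrames, not a single DataFrame.
    return pd.concat(frames, ignore_index=True)


mdf = collect_tables(["26.9.2016", "27.9.2016", "28.9.2016"])
print(mdf)
```

To use it against the real site, fetch_page would be swapped for a function that builds the URL from the date and page number and returns pd.read_html(url)[0]. Appending to a list and calling pd.concat once also avoids re-copying the growing DataFrame on every page.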