
Pandas concatenate dataframes with for loop

I am trying to get tables from a website. The website's URL contains dates so I will have to iterate over dates in order to get historical data. I am generating dates as follows:

import datetime

start = datetime.datetime.strptime("26-09-2016", "%d-%m-%Y")
end = datetime.datetime.strptime("30-09-2016", "%d-%m-%Y")
date_generated = [start + datetime.timedelta(days=x) for x in range(0, (end-start).days)]

dates_list = []
for date in date_generated:
    txt = str(str(date.day) + '.' + str(date.month) + '.' + str(date.year))
    dates_list.append(txt)

dates_list

After this, I am running the code below to concatenate all the tables:

for i in range(0, 3):
    allURL = 'https://www.uzse.uz/trade_results?date=' + dates_list[i] + '&locale=en&mkt_id=ALL&page=%d'

    ndf_list = []
    for i in range(1, 100):
        url = allURL %i
        if pd.read_html(url)[0].empty:
            break
        else :
            ndf_list.append(pd.read_html(url)[0])

    ndf = pd.concat(ndf_list)
    ndf.insert(0, 'Date', dates_list[i])

mdf = pd.concat(ndf, ignore_index = True)
mdf

However, this does not work and I get:

TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

I do not understand what I am doing wrong. I am expecting to have one table that comes from 26th, 27th, and 28th September.

Please help.
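The error above can be reproduced without touching the website, which helps narrow down where it comes from. The following is a minimal sketch with two made-up frames (`a` and `b` are illustrative, not data from the site): `pd.concat` expects a list (or other iterable) of DataFrames, so handing it a single DataFrame raises the `TypeError`, while wrapping the frames in a list works.

```python
import pandas as pd

# Two small illustrative frames standing in for the scraped tables.
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3]})

try:
    # A single DataFrame is passed instead of a list of DataFrames.
    pd.concat(a, ignore_index=True)
    message = None
except TypeError as exc:
    message = str(exc)
    print(message)

# Passing a list of frames is what pd.concat expects.
combined = pd.concat([a, b], ignore_index=True)
print(combined)
```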

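I'm not sure about the last line(s), but a few things go wrong in your loop: the inner loop reuses the variable i, so dates_list[i] no longer points at the current date once the inner loop has run; ndf is overwritten on every pass of the outer loop instead of being accumulated; and the final pd.concat(ndf, ...) receives a single DataFrame rather than a list of DataFrames, which is what raises the TypeError. I'd approach it this way: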

import datetime
import pandas as pd

start = datetime.datetime.strptime("26-09-2016", "%d-%m-%Y")
end = datetime.datetime.strptime("30-09-2016", "%d-%m-%Y")
date_generated = [
    start + datetime.timedelta(days=x) for x in range(0, (end-start).days)]

dates_list = []
for date in date_generated:
    txt = str(date.day) + '.' + str(date.month) + '.' + str(date.year)  # e.g. '26.9.2016'
    dates_list.append(txt)

dates_list

ndf = pd.DataFrame()  # start with an empty frame and grow it page by page
for i in range(0, 3):
    allURL = 'https://www.uzse.uz/trade_results?date=' + \
        dates_list[i] + '&locale=en&mkt_id=ALL&page=%d'

    for j in range(1, 100):  # j, not i, so the date index is not overwritten
        url = allURL % j
        chunk = pd.read_html(url)[0]  # fetch each page only once
        if chunk.empty:
            break
        # put Date in the first column of this page's table
        chunk.insert(0, 'Date', dates_list[i])
        # note the list: pd.concat needs an iterable of DataFrames
        ndf = pd.concat([ndf, chunk], ignore_index=True)

print(ndf)
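A common variation on this is to collect every page in one list and call `pd.concat` a single time at the end, which avoids re-copying the growing frame on each page. The sketch below illustrates that pattern offline: `fetch_page` and `collect` are hypothetical names, and `fetch_page` is a stand-in for `pd.read_html(url)[0]` that returns made-up rows for two pages and then an empty frame. It is not the site's real data or column layout.

```python
import datetime

import pandas as pd


def fetch_page(date_str, page):
    """Hypothetical stand-in for pd.read_html(url)[0].

    Returns two made-up rows for pages 1 and 2, then an empty frame,
    mimicking a site that runs out of pages.
    """
    if page > 2:
        return pd.DataFrame()
    return pd.DataFrame({
        "Ticker": [f"T{page}A", f"T{page}B"],
        "Volume": [page * 10, page * 20],
    })


def collect(dates, fetch=fetch_page, max_pages=100):
    """Gather all pages for all dates into one DataFrame."""
    frames = []
    for date_str in dates:
        for page in range(1, max_pages):
            chunk = fetch(date_str, page)  # each page is fetched once
            if chunk.empty:
                break
            chunk.insert(0, "Date", date_str)  # Date becomes the first column
            frames.append(chunk)
    if not frames:
        # pd.concat raises ValueError on an empty list, so return early.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Same unpadded day.month.year strings as in the code above.
start = datetime.date(2016, 9, 26)
dates_list = [
    f"{d.day}.{d.month}.{d.year}"
    for d in (start + datetime.timedelta(days=x) for x in range(3))
]

result = collect(dates_list)
print(result)
```

To use this against the real site, `fetch` could be replaced with a small function that builds the URL from the date and page number and returns `pd.read_html(url)[0]`; that part is left out here so the example runs without network access.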


 