简体   繁体   中英

Appending pandas dataframes on same index from multiple excel sheets

I've got an issue with the following, which is a simple code that calculates some means on a sports dataset. If needed I can provide an example of what these data look like.

Right now attempting to append the generated dataframes back-to-back, which I am not able to do. Basically I would like to take all the datasets (there are two here, but more to be used), and to generate a timeline based on the outputs. All the files have the same column and indices. Below is the code I'm using:

import pandas as pd

season_list = ["2017-2018","2018-2019"]
excl_value = 10
header_list = []

for seas in season_list:
    excel_sheet = pd.read_excel(r'Data\NData' + str(seas) + '.xlsx')
    excel_sheet = excel_sheet.fillna(value = 0)
    headers = excel_sheet.columns.values.tolist()
    if headers not in header_list:
        header_list.append(headers)
    for header in header_list:
        data_mean = excel_sheet[header][excel_sheet.games >=                         
excl_value].mean()
        data_mean = round(data_mean,2)
        print(data_mean)

With this I simply get two datasets, which look akin to the below:

age                   25.42
games                 22.42
games_starts          18.19

age                   24.95
games                 27.18
games_starts          22.00

These are the results of two different calculations of the mean, for a few variables. Now ideally I'd like to have these broken down by file, and by the same index concurrently. I'd see it working as below:

                    2017-2018  2018-2019
age                   25.42      24.95
games                 22.42      27.18
games_starts          18.19      22.00

Like this I can then create a timeline of the values together. However struggling to achieve this. How should I go about combining the calculations run, into one set all together? Thanks!

I've already tested a few methods of achieving this. One was just simply appending the excel sheets with .append, but I am seeing some issues with the values not coming together. I also thought about just grabbing the mean calculations and applying them to their own dictionary/list/dataframe, but I haven't managed to actually push them into their own set.

You can do it nicely by 3 steps. Load:

sports_data = [pd.read_excel(r'Data\NData' + str(season) + '.xlsx') for season in season_list]

Now you'd like to add a column to identify each sheet such that you can group on it in the next step:

for i, season in enumerate(season_list):
    sports_data[i]['season'] = season

Combine to a single DataFrame:

sports_data = pd.concat(sports_data)

Now we are ready to group by season and take the mean:

sports_data.groupby('season').mean()

This will give you

                  age       games   games_starts
season          
2017-2018   26.666667   33.000000   22.333333
2018-2019   25.666667   25.666667   19.000000

What is left now is only to transpose it to get what you want:

sports_data.groupby('season').mean().T

在此处输入图片说明

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM