
Efficiently Concatenate Pandas DataFrames in series

I have 10 DataFrames with an equal number of rows, each with its own set of unique columns (no column is shared between any two DataFrames). I want to join them side by side, so that the final DataFrame contains all the columns of all the DataFrames. The first row of the final DataFrame would contain the first row of the first DataFrame, followed by the first row of the second, and so on up to the tenth. I have tried pandas.concat(dataframes, axis=1), but it somehow introduced NaN values into my numerical data. I worked around it with an extremely slow and ugly method that steps through the rows by index and builds the final DataFrame row by row. What is the correct pandas way to do this?

Assuming all your DataFrames are in a list df_list:

import pandas as pd

df0_index = df_list[0].index  # take the first DataFrame's index

for i in range(1, len(df_list)):
    # give every other DataFrame the same row labels (requires equal row counts)
    df_list[i] = df_list[i].set_index(df0_index)

df_out = pd.concat(df_list, axis=1)  # concatenate column-wise on the shared index
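
To see why the NaN values appear and why sharing one index removes them, here is a minimal sketch. The frames `a` and `b` and their column names are made up for illustration. The assumption is that the original DataFrames had different row labels, for example because they were filtered or sliced from larger tables.

```python
import pandas as pd

# Two hypothetical frames with the same number of rows but different row labels.
a = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 1, 2])
b = pd.DataFrame({"y": [10, 20, 30]}, index=[5, 6, 7])

# concat with axis=1 aligns rows on index labels. No labels match here,
# so every label in the union gets its own row, padded with NaN.
misaligned = pd.concat([a, b], axis=1)

# Giving b the same index as a makes the rows line up by position.
aligned = pd.concat([a, b.set_index(a.index)], axis=1)

print(misaligned)
print(aligned)
```

If the second printout has no NaN while the first does, mismatched row labels are the likely cause of the problem in the question.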

Got it working. I simply had to set ignore_index=True when calling pandas.concat().

pd.concat(df_list, axis=1, ignore_index=True)  # returned the combined DataFrame correctly

Note that reindexing didn't work for me, for some reason.
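
One caveat worth checking: according to the pandas documentation, ignore_index=True discards the labels along the concatenation axis only, so with axis=1 it renumbers the columns and does not change how rows are matched. The sketch below uses made-up frames `a` and `b` to show which behaviour you are getting. If the DataFrames already share an index, the call runs without NaN but the column names are lost.

```python
import pandas as pd

# Hypothetical frames whose row labels do not overlap.
a = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
b = pd.DataFrame({"y": [3, 4]}, index=[2, 3])

out = pd.concat([a, b], axis=1, ignore_index=True)

# Column names "x" and "y" are replaced by positions 0 and 1.
column_labels = list(out.columns)

# Rows are still matched on index labels, so the mismatch still yields NaN.
has_nan = bool(out.isna().any().any())

print(column_labels, has_nan)
```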

You can do this with a list comprehension:

pd.concat([df.reset_index(drop=True) for df in df_list], axis=1)
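
The same idea can be wrapped in a small function that also checks the row counts first. This is only a sketch: the name `concat_columns_by_position` and the sample frames are hypothetical, and it assumes you want rows matched purely by position, discarding the original row labels.

```python
import pandas as pd


def concat_columns_by_position(frames):
    """Hypothetical helper: join frames side by side, matching rows by position."""
    lengths = {len(f) for f in frames}
    if len(lengths) > 1:
        raise ValueError(f"frames have different row counts: {sorted(lengths)}")
    # reset_index(drop=True) replaces each frame's row labels with 0..n-1,
    # so concat has identical labels to align on.
    return pd.concat([f.reset_index(drop=True) for f in frames], axis=1)


# Sample frames with unrelated row labels and distinct column names.
df_list = [
    pd.DataFrame({"a": [1, 2]}, index=[10, 20]),
    pd.DataFrame({"b": [3.5, 4.5]}, index=["p", "q"]),
]
df_out = concat_columns_by_position(df_list)
print(df_out)
```

Unlike ignore_index=True, this keeps the original column names, and the length check turns a silent NaN padding into an explicit error.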

