Efficiently Concatenate Pandas DataFrames in series

Question

I have 10 DataFrames with the same number of rows, each with its own set of unique columns (no column is shared between DataFrames). I want to join them in series, side by side, so that the final DataFrame contains every column from all of them. The first row of the final DataFrame would contain the first row of the first DataFrame, followed by the first row of the second, and so on through the tenth. I tried pandas.concat(dataframes, axis=1), but it somehow introduced NaN values into my numerical data. I worked around it with an extremely slow and ugly method that steps through the rows by index and builds the final DataFrame row by row. What is the correct pandas way to do this?

Answer 1

Assuming all your DataFrames are in a list df_list:

import pandas as pd

df0_index = df_list[0].index # get the first DataFrame's index

for i in range(1, len(df_list)):
    df_list[i] = df_list[i].set_index(df0_index) # give every other DataFrame that same index

df_out = pd.concat(df_list, axis=1) # concatenate column-wise; rows now align by label
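
A likely cause of the NaN values in the question is that the DataFrames have different row indexes: with axis=1, pd.concat aligns rows by index label, not by position. The sketch below illustrates this with two made-up frames (df_a and df_b are hypothetical, not from the question) and shows how sharing one index removes the NaN rows.

```python
import pandas as pd

# Hypothetical frames: same number of rows, different row labels.
df_a = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
df_b = pd.DataFrame({"b": [4, 5, 6]}, index=[10, 11, 12])

# axis=1 aligns rows by index label, so mismatched labels give the
# union of both indexes, with NaN wherever a frame has no such label.
misaligned = pd.concat([df_a, df_b], axis=1)
print(misaligned.shape)  # (6, 2)

# Giving df_b the same index as df_a makes the rows line up.
df_b_aligned = df_b.set_index(df_a.index)
aligned = pd.concat([df_a, df_b_aligned], axis=1)
print(aligned.shape)  # (3, 2)
```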

Answer 2

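Got it working. I simply had to set ignore_index=True when calling pandas.concat():

pd.concat(df_list, axis=1, ignore_index=True) # returns the DataFrames joined correctly

Note that reindexing wouldn't work for some reason. Be aware that with axis=1, ignore_index=True relabels the columns as 0, 1, 2, ... rather than resetting the row index, so the original column names are lost and DataFrames whose row indexes differ are still aligned by label.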
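
For reference, here is a small sketch of what ignore_index=True does when combined with axis=1, using made-up frames (df_a, df_b and df_c are hypothetical). The option discards the labels along the concatenation axis, which for axis=1 means the column names; the rows are still matched by index label.

```python
import pandas as pd

# Hypothetical frames; df_a and df_b share the default 0..1 row index.
df_a = pd.DataFrame({"a": [1, 2]})
df_b = pd.DataFrame({"b": [3, 4]})

# With axis=1, ignore_index=True replaces the column labels with 0..n-1.
relabeled = pd.concat([df_a, df_b], axis=1, ignore_index=True)
print(list(relabeled.columns))  # [0, 1]

# Rows are still aligned by index label, so a frame with different
# row labels still produces extra rows filled with NaN.
df_c = pd.DataFrame({"c": [5, 6]}, index=[7, 8])
still_misaligned = pd.concat([df_a, df_c], axis=1, ignore_index=True)
print(still_misaligned.shape)  # (4, 2)
```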

Answer 3

You can do this with a list comprehension:

pd.concat([df.reset_index(drop=True) for df in df_list], axis=1)
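
The one-liner above can be wrapped and tried on small inputs as follows. This is only a sketch: the helper name concat_side_by_side and the sample frames are made up for illustration, and it assumes every frame has the same number of rows, as in the question.

```python
import pandas as pd

def concat_side_by_side(frames):
    """Hypothetical helper: drop each frame's index, then join columns by row position."""
    return pd.concat([f.reset_index(drop=True) for f in frames], axis=1)

# Made-up frames with equal row counts but unrelated row labels.
df_a = pd.DataFrame({"a": [1, 2, 3]}, index=[5, 6, 7])
df_b = pd.DataFrame({"b": [0.1, 0.2, 0.3]}, index=["x", "y", "z"])

out = concat_side_by_side([df_a, df_b])
print(out)
```

Because reset_index(drop=True) gives every frame a fresh 0..n-1 index, the rows line up by position and the original column names are kept.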

3 answers

solution1
1 ACCEPTED 2017-08-06 23:18:39

solution2
1 2017-08-07 20:14:38

solution3
1 2017-08-07 20:26:07
