Merge many sub-dataframes into a big dataframe in a loop in python pandas
My program will generate many small data frames such as the ones below:
Column_A Column_B
row1 1 2
Column_A Column_B
row2 3 4
Column_C Column_D
row1 5 6
Column_C Column_D
row2 7 8
I want them to be merged as
Column_A Column_B Column_C Column_D
row1 1 2 5 6
row2 3 4 7 8
How can this be done when the dataframes need to be merged one at a time? The code to generate the smaller data frames is:
df = {}
df[0] = pd.DataFrame({'Column_A': [1],
                      'Column_B': [2]},
                     index=["row1"])
df[1] = pd.DataFrame({'Column_A': [3],
                      'Column_B': [4]},
                     index=["row2"])
df[2] = pd.DataFrame({'Column_C': [5],
                      'Column_D': [6]},
                     index=["row1"])
df[3] = pd.DataFrame({'Column_C': [7],
                      'Column_D': [8]},
                     index=["row2"])
I tried using merge and concat, but they always end up creating more columns, either suffixing the existing column names with _x or _y or simply repeating the columns. For example, merging in the following way
pdf = pd.DataFrame()
for i in range(4):
    pdf = pdf.merge(pd.DataFrame(df[i], index=["row{}".format((i % 2) + 1)]),
                    how='outer', left_index=True, right_index=True)
produces
Column_A_x Column_B_x Column_A_y Column_B_y Column_C_x Column_D_x \
row1 1.0 2.0 NaN NaN 5.0 6.0
row2 NaN NaN 3.0 4.0 NaN NaN
Column_C_y Column_D_y
row1 NaN NaN
row2 7.0 8.0
Can someone help me with the correct way to merge these?
It would help you a lot if you can in any way keep the left and right parts in separate containers, e.g. columns A and B in one, columns C and D in the other. That way you could piece them together quite easily using pandas.concat. After the two halves have been built, you need to merge them, using the index in this case.
With your original df dictionary:
In [11]: pd.concat([df[0], df[1]]).merge(pd.concat([df[2], df[3]]), left_index=True, right_index=True)
Out[11]:
Column_A Column_B Column_C Column_D
row1 1 2 5 6
row2 3 4 7 8
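Since the join here is purely on the index, pandas.concat with axis=1 gives the same alignment as the merge; a minimal sketch, assuming the df dictionary from the question:

```python
import pandas as pd

# The df dictionary from the question.
df = {0: pd.DataFrame({'Column_A': [1], 'Column_B': [2]}, index=['row1']),
      1: pd.DataFrame({'Column_A': [3], 'Column_B': [4]}, index=['row2']),
      2: pd.DataFrame({'Column_C': [5], 'Column_D': [6]}, index=['row1']),
      3: pd.DataFrame({'Column_C': [7], 'Column_D': [8]}, index=['row2'])}

# Stack each half vertically, then align the two halves side by side
# on the index (concat with axis=1 performs an outer join on the index).
result = pd.concat([pd.concat([df[0], df[1]]),
                    pd.concat([df[2], df[3]])], axis=1)
print(result)
```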
With containers for the left and right halves the code reads a bit better (and there's no need for a loop):
left = [pd.DataFrame({'Column_A': [1],
                      'Column_B': [2]},
                     index=["row1"]),
        pd.DataFrame({'Column_A': [3],
                      'Column_B': [4]},
                     index=["row2"])]
right = [pd.DataFrame({'Column_C': [5],
                       'Column_D': [6]},
                      index=["row1"]),
         pd.DataFrame({'Column_C': [7],
                       'Column_D': [8]},
                      index=["row2"])]
df = pd.concat(left).merge(pd.concat(right), left_index=True, right_index=True)
Finally, if you truly have no option but to store them in a dictionary like in your example:
from functools import reduce, partial
from itertools import groupby

pdf = reduce(
    partial(pd.merge, left_index=True, right_index=True, how='outer'),
    (pd.concat(list(g))
     for cols, g in groupby(sorted(df.values(),
                                   key=lambda df_: tuple(df_.columns)),
                            lambda df_: tuple(df_.columns)))
)
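The reduce expression can also be unrolled into an explicit loop, which may be easier to follow: bucket the frames by their column signature, stack each bucket, and merge the stacks on the index. A sketch, assuming the same df dictionary:

```python
import pandas as pd

# The df dictionary from the question.
df = {0: pd.DataFrame({'Column_A': [1], 'Column_B': [2]}, index=['row1']),
      1: pd.DataFrame({'Column_A': [3], 'Column_B': [4]}, index=['row2']),
      2: pd.DataFrame({'Column_C': [5], 'Column_D': [6]}, index=['row1']),
      3: pd.DataFrame({'Column_C': [7], 'Column_D': [8]}, index=['row2'])}

# Bucket the frames by their column signature.
groups = {}
for frame in df.values():
    groups.setdefault(tuple(frame.columns), []).append(frame)

# Stack each bucket vertically, then merge the stacks on the index.
pdf = None
for frames in groups.values():
    stacked = pd.concat(frames)
    pdf = stacked if pdf is None else pdf.merge(
        stacked, left_index=True, right_index=True, how='outer')

print(pdf)
```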
Try this:
In [186]: result = pd.concat([df[key].reset_index() for key in df.keys()],
.....: ignore_index=True) \
.....: .set_index('index') \
.....: .groupby(level=0) \
.....: .sum() \
.....: .astype(int)
In [187]: result
Out[187]:
Column_A Column_B Column_C Column_D
index
row1 1 2 5 6
row2 3 4 7 8
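The .sum() step works because after the concat each cell holds a real value in exactly one of the stacked rows and NaN in the others, and sum skips NaN, so summing per index label just collapses the NaNs away. The same pipeline as a plain script, assuming the df dictionary from the question:

```python
import pandas as pd

# The df dictionary from the question.
df = {0: pd.DataFrame({'Column_A': [1], 'Column_B': [2]}, index=['row1']),
      1: pd.DataFrame({'Column_A': [3], 'Column_B': [4]}, index=['row2']),
      2: pd.DataFrame({'Column_C': [5], 'Column_D': [6]}, index=['row1']),
      3: pd.DataFrame({'Column_C': [7], 'Column_D': [8]}, index=['row2'])}

# Stack everything with the row labels as a regular column, regroup by
# label, and sum away the NaNs (each cell has exactly one real value).
result = (pd.concat([df[key].reset_index() for key in df], ignore_index=True)
            .set_index('index')
            .groupby(level=0)
            .sum()
            .astype(int))
print(result)
```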