Create a dictionary by appending pandas dataframes in a dictionary

I have a dictionary of pandas DataFrames. I want to combine the DataFrames in this dictionary to create a dictionary with fewer keys but larger DataFrames as values.

For example, I want to create d_new from d:

d = {1: pd.DataFrame({'a':[1,2],'b':[3,4]}), 
     2: pd.DataFrame({'a':[3,4],'b':[5,6]}),
     3: pd.DataFrame({'a':[10,11],'b':[12,13]}), 
     4: pd.DataFrame({'a':[12,13],'b':[14,15]})}

d_new = {1:pd.DataFrame({'a':[1,2,3,4],'b':[3,4,5,6]}), 
         3:pd.DataFrame({'a':[10,11,12,13],'b':[12,13,14,15]})}

I tried:

import pandas as pd
from collections import defaultdict
d_new = defaultdict(pd.DataFrame)
for k_n in [1,3]:
    r = 1 if k_n == 1 else 3
    for k in range(r,r+2):
        d_new[k_n].append(d[k])

But this just produces a dictionary of empty DataFrames. I could convert the DataFrames to lists, append those, and build a DataFrame from them afterwards, but I would like to avoid that unnecessary step.

You can use itertools.groupby to group the keys (here in consecutive pairs, since the keys form the linear range 1, 2, 3, ...) and pandas.concat to concatenate the dataframes:

from itertools import groupby
d_new = {2*k+1: pd.concat([d[e] for e in g])
         for k,g in groupby(d, lambda k: (k-1)//2)}

Output:

{1:    a  b
 0  1  3
 1  2  4
 0  3  5
 1  4  6,
 3:     a   b
 0  10  12
 1  11  13
 0  12  14
 1  13  15}
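
As for why the loop in the question yields empty frames: DataFrame.append never modified a frame in place. It returned a new DataFrame, which the loop throws away, and the method was removed in pandas 2.0. A minimal sketch of the same loop rewritten with pandas.concat, reusing d from the question and assuming the two new keys are 1 and 3 as in the attempt:

```python
import pandas as pd

d = {1: pd.DataFrame({'a': [1, 2], 'b': [3, 4]}),
     2: pd.DataFrame({'a': [3, 4], 'b': [5, 6]}),
     3: pd.DataFrame({'a': [10, 11], 'b': [12, 13]}),
     4: pd.DataFrame({'a': [12, 13], 'b': [14, 15]})}

d_new = {}
for k_n in [1, 3]:
    # Gather the two consecutive frames that belong to this new key,
    # then concatenate them once and store the result.
    d_new[k_n] = pd.concat([d[k] for k in range(k_n, k_n + 2)])
```

A plain dict is enough here, because each key is assigned exactly once; the defaultdict is no longer needed.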

NB. I showed you the minimal main concept, but of course you can tweak it to your exact needs (e.g., resetting the index, or using a different grouping condition).
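
One possible tweak, as a sketch: passing ignore_index=True to pandas.concat discards the original row labels, so each combined frame gets a fresh 0..n-1 index that matches the d_new shown in the question. The data is the question's d; the lambda parameter is renamed to key only for readability.

```python
from itertools import groupby

import pandas as pd

d = {1: pd.DataFrame({'a': [1, 2], 'b': [3, 4]}),
     2: pd.DataFrame({'a': [3, 4], 'b': [5, 6]}),
     3: pd.DataFrame({'a': [10, 11], 'b': [12, 13]}),
     4: pd.DataFrame({'a': [12, 13], 'b': [14, 15]})}

# Same grouping as above: keys 1,2 map to group 0 and keys 3,4 to group 1.
# ignore_index=True renumbers the rows of each concatenated frame.
d_new = {2 * k + 1: pd.concat([d[e] for e in g], ignore_index=True)
         for k, g in groupby(d, lambda key: (key - 1) // 2)}
```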

If your keys do not form a linear range (i.e. not 1, 2, 3, ...), you'll need to wrap the dictionary's values in enumerate and group by position instead:

{2*k+1: pd.concat([x[1] for x in g])
 for k,g in groupby(enumerate(d.values()), lambda k: k[0]//2)}
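
Note that the enumerate version still labels the results 1, 3, ... regardless of the original keys. If you would rather keep the first original key of each pair, something along these lines might work. It is only a sketch: the keys 10, 25, 31 and 47 are made up for illustration, and it relies on dictionaries preserving insertion order (Python 3.7+).

```python
from itertools import groupby

import pandas as pd

# Hypothetical dictionary whose keys do not form a linear range.
d = {10: pd.DataFrame({'a': [1, 2], 'b': [3, 4]}),
     25: pd.DataFrame({'a': [3, 4], 'b': [5, 6]}),
     31: pd.DataFrame({'a': [10, 11], 'b': [12, 13]}),
     47: pd.DataFrame({'a': [12, 13], 'b': [14, 15]})}

d_new = {}
# Group the (key, frame) items by position: positions 0,1 -> group 0, etc.
for _, g in groupby(enumerate(d.items()), lambda pair: pair[0] // 2):
    items = [item for _, item in g]   # [(key, frame), (key, frame)]
    first_key = items[0][0]           # label the group by its first original key
    d_new[first_key] = pd.concat([frame for _, frame in items])
```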
