简体   繁体   English

Pandas dataframe 取消堆叠数据并创建新列

[英]Pandas dataframe unstack data and create new columns

I have two sets of stacked data as follows:我有两组堆叠数据如下:

    set          n       value_1    value_2
0    1         1024     25942.6     25807.8 ----> first set starts here
1    1         2048     72000.5     71507.9
2    1         4096    161095.0    160303.0
3    1         8192    356419.0    354928.0
4    1        16384    793562.0    788666.0
5    1        32768   1914250.0   1889850.0
6    1        65536   3490860.0   3479040.0
7    1       131072   8096130.0   8036290.0
8    1       262144  16616500.0  16525400.0
11   2         1024     35116.3     35032.5 ----> second set starts here
12   2         2048     98783.8     98507.0
13   2         4096    230813.0    230206.0
14   2         8192    521754.0    518052.0
15   2        16384   1046870.0   1040990.0
16   2        32768   2118340.0   2112680.0
17   2        65536   4693000.0   4673130.0
18   2       131072   9960240.0   9892870.0
19   2       262144  21230600.0  21068700.0

How can I unstack them so that I get two new columns value_1_2 , and value_2_2 , which correspond to the second set of data and match to the first one based on the value of n ?如何取消堆叠它们,以便获得两个新列value_1_2value_2_2 ,它们对应于第二组数据并根据n的值与第一组数据匹配?

This is what I want:这就是我要的:

n         value_1       value_2     value_1_2   value_2_2
1024      25942.6       25807.8     35116.3   35032.5
2048      72000.5       71507.9     98783.8   98507
4096      161095        160303      230813    230206
8192      356419        354928      521754    518052
16384     793562        788666      1046870   1040990
32768     1914250       1889850     2118340   2112680
65536     3490860       3479040     4693000   4673130
131072    8096130       8036290     9960240   9892870
262144    16616500      16525400    21230600  21068700

First we create a list of dfs by using DataFrame.groupby to group the dataframe on the column Set , then for each group in the dataframe we use DataFrame.add_suffix to add the group identifier to each of the columns: First we create a list of dfs by using DataFrame.groupby to group the dataframe on the column Set , then for each group in the dataframe we use DataFrame.add_suffix to add the group identifier to each of the columns:

Finally, we use functools.reduce to reduce the list of dataframes dfs to the single unstacked dataframe by applying pd.merge operation on the consecutive dataframe on the column n .最后,我们使用functools.reduce通过对列n上的连续 dataframe 应用pd.merge操作,将数据帧dfs列表减少为单个未堆叠的 dataframe。

from functools import reduce

dfs = [
    g.drop('set', 1).add_suffix(f'_{k}').rename({f'n_{k}': 'n'}, axis=1)
    for k, g in df.groupby('set')
]

df1 = reduce(lambda x, y: pd.merge(x, y, on='n'), dfs)

Result:结果:

# print(df1)

        n   value_1_1   value_2_1   value_1_2   value_2_2
0    1024     25942.6     25807.8     35116.3     35032.5
1    2048     72000.5     71507.9     98783.8     98507.0
2    4096    161095.0    160303.0    230813.0    230206.0
3    8192    356419.0    354928.0    521754.0    518052.0
4   16384    793562.0    788666.0   1046870.0   1040990.0
5   32768   1914250.0   1889850.0   2118340.0   2112680.0
6   65536   3490860.0   3479040.0   4693000.0   4673130.0
7  131072   8096130.0   8036290.0   9960240.0   9892870.0
8  262144  16616500.0  16525400.0  21230600.0  21068700.0

There is a sledgehammer approach.有一种大锤方法。 First第一的

foo = df.groupby('n')

This is a list of pairs (n, group) .这是对(n, group)的列表。 For each group, you do对于每个组,你做

newgroup = pd.concat([group.iloc[0], group.iloc[1]], axis=1)

(you can change the column names in the concat). (您可以更改连接中的列名)。

and finally pd.concat the new groups vertically.最后pd.concat垂直的新组。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM