Pandas dataframe unstack data and create new columns
I have two sets of stacked data as follows:
set n value_1 value_2
0 1 1024 25942.6 25807.8 ----> first set starts here
1 1 2048 72000.5 71507.9
2 1 4096 161095.0 160303.0
3 1 8192 356419.0 354928.0
4 1 16384 793562.0 788666.0
5 1 32768 1914250.0 1889850.0
6 1 65536 3490860.0 3479040.0
7 1 131072 8096130.0 8036290.0
8 1 262144 16616500.0 16525400.0
11 2 1024 35116.3 35032.5 ----> second set starts here
12 2 2048 98783.8 98507.0
13 2 4096 230813.0 230206.0
14 2 8192 521754.0 518052.0
15 2 16384 1046870.0 1040990.0
16 2 32768 2118340.0 2112680.0
17 2 65536 4693000.0 4673130.0
18 2 131072 9960240.0 9892870.0
19 2 262144 21230600.0 21068700.0
How can I unstack them so that I get two new columns, value_1_2 and value_2_2, which correspond to the second set of data and match the first set on the value of n?
This is what I want:
n value_1 value_2 value_1_2 value_2_2
1024 25942.6 25807.8 35116.3 35032.5
2048 72000.5 71507.9 98783.8 98507
4096 161095 160303 230813 230206
8192 356419 354928 521754 518052
16384 793562 788666 1046870 1040990
32768 1914250 1889850 2118340 2112680
65536 3490860 3479040 4693000 4673130
131072 8096130 8036290 9960240 9892870
262144 16616500 16525400 21230600 21068700
First we create a list of dataframes, dfs, by using DataFrame.groupby to group the dataframe on the column set; then, for each group, we use DataFrame.add_suffix to append the group identifier to each of the columns:
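A minimal, runnable sketch of this first step, using a small subset of the question's data, might look like this:

```python
import pandas as pd

# A subset of the sample data from the question.
df = pd.DataFrame({
    'set': [1, 1, 2, 2],
    'n': [1024, 2048, 1024, 2048],
    'value_1': [25942.6, 72000.5, 35116.3, 98783.8],
    'value_2': [25807.8, 71507.9, 35032.5, 98507.0],
})

# One frame per set: drop the 'set' column, suffix the remaining
# columns with the set id, then restore the plain 'n' name so the
# frames can later be merged on it.
dfs = [
    g.drop(columns='set').add_suffix(f'_{k}').rename({f'n_{k}': 'n'}, axis=1)
    for k, g in df.groupby('set')
]

print(dfs[1].columns.tolist())  # ['n', 'value_1_2', 'value_2_2']
```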
Finally, we use functools.reduce to collapse the list of dataframes dfs into a single unstacked dataframe, applying pd.merge on the column n to consecutive pairs of dataframes.
import pandas as pd
from functools import reduce

# Drop the grouping column, tag the value columns with the set id,
# and keep 'n' unsuffixed so every frame can be merged on it.
dfs = [
    g.drop(columns='set').add_suffix(f'_{k}').rename({f'n_{k}': 'n'}, axis=1)
    for k, g in df.groupby('set')
]
df1 = reduce(lambda x, y: pd.merge(x, y, on='n'), dfs)
Result:
# print(df1)
n value_1_1 value_2_1 value_1_2 value_2_2
0 1024 25942.6 25807.8 35116.3 35032.5
1 2048 72000.5 71507.9 98783.8 98507.0
2 4096 161095.0 160303.0 230813.0 230206.0
3 8192 356419.0 354928.0 521754.0 518052.0
4 16384 793562.0 788666.0 1046870.0 1040990.0
5 32768 1914250.0 1889850.0 2118340.0 2112680.0
6 65536 3490860.0 3479040.0 4693000.0 4673130.0
7 131072 8096130.0 8036290.0 9960240.0 9892870.0
8 262144 16616500.0 16525400.0 21230600.0 21068700.0
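As an aside (this variant is not from the original answer): assuming each value of n appears exactly once per set, the same reshape can also be sketched with DataFrame.pivot, flattening the resulting column MultiIndex by hand:

```python
import pandas as pd

# A subset of the sample data from the question.
df = pd.DataFrame({
    'set': [1, 1, 2, 2],
    'n': [1024, 2048, 1024, 2048],
    'value_1': [25942.6, 72000.5, 35116.3, 98783.8],
    'value_2': [25807.8, 71507.9, 35032.5, 98507.0],
})

# Pivot to wide form: one row per n, one column per (value, set) pair.
wide = df.pivot(index='n', columns='set', values=['value_1', 'value_2'])

# Flatten the (value, set) MultiIndex into names like 'value_1_2'.
wide.columns = [f'{val}_{s}' for val, s in wide.columns]
wide = wide.reset_index()

print(wide.columns.tolist())
# ['n', 'value_1_1', 'value_1_2', 'value_2_1', 'value_2_2']
```

Note the column order differs slightly from the merge-based result (sets are interleaved per value rather than grouped per set); reorder the columns afterwards if that matters.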
There is also a sledgehammer approach. First:
foo = df.groupby('n')
Iterating over foo yields (n, group) pairs. For each group you do
newgroup = pd.concat([group.iloc[0], group.iloc[1]], axis=1)
(you can change the column names in the concat; note that group.iloc[0] is a row Series, so the horizontally concatenated result needs a transpose), and finally you pd.concat the new groups vertically.
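A concrete, runnable version of this sketch, assuming exactly two rows per n ordered by set, might be:

```python
import pandas as pd

# A subset of the sample data from the question.
df = pd.DataFrame({
    'set': [1, 1, 2, 2],
    'n': [1024, 2048, 1024, 2048],
    'value_1': [25942.6, 72000.5, 35116.3, 98783.8],
    'value_2': [25807.8, 71507.9, 35032.5, 98507.0],
})

rows = []
for n, group in df.groupby('n'):
    # Within each n-group the first row is set 1, the second set 2
    # (assumes the frame is ordered by 'set' within each n).
    first, second = group.iloc[0], group.iloc[1]
    rows.append({
        'n': n,
        'value_1': first['value_1'],
        'value_2': first['value_2'],
        'value_1_2': second['value_1'],
        'value_2_2': second['value_2'],
    })

result = pd.DataFrame(rows)
print(result)
```

This trades pandas idiom for explicitness; the groupby/merge answer above scales better to more than two sets.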