groupby/unstack on column names
I have a dataframe with the following structure:
idx value Formula_name
0 123456789 100 Frequency No4
1 123456789 150 Frequency No25
2 123456789 125 Frequency No27
3 123456789 0.2 Power Level No4
4 123456789 0.5 Power Level No25
5 123456789 -1.0 Power Level No27
6 123456789 32 SNR No4
7 123456789 35 SNR No25
8 123456789 37 SNR No27
9 111222333 ...
So the only way to relate a frequency to its corresponding metric is via the number of the frequency. I know the possible range (from 100 to 200 MHz in steps of 25 MHz), but not which frequencies (or how many) show up in the data, nor which "number" is used to relate the frequency to the metric.
I would like to arrive at a dataframe similar to this:
SNR Power Level
idx 100 125 150 175 200 100 125 150 175 200
0 123456789 32 37 35 NaN NaN 0.2 -1.0 0.5 NaN NaN
1 111222333 ...
For only one metric, I created two dataframes, one with the frequencies, one with the metric, and merged them on the number:
idx Formula_x value_x number Formula_y value_y
0 123456789 SNR 32 4 frequency 100
1 123456789 SNR 35 25 frequency 150
Then I would unstack the dataframe:
df.groupby(['idx','value_y']).first()[['value_x']].unstack()
This works for one metric, but I don't really see how I can apply it to more metrics and access them with a MultiIndex in the columns.
Any ideas and suggestions would be very welcome.
You can use:
print (df)
idx value Formula_name
0 123456789 100.0 Frequency No4
1 123456789 150.0 Frequency No25
2 123456789 125.0 Frequency No27
3 123456789 0.2 Power Level No4
4 123456789 0.5 Power Level No25
5 123456789 -1.0 Power Level No27
6 123456789 32.0 SNR No4
7 123456789 35.0 SNR No25
8 123456789 37.0 SNR No27
# split Formula_name on its last space: a = metric name, b = number tag (No4, No25, ...)
df[['a','b']] = df.Formula_name.str.rsplit(n=1, expand=True)
# build a Series that maps each tag in column b (No4, No25, ...) to its frequency (100, 150, ...)
maps = df[df.a == 'Frequency'].set_index('b')['value'].astype(int)
df['b'] = df.b.map(maps)
# drop the Frequency rows and the Formula_name column
df1 = df[df.a != 'Frequency'].drop('Formula_name', axis=1)
print (df1)
idx value a b
3 123456789 0.2 Power Level 100
4 123456789 0.5 Power Level 150
5 123456789 -1.0 Power Level 125
6 123456789 32.0 SNR 100
7 123456789 35.0 SNR 150
8 123456789 37.0 SNR 125
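The mapping above uses a single Series keyed on the tag, so it fits the one-idx sample shown. If the same tag (say No4) can stand for a different frequency under another idx, the lookup has to be keyed on idx as well. A minimal sketch of that variant, assuming made-up rows for the second idx 111222333 (the question elides them) and using the hypothetical column names metric, tag and freq in place of a and b:

```python
import pandas as pd

# Sample data: the first idx follows the question; the rows for
# 111222333 are invented here purely for illustration.
df = pd.DataFrame({
    "idx": [123456789] * 9 + [111222333] * 4,
    "value": [100, 150, 125, 0.2, 0.5, -1.0, 32, 35, 37, 175, 200, 30, 31],
    "Formula_name": [
        "Frequency No4", "Frequency No25", "Frequency No27",
        "Power Level No4", "Power Level No25", "Power Level No27",
        "SNR No4", "SNR No25", "SNR No27",
        "Frequency No4", "Frequency No9", "SNR No4", "SNR No9",
    ],
})

# Split on the last space: "Power Level No4" -> ("Power Level", "No4")
df[["metric", "tag"]] = df["Formula_name"].str.rsplit(n=1, expand=True)

# Lookup table: one row per (idx, tag) holding the frequency
is_freq = df["metric"] == "Frequency"
freq = (
    df.loc[is_freq, ["idx", "tag", "value"]]
    .rename(columns={"value": "freq"})
    .astype({"freq": int})
)

# Attach the frequency to every metric row, matching on idx AND tag
long_df = df.loc[~is_freq, ["idx", "metric", "tag", "value"]].merge(
    freq, on=["idx", "tag"], how="left"
)
print(long_df)
```

In this sketch long_df plays the role of df1, with freq in place of b, so the unstack or pivot_table step should carry over with the column names swapped.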
Two solutions: with unstack and with pivot_table.
df2 = df1.set_index(['idx','a', 'b']).unstack([1,2])
df2.columns = df2.columns.droplevel(0)
df2 = df2.rename_axis(None).rename_axis([None, None], axis=1)
print (df2)
Power Level SNR
100 150 125 100 150 125
123456789 0.2 0.5 -1.0 32.0 35.0 37.0
df3 = df1.pivot_table(index='idx', columns=['a','b'], values='value')
df3 = df3.rename_axis(None).rename_axis([None, None], axis=1)
print (df3)
Power Level SNR
100 125 150 100 125 150
123456789 0.2 -1.0 0.5 32.0 37.0 35.0
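The desired output in the question also lists the frequencies that never occur (175 and 200) as NaN columns. One way to get there is to reindex the columns against the full frequency grid. A minimal sketch, assuming the 100 to 200 MHz range in steps of 25 MHz stated in the question; the names wide, all_freqs and full_cols are made up for this example:

```python
import pandas as pd

# Same content as df1 above, rebuilt so the snippet runs on its own
df1 = pd.DataFrame({
    "idx": [123456789] * 6,
    "value": [0.2, 0.5, -1.0, 32.0, 35.0, 37.0],
    "a": ["Power Level"] * 3 + ["SNR"] * 3,
    "b": [100, 150, 125] * 2,
})

wide = df1.pivot_table(index="idx", columns=["a", "b"], values="value")

# Every metric combined with every frequency in the known range
all_freqs = range(100, 201, 25)  # 100, 125, 150, 175, 200
full_cols = pd.MultiIndex.from_product(
    [wide.columns.get_level_values("a").unique(), all_freqs],
    names=["a", "b"],
)

# Columns missing from the data are created and filled with NaN
wide = wide.reindex(columns=full_cols)
print(wide)
```

A single metric can then be selected with wide['SNR'], and a single cell with wide.loc[123456789, ('SNR', 125)].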