groupby/unstack on column names

I have a dataframe with the following structure:

          idx  value  Formula_name
0   123456789    100  Frequency No4
1   123456789    150  Frequency No25
2   123456789    125  Frequency No27
3   123456789    0.2  Power Level No4
4   123456789    0.5  Power Level No25
5   123456789   -1.0  Power Level No27
6   123456789     32  SNR No4
7   123456789     35  SNR No25
8   123456789     37  SNR No27
9   111222333    ...

So the only way to relate a frequency to its corresponding metric is via the number of the frequency. I know the possible range (100 to 200 MHz in steps of 25 MHz), but not which frequencies (or how many) show up in the data, nor which "number" is used to relate a frequency to a metric.

I would like to arrive at a dataframe similar to this:

                SNR                           Power Level
          idx   100   125   150   175   200   100   125   150   175   200
0   123456789    32    37    35   NaN   NaN   0.2  -1.0   0.5   NaN   NaN
1   111222333   ...

For a single metric, I created two dataframes, one with the frequencies and one with the metric, and merged them on the number:

     idx         Formula_x  value_x number   Formula_y  value_y
0    123456789   SNR        32      4        frequency  100
1    123456789   SNR        35      25       frequency  150

Then I unstack the dataframe:

df.groupby(['idx','value_y']).first()[['value_x']].unstack()
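For reference, the one-metric approach described above (split Formula_name, merge the SNR rows with the frequency rows on the number, then unstack) could look roughly like this. It is a sketch under assumptions: the sample frame only contains the idx 123456789 rows shown above, and the regex used to split Formula_name into a formula and a number is my own choice, not the asker's code.

```python
import pandas as pd

# Assumed sample: only the rows for idx 123456789 (the other idx rows are elided above)
df = pd.DataFrame({
    'idx': [123456789] * 9,
    'value': [100, 150, 125, 0.2, 0.5, -1.0, 32, 35, 37],
    'Formula_name': ['Frequency No4', 'Frequency No25', 'Frequency No27',
                     'Power Level No4', 'Power Level No25', 'Power Level No27',
                     'SNR No4', 'SNR No25', 'SNR No27'],
})

# Split e.g. "Power Level No4" into Formula="Power Level" and number="4" (assumed pattern)
parts = df['Formula_name'].str.extract(r'^(?P<Formula>.+?)\s+No(?P<number>\d+)$')
df = df.join(parts)

freq = df[df['Formula'] == 'Frequency']
snr = df[df['Formula'] == 'SNR']

# Overlapping columns get suffixes: _x from the SNR rows, _y from the frequency rows
merged = snr.merge(freq, on=['idx', 'number'])

# One row per idx, one column per frequency (value_y)
out = merged.groupby(['idx', 'value_y']).first()[['value_x']].unstack()
print(out)
```

Every additional metric needs its own merge and unstack in this scheme, which is why it does not extend naturally to a column MultiIndex.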

This works for one metric, but I don't see how to extend it to more metrics and access them through a MultiIndex in the columns.

Any ideas and suggestions would be very welcome.

You can use:

print (df)
         idx  value      Formula_name
0  123456789  100.0     Frequency No4
1  123456789  150.0    Frequency No25
2  123456789  125.0    Frequency No27
3  123456789    0.2   Power Level No4
4  123456789    0.5  Power Level No25
5  123456789   -1.0  Power Level No27
6  123456789   32.0           SNR No4
7  123456789   35.0          SNR No25
8  123456789   37.0          SNR No27

# split Formula_name into the formula name (a) and the number (b)
df[['a','b']] = df.Formula_name.str.rsplit(n=1, expand=True)

# map column b from the numbers (No4, No25, ...) to the frequencies (100, 150, ...)
maps = df[df.a == 'Frequency'].set_index('b')['value'].astype(int)
df['b'] = df.b.map(maps)

# drop the Frequency rows and the Formula_name column
df1 = df[df.a != 'Frequency'].drop('Formula_name', axis=1)
print (df1)
         idx  value            a    b
3  123456789    0.2  Power Level  100
4  123456789    0.5  Power Level  150
5  123456789   -1.0  Power Level  125
6  123456789   32.0          SNR  100
7  123456789   35.0          SNR  150
8  123456789   37.0          SNR  125
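Note that maps above is indexed by the number alone. The question's data contains more than one idx (the 111222333 rows), and if those rows reuse numbers such as No4, the index of maps is no longer unique and df.b.map(maps) will typically raise an error. A sketch that keys the lookup on (idx, number) with a merge instead; the 111222333 rows below are invented for illustration, since the question elides them.

```python
import pandas as pd

# Hypothetical data: 111222333 reuses the numbers 4 and 25 with different frequencies
df = pd.DataFrame({
    'idx': [123456789] * 6 + [111222333] * 4,
    'value': [100, 150, 0.2, 0.5, 32, 35, 200, 175, 41, 44],
    'Formula_name': ['Frequency No4', 'Frequency No25',
                     'Power Level No4', 'Power Level No25',
                     'SNR No4', 'SNR No25',
                     'Frequency No4', 'Frequency No25',
                     'SNR No4', 'SNR No25'],
})

df[['a', 'b']] = df['Formula_name'].str.rsplit(n=1, expand=True)

# Frequency lookup table keyed by (idx, number) rather than by number alone
freqs = (df[df['a'] == 'Frequency']
         .loc[:, ['idx', 'b', 'value']]
         .rename(columns={'value': 'freq'}))

# Attach each metric row's frequency; the inner merge drops metrics without a frequency
df1 = (df[df['a'] != 'Frequency']
       .merge(freqs, on=['idx', 'b'])
       .drop(columns=['Formula_name', 'b'])
       .rename(columns={'freq': 'b'}))
df1['b'] = df1['b'].astype(int)
print(df1)
```

The resulting df1 has the same idx, value, a, b columns as above, so either reshaping step that follows applies unchanged.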

Two solutions, one with unstack and one with pivot_table:

df2 = df1.set_index(['idx','a', 'b']).unstack([1,2])
df2.columns = df2.columns.droplevel(0)
df2 = df2.rename_axis(None).rename_axis([None, None], axis=1)
print (df2)
          Power Level             SNR            
                  100  150  125   100   150   125
123456789         0.2  0.5 -1.0  32.0  35.0  37.0

df3 = df1.pivot_table(index='idx', columns=['a','b'], values='value')
df3 = df3.rename_axis(None).rename_axis([None, None], axis=1)
print (df3)
          Power Level             SNR            
                  100  125  150   100   125   150
123456789         0.2 -1.0  0.5  32.0  37.0  35.0
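The desired output in the question also has NaN columns for frequencies that never occur (175 and 200). Assuming the range stated in the question (100 to 200 MHz in 25 MHz steps), one way to get there is to reindex the pivoted columns against the full metric × frequency grid. The metric list and its order below are my assumptions taken from the desired output.

```python
import pandas as pd

# df1 as printed above (metric rows with frequencies already mapped)
df1 = pd.DataFrame({
    'idx': [123456789] * 6,
    'value': [0.2, 0.5, -1.0, 32, 35, 37],
    'a': ['Power Level'] * 3 + ['SNR'] * 3,
    'b': [100, 150, 125, 100, 150, 125],
})

# Assumed from the question: 100 to 200 MHz in 25 MHz steps, SNR listed first
freqs = range(100, 201, 25)
metrics = ['SNR', 'Power Level']

df3 = df1.pivot_table(index='idx', columns=['a', 'b'], values='value')

# Reindex to every (metric, frequency) pair; missing pairs become NaN columns
full_cols = pd.MultiIndex.from_product([metrics, freqs], names=['a', 'b'])
df3 = df3.reindex(columns=full_cols)
df3 = df3.rename_axis(None).rename_axis([None, None], axis=1)
print(df3)
```

Because the column order comes from full_cols, this also fixes the frequency order regardless of the order in which the numbers appear in the data.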
