Python: apply function across categories and save results to new columns

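I am new to Python. I am analysing EEG data. I created the function extract_bands to compute the values of the EEG bands (based on this answer), but I cannot apply the function across categories and save the aggregated data in a new dataset.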

Here is a simplified dataset, pddf:

import pandas as pd 
import numpy as np  


simple_df = {
    'subject': ['s1','s1','s1','s1','s1','s1','s2','s2','s2','s2','s2','s2','s3','s3','s3','s3','s3','s3','s4','s4','s4','s4','s4','s4'],
    'group': ['a','a','a','a','a','a','a','a','a','a','a','a','c','c','c','c','c','c','c','c','c','c','c','c'],
    'trial': ['1','1','2','2','4','4','2','2','3','3','5','5','1','1','2','2','3','3','3','3','5','5','6','6'],
    'cond': ['c1','c1','c1','c1','c2','c2','c1','c1','c2','c2','c2','c2','c2','c2','c1','c1','c1','c1','c2','c2','c1','c1','c2','c2'],
    'value': [ 8.88260935, 82.97797122, 18.26659492,  7.70070742, 12.76417463,
              68.35936355,  7.59613253, 54.36616722,  9.11860667, 24.20324845,
              86.1674253 , 99.96479613, 40.83798898, 23.72822971, 49.77969641,
               2.19459866, 30.3883309 , 46.75944945, 11.47003917, 26.71771771,
              88.93251086,  7.29166478,  7.76880683, 40.65701944],
}

pddf = pd.DataFrame(simple_df, columns=['subject', 'group', 'trial', 'cond', 'value'])

Here is the function extract_bands:

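# sampling frequency in Hz
fs = 256

# EEG bands as (low, high) frequency limits in Hz
eeg_bands = {'Delta': (0, 4),
             'Theta': (4, 8),
             'Alpha': (8, 12),
             'Beta': (12, 30),
             'Gamma': (30, 45)}

def extract_bands(data):
    # magnitude spectrum of the real-valued signal
    fft_vals = np.absolute(np.fft.rfft(data))
    # frequency in Hz of each FFT bin
    fft_freq = np.fft.rfftfreq(len(data), 1.0/fs)
    eeg_band_fft = dict()
    for band in eeg_bands:
        # indices of the bins inside the band (both limits inclusive)
        freq_ix = np.where((fft_freq >= eeg_bands[band][0]) &
                           (fft_freq <= eeg_bands[band][1]))[0]
        # mean amplitude in the band; NaN if no bin falls inside it
        eeg_band_fft[band] = np.mean(fft_vals[freq_ix])
    return eeg_band_fft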
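
As a quick sanity check of extract_bands, here is a minimal sketch that feeds it a synthetic signal. The one-second 10 Hz sine wave is a hypothetical test input, not part of the original data; since 10 Hz lies inside the Alpha band, Alpha should come out with the largest mean amplitude.

```python
import numpy as np

fs = 256  # sampling frequency in Hz, as in the question

eeg_bands = {'Delta': (0, 4),
             'Theta': (4, 8),
             'Alpha': (8, 12),
             'Beta': (12, 30),
             'Gamma': (30, 45)}

def extract_bands(data):
    # same logic as the function in the question
    fft_vals = np.absolute(np.fft.rfft(data))
    fft_freq = np.fft.rfftfreq(len(data), 1.0 / fs)
    eeg_band_fft = dict()
    for band in eeg_bands:
        freq_ix = np.where((fft_freq >= eeg_bands[band][0]) &
                           (fft_freq <= eeg_bands[band][1]))[0]
        eeg_band_fft[band] = np.mean(fft_vals[freq_ix])
    return eeg_band_fft

# hypothetical test signal: one second of a pure 10 Hz sine sampled at fs
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 10 * t)

bands = extract_bands(signal)
strongest = max(bands, key=bands.get)
print(strongest)  # the band with the largest mean amplitude, expected: Alpha
```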

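I can apply the function to one trial and get the values of the EEG bands stored in the dictionary eeg_band_fft. In the real dataset each trial has 256 samples; here a trial has only 2 samples, so the function returns a value for the Delta band only.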

one_trial = pddf[(pddf.subject == "s1") & (pddf.cond == 'c1') & (pddf.trial == '1')]

print(one_trial)
#>   subject group trial cond      value
#> 0      s1     a     1   c1   8.882609
#> 1      s1     a     1   c1  82.977971

extract_bands(one_trial.value)

#> {'Delta': 91.86058057, 'Theta': nan, 'Alpha': nan, 'Beta': nan, 'Gamma': nan}
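
The reason only Delta gets a value for the two-sample trial above can be checked from the FFT frequency bins. The sketch below only calls np.fft.rfftfreq with the question's sampling rate; the variable names are illustrative.

```python
import numpy as np

fs = 256  # sampling frequency in Hz, as in the question

# frequencies (Hz) of the rfft bins for a 2-sample trial
freqs_2 = np.fft.rfftfreq(2, 1.0 / fs)
print(freqs_2)  # two bins, at 0 Hz and 128 Hz

# frequencies (Hz) of the rfft bins for a 256-sample trial
freqs_256 = np.fft.rfftfreq(256, 1.0 / fs)
print(len(freqs_256), freqs_256[:5])  # 129 bins spaced 1 Hz apart
```

With two samples, the 0 Hz bin falls in Delta (0 to 4 Hz) and the 128 Hz bin lies above every band, so the other four bands are empty and their mean is NaN. With 256 samples per trial, every band contains several bins.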

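Question

Now, for each subject, how can I apply the function extract_bands across the trials that belong to the same condition cond?

Basically, I want to return a dataset with one row for each cond of each subject and eight columns in total: 'subject', 'group', 'cond' and the values of the five EEG bands in the dictionary eeg_band_fft.

Example

The following code uses groupby to do what I want (here, computing the mean), but I don't know how to make it work with the function extract_bands.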

pddf2 = pddf.groupby(["subject", "group", "cond"]).value.mean() # take the mean
pddf2
#> subject  group  cond
#> s1       a      c1      29.456971
#>                 c2      40.561769
#> s2       a      c1      30.981150
#>                 c2      54.863519
#> s3       c      c1      32.280519
#>                 c2      32.283109
#> s4       c      c1      48.112088
#>                 c2      21.653396
#> Name: value, dtype: float64

Created on 2021-05-26 by the reprexpy package

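Answer

If you want to perform a custom aggregation on a DataFrame, use agg and pass it the custom function. Because extract_bands returns a dict, the aggregated value column then holds one dict per group; convert that column to a DataFrame and, finally, concatenate the two DataFrames.

I would do it like this: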

dfg = (pddf.groupby(["subject", "group", "cond"])
        .agg({'value' : lambda x: extract_bands(x)})
        .reset_index()
)
df_dict = pd.DataFrame.from_records(dfg['value'])
result = pd.concat([dfg.drop(columns=['value']), df_dict], axis=1)

This code returns the following DataFrame:

  subject group cond       Delta  Theta  Alpha  Beta  Gamma
0      s1     a   c1  117.827883    NaN    NaN   NaN    NaN
1      s1     a   c2   81.123538    NaN    NaN   NaN    NaN
2      s2     a   c1   61.962300    NaN    NaN   NaN    NaN
3      s2     a   c2  219.454077    NaN    NaN   NaN    NaN
4      s3     c   c1  129.122075    NaN    NaN   NaN    NaN
5      s3     c   c2   64.566219    NaN    NaN   NaN    NaN
6      s4     c   c1   96.224176    NaN    NaN   NaN    NaN
7      s4     c   c2   86.613583    NaN    NaN   NaN    NaN
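
If the agg step does not accept a dict-returning function in a given pandas version, the same table can be built with an explicit loop over the groups. This is only a sketch of an alternative, not the method from the answer above: the toy DataFrame with random 256-sample signals is hypothetical, chosen so that every band receives a value.

```python
import numpy as np
import pandas as pd

fs = 256  # sampling frequency in Hz, as in the question
eeg_bands = {'Delta': (0, 4), 'Theta': (4, 8), 'Alpha': (8, 12),
             'Beta': (12, 30), 'Gamma': (30, 45)}

def extract_bands(data):
    # same logic as the function in the question, written compactly
    fft_vals = np.absolute(np.fft.rfft(data))
    fft_freq = np.fft.rfftfreq(len(data), 1.0 / fs)
    return {band: np.mean(fft_vals[(fft_freq >= low) & (fft_freq <= high)])
            for band, (low, high) in eeg_bands.items()}

# hypothetical toy data: 2 subjects x 2 conditions, 256 random samples each
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'subject': np.repeat(['s1', 's2'], 512),
    'group': np.repeat(['a', 'c'], 512),
    'cond': np.tile(np.repeat(['c1', 'c2'], 256), 2),
    'value': rng.normal(size=1024),
})

# one record per (subject, group, cond): the keys plus the five band values
rows = []
for (subject, group, cond), values in toy.groupby(['subject', 'group', 'cond'])['value']:
    rows.append({'subject': subject, 'group': group, 'cond': cond,
                 **extract_bands(values.to_numpy())})

result = pd.DataFrame(rows)
print(result)
```

Like the agg version, this pools all samples of a subject and condition into a single FFT. If each trial should be transformed on its own, one option is to add 'trial' to the grouping keys and then average the band columns per subject, group and cond.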
