Python: apply function across categories and save results to new columns
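I am new to Python and I am analyzing EEG data. I created a function `extract_bands` to compute the values of the EEG bands (based on this answer), but I can't figure out how to apply the function across categories and save the aggregated data in a new dataset.
Here is a simplified dataset, `pddf`: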
```python
import pandas as pd
import numpy as np

simple_df = {'subject': ['s1','s1','s1','s1','s1','s1','s2','s2','s2','s2','s2','s2','s3','s3','s3','s3','s3','s3','s4','s4','s4','s4','s4','s4'],
             'group': ['a','a','a','a','a','a','a','a','a','a','a','a','c','c','c','c','c','c','c','c','c','c','c','c'],
             'trial': ['1','1','2','2','4','4','2','2','3','3','5','5','1','1','2','2','3','3','3','3','5','5','6','6'],
             'cond': ['c1','c1','c1','c1','c2','c2','c1','c1','c2','c2','c2','c2','c2','c2','c1','c1','c1','c1','c2','c2','c1','c1','c2','c2'],
             'value': [ 8.88260935, 82.97797122, 18.26659492,  7.70070742, 12.76417463,
                       68.35936355,  7.59613253, 54.36616722,  9.11860667, 24.20324845,
                       86.1674253 , 99.96479613, 40.83798898, 23.72822971, 49.77969641,
                        2.19459866, 30.3883309 , 46.75944945, 11.47003917, 26.71771771,
                       88.93251086,  7.29166478,  7.76880683, 40.65701944]
             }
pddf = pd.DataFrame(simple_df, columns=['subject', 'group', 'trial', 'cond', 'value'])
```
Here is the function `extract_bands`:
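```python
# sampling frequency in Hz
fs = 256
# EEG bands: name -> (low, high) frequency in Hz
eeg_bands = {'Delta': (0, 4),
             'Theta': (4, 8),
             'Alpha': (8, 12),
             'Beta': (12, 30),
             'Gamma': (30, 45)}

def extract_bands(data):
    # amplitude spectrum of the real-valued signal and the frequency of each bin
    fft_vals = np.absolute(np.fft.rfft(data))
    fft_freq = np.fft.rfftfreq(len(data), 1.0 / fs)
    eeg_band_fft = dict()
    for band in eeg_bands:
        # indices of the bins whose frequency lies inside the band (both bounds inclusive)
        freq_ix = np.where((fft_freq >= eeg_bands[band][0]) &
                           (fft_freq <= eeg_bands[band][1]))[0]
        # mean amplitude in the band; NaN (with a RuntimeWarning) if no bin falls inside it
        eeg_band_fft[band] = np.mean(fft_vals[freq_ix])
    return eeg_band_fft
```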
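I can apply the function to a single trial and get the EEG band values stored in the dictionary `eeg_band_fft`. In the real dataset each trial has 256 samples; here a trial has only 2 samples, so the function returns a value only for the Delta band.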
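Why the number of samples matters can be checked with `np.fft.rfftfreq`, which gives the frequency of each FFT bin; the bins are spaced `fs / n` apart. The sketch below assumes the same `fs = 256` and band limits as the question; `bins_per_band` is a hypothetical helper written only for this illustration.

```python
import numpy as np

fs = 256  # sampling frequency in Hz, as in the question
eeg_bands = {'Delta': (0, 4), 'Theta': (4, 8), 'Alpha': (8, 12),
             'Beta': (12, 30), 'Gamma': (30, 45)}

def bins_per_band(n_samples):
    """Hypothetical helper: count the rfft bins that fall in each band."""
    # bin frequencies for n_samples points, spaced fs / n_samples apart
    freqs = np.fft.rfftfreq(n_samples, 1.0 / fs)
    # same inclusive bounds as extract_bands
    return {band: int(np.sum((freqs >= lo) & (freqs <= hi)))
            for band, (lo, hi) in eeg_bands.items()}

# 2 samples: the only bins are 0 Hz and 128 Hz, so only Delta is non-empty
print(np.fft.rfftfreq(2, 1.0 / fs).tolist())  # [0.0, 128.0]
print(bins_per_band(2))    # {'Delta': 1, 'Theta': 0, 'Alpha': 0, 'Beta': 0, 'Gamma': 0}

# 256 samples: 1 Hz bins from 0 to 128 Hz, so every band gets several bins
print(bins_per_band(256))  # {'Delta': 5, 'Theta': 5, 'Alpha': 5, 'Beta': 19, 'Gamma': 16}
```

Because both bounds are inclusive, the bins at 4, 8, 12 and 30 Hz are each counted in two neighbouring bands. If that overlap is not intended, making the upper comparison strict (`<`) would remove it.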
```python
one_trial = pddf[(pddf.subject == "s1") & (pddf.cond == 'c1') & (pddf.trial == '1')]
print(one_trial)
#>   subject group trial cond      value
#> 0      s1     a     1   c1   8.882609
#> 1      s1     a     1   c1  82.977971
extract_bands(one_trial.value)
#> {'Delta': 91.86058057, 'Theta': nan, 'Alpha': nan, 'Beta': nan, 'Gamma': nan}
```
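The Delta value of 91.86058057 is simply the sum of the two samples: for a 2-sample signal the real FFT has one bin at 0 Hz, equal to `x0 + x1`, and one at the Nyquist frequency, equal to `x0 - x1`. Only the 0 Hz bin lies in the Delta band. A minimal check, using the two values of this trial copied from `pddf`:

```python
import numpy as np

# the two samples of subject s1, cond c1, trial 1 (copied from pddf)
samples = np.array([8.88260935, 82.97797122])

# amplitude spectrum: bin 0 (0 Hz) = |x0 + x1|, bin 1 (128 Hz, Nyquist) = |x0 - x1|
spectrum = np.absolute(np.fft.rfft(samples))
print(np.isclose(spectrum[0], samples.sum()))                 # True
print(np.isclose(spectrum[1], abs(samples[0] - samples[1])))  # True
```

The same reasoning applies to the grouped results in the answer further down: in this toy data every group has 2 or 4 samples (bins at 0/128 Hz or 0/64/128 Hz), so Delta only ever sees the 0 Hz bin and equals the sum of the group's values.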
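Question
Now, for each `subject`, how can I apply `extract_bands` to the trials that belong to the same condition `cond`?
Basically, I want to return a dataset with one row for each `cond` of each `subject`, with eight columns in total: 'subject', 'group', 'cond', and the values of the five EEG bands from the dictionary `eeg_band_fft`.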
Example
The following code uses `groupby` to do what I want (here, computing the mean), but I don't know how to make it work with `extract_bands`.
```python
pddf2 = pddf.groupby(["subject", "group", "cond"]).value.mean()  # take the mean
pddf2
#> subject  group  cond
#> s1       a      c1      29.456971
#>                 c2      40.561769
#> s2       a      c1      30.981150
#>                 c2      54.863519
#> s3       c      c1      32.280519
#>                 c2      32.283109
#> s4       c      c1      48.112088
#>                 c2      21.653396
#> Name: value, dtype: float64
```
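Answer
If you want to perform a custom aggregation on a DataFrame, use `agg` and pass it your own function. Then convert the resulting column of dicts into a DataFrame, and finally concatenate the two DataFrames.
I would do it like this: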
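```python
# Apply extract_bands to the 'value' samples of each (subject, group, cond) group;
# each cell of the aggregated 'value' column holds the dict the function returns.
dfg = (pddf.groupby(["subject", "group", "cond"])
           .agg({'value': extract_bands})
           .reset_index()
       )
# Expand the dicts into one column per EEG band
df_dict = pd.DataFrame(dfg['value'].tolist())
# Put the grouping columns and the band columns side by side
result = pd.concat([dfg.drop(columns=['value']), df_dict], axis=1)
```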
This code returns the following DataFrame:
```
  subject group cond       Delta  Theta  Alpha  Beta  Gamma
0      s1     a   c1  117.827883    NaN    NaN   NaN    NaN
1      s1     a   c2   81.123538    NaN    NaN   NaN    NaN
2      s2     a   c1   61.962300    NaN    NaN   NaN    NaN
3      s2     a   c2  219.454077    NaN    NaN   NaN    NaN
4      s3     c   c1  129.122075    NaN    NaN   NaN    NaN
5      s3     c   c2   64.566219    NaN    NaN   NaN    NaN
6      s4     c   c1   96.224176    NaN    NaN   NaN    NaN
7      s4     c   c2   86.613583    NaN    NaN   NaN    NaN
```
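A possible alternative, not part of the original answer, avoids the separate `concat` step: let `groupby(...)['value'].apply` return a `pd.Series` per group, then `unstack()` the band names into columns. The sketch below rebuilds the question's data (without the unused `trial` column) so it runs on its own, and its copy of `extract_bands` is slightly modified (an assumption, labelled in the code) to return NaN for empty bands without numpy's empty-mean warning.

```python
import numpy as np
import pandas as pd

# Same data as the question; 'trial' is omitted because this aggregation ignores it
pddf = pd.DataFrame({
    'subject': ['s1'] * 6 + ['s2'] * 6 + ['s3'] * 6 + ['s4'] * 6,
    'group': ['a'] * 12 + ['c'] * 12,
    'cond': ['c1', 'c1', 'c1', 'c1', 'c2', 'c2', 'c1', 'c1', 'c2', 'c2', 'c2', 'c2',
             'c2', 'c2', 'c1', 'c1', 'c1', 'c1', 'c2', 'c2', 'c1', 'c1', 'c2', 'c2'],
    'value': [8.88260935, 82.97797122, 18.26659492, 7.70070742, 12.76417463,
              68.35936355, 7.59613253, 54.36616722, 9.11860667, 24.20324845,
              86.1674253, 99.96479613, 40.83798898, 23.72822971, 49.77969641,
              2.19459866, 30.3883309, 46.75944945, 11.47003917, 26.71771771,
              88.93251086, 7.29166478, 7.76880683, 40.65701944],
})

fs = 256
eeg_bands = {'Delta': (0, 4), 'Theta': (4, 8), 'Alpha': (8, 12),
             'Beta': (12, 30), 'Gamma': (30, 45)}

def extract_bands(data):
    # Same computation as the question's function (assumed modification:
    # an empty band returns NaN explicitly instead of np.mean of an empty array).
    fft_vals = np.absolute(np.fft.rfft(data))
    fft_freq = np.fft.rfftfreq(len(data), 1.0 / fs)
    out = {}
    for band, (lo, hi) in eeg_bands.items():
        in_band = (fft_freq >= lo) & (fft_freq <= hi)
        out[band] = fft_vals[in_band].mean() if in_band.any() else np.nan
    return out

# Each group yields a Series indexed by band name, so apply() returns a Series
# with a (subject, group, cond, band) MultiIndex. unstack() moves the band level
# into columns (sorted alphabetically), so reorder them to match eeg_bands.
bands = (pddf.groupby(['subject', 'group', 'cond'])['value']
             .apply(lambda s: pd.Series(extract_bands(s)))
             .unstack()[list(eeg_bands)]
             .reset_index())
print(bands)
```

Under these assumptions `bands` should have the same eight rows and columns as `result` above. Both versions call the Python function once per group, so the choice is mostly about readability: `agg` plus `concat` keeps the dicts as an intermediate column, while `apply` plus `unstack` reshapes in one chain.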
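Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.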