
tsfresh package for feature extraction

I have a dataframe. I would like to extract features based on a time window.

import pandas as pd

df = pd.DataFrame({'time':[1,2,3,4,5,6,7,8,9,10,2,3,5,6,8,10,12],
                   'id':[793,793,793,793,793,793,793,793,793,793,942,942,942,942,942,942,942],
                   'B1':[10,20,30,40,50,60,70,80,90,100,23,24,25,27,30,44,55],
                   'B2':[10,20,30,40,50,60,70,80,90,100,23,24,25,27,30,44,55],
                   'B3':[10,20,30,40,50,60,70,80,90,100,23,24,25,27,30,44,55]})
time_window = pd.DataFrame({'time':[2,4,6,8,5,8], 'id':[793,793,793,793,942,942]})

Here, my time windows are:

[2,4] for participant 793
[6,8] for participant 793
[5,8] for participant 942
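The flat `time_window` frame stores each window as two consecutive rows (start, then end). As a minimal sketch, assuming that pairing always holds, it can be reshaped into one row per window. The helper name `pair_windows` and the `start`/`end` column names are illustrative and not part of the original post.

```python
import pandas as pd

time_window = pd.DataFrame({'time': [2, 4, 6, 8, 5, 8],
                            'id': [793, 793, 793, 793, 942, 942]})

def pair_windows(tw):
    """Turn consecutive (start, end) rows into one row per window."""
    if len(tw) % 2 != 0:
        raise ValueError("expected an even number of rows (start/end pairs)")
    starts = tw.iloc[0::2].reset_index(drop=True)  # rows 0, 2, 4, ...
    ends = tw.iloc[1::2].reset_index(drop=True)    # rows 1, 3, 5, ...
    # Assumption: both rows of a pair belong to the same participant.
    if not (starts['id'] == ends['id']).all():
        raise ValueError("start and end rows must share the same id")
    return pd.DataFrame({'id': starts['id'],
                         'start': starts['time'],
                         'end': ends['time']})

windows = pair_windows(time_window)
print(windows)
```

Keeping one row per window makes it harder to mix up a start with an end, or to pair rows from different participants, when slicing later.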

My goal is to extract the features on the specified time window for each participant, so I wrote this function:

from tsfresh import extract_features

def apply_tsfresh(col):
  for i in range(len(time)):
    col.loc[time_window[i]:time_window[i+1]] = extract_features(col.loc[time_window[i]:time_window[i+1]], column_id="id")
    return col 

extracted_freatures = df.set_index('time').apply(apply_tsfresh)

I expected it to extract the features for the specified time window of each participant. However, I am not getting any results; the code raises an error.

Could you please help me here? I am totally out of any ideas.

My desired output should look like this: (image: desired result)

*Here, there may be more than two extracted features, and their values may differ. This is just an example.

First, an empty dataframe 'extracted_freatures_' is created. A loop then runs over the 'time' column of the 'time_window' dataframe in steps of two, taking each pair of elements as the start and end of a window. The results from 'extract_features' are appended to the 'extracted_freatures_' dataframe. Don't ask me how 'tsfresh' works, I don't know.

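# Continues from the question: df, time_window and extract_features are already defined.
extracted_freatures_ = pd.DataFrame()

# Index by time so rows can be selected by their time label.
df = df.set_index('time')

# time_window holds (start, end) pairs in consecutive rows, so step by two.
for i in range(0, len(time_window['time']), 2):
    ind1 = time_window.loc[i, 'time']
    ind2 = time_window.loc[i+1, 'time']
    # Note: a list selects only the rows whose time equals ind1 or ind2, for every id.
    a = extract_features(df.loc[[ind1, ind2]], column_id="id")
    extracted_freatures_ = pd.concat([extracted_freatures_, a])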

print(extracted_freatures_)

Output

Feature Extraction: 100%|██████████| 6/6 [00:00<00:00, 36.71it/s]
Feature Extraction: 100%|██████████| 6/6 [00:00<00:00, 39.50it/s]
Feature Extraction: 100%|██████████| 6/6 [00:00<00:00, 40.81it/s]
     B2__variance_larger_than_standard_deviation  ...  B3__mean_n_absolute_max__number_of_maxima_7
793                                          1.0  ...                                          NaN
942                                          0.0  ...                                          NaN
793                                          1.0  ...                                          NaN
942                                          1.0  ...                                          NaN
793                                          1.0  ...                                          NaN
942                                          1.0  ...                                          NaN

[6 rows x 2367 columns]
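The output has a row for both participants in every window because `df.loc[[ind1, ind2]]` picks the rows at the two endpoint times for every id. If the intent is instead to use all rows between the start and end time, for the window's own participant only, a possible sketch follows. It assumes the windows have been reshaped into a `windows` frame with `id`, `start` and `end` columns; the helper names and the `window` label column are hypothetical. `MinimalFCParameters` is used here only to keep the feature set small.

```python
import pandas as pd

df = pd.DataFrame({'time': [1,2,3,4,5,6,7,8,9,10,2,3,5,6,8,10,12],
                   'id': [793]*10 + [942]*7,
                   'B1': [10,20,30,40,50,60,70,80,90,100,23,24,25,27,30,44,55],
                   'B2': [10,20,30,40,50,60,70,80,90,100,23,24,25,27,30,44,55],
                   'B3': [10,20,30,40,50,60,70,80,90,100,23,24,25,27,30,44,55]})
# Assumed reshaping of time_window: one row per (id, start, end).
windows = pd.DataFrame({'id': [793, 793, 942],
                        'start': [2, 6, 5],
                        'end': [4, 8, 8]})

def slice_windows(data, windows):
    """Collect each participant's rows that fall inside each window (inclusive)."""
    pieces = []
    for w in windows.itertuples(index=False):
        mask = (data['id'] == w.id) & data['time'].between(w.start, w.end)
        piece = data.loc[mask].copy()
        # Hypothetical label so each window becomes its own tsfresh id.
        piece['window'] = f"{w.id}_{w.start}_{w.end}"
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)

def extract_per_window(sliced):
    """Run tsfresh once over all windows, one output row per window label."""
    # Imported lazily so the slicing step can be checked without tsfresh.
    from tsfresh import extract_features
    from tsfresh.feature_extraction import MinimalFCParameters
    # Drop 'id' so it is not treated as a value column.
    return extract_features(sliced.drop(columns='id'),
                            column_id='window', column_sort='time',
                            default_fc_parameters=MinimalFCParameters(),
                            disable_progressbar=True)

sliced = slice_windows(df, windows)
print(sliced)
```

Calling `extract_per_window(sliced)` should then give one row per window rather than one per participant per window. Dropping `default_fc_parameters` would fall back to the full default feature set, although windows this short will leave many of those features as NaN.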

