
How to extract features from text data in Python?

I am trying to build a machine learning model that predicts the number a person is thinking of, based on EEG signals from the brain. The dataset I found is available in text format and is described as follows: "The data is stored in a very simple text format including:

[id]: a number, for reference purposes only.

[event]: an integer id, used to distinguish the same event captured at different brain locations; used only by multichannel devices (all except MW).

[device]: a 2-character string, to identify the device used to capture the signals: "MW" for MindWave, "EP" for Emotiv EPOC, "MU" for Interaxon Muse and "IN" for Emotiv Insight.

[channel]: a string, to identify the 10/20 brain location of the signal, with possible values:

MindWave: "FP1"

EPOC: "AF3", "F7", "F3", "FC5", "T7", "P7", "O1", "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"

Muse: "TP9", "FP1", "FP2", "TP10"

Insight: "AF3", "AF4", "T7", "T8", "PZ"

[code]: an integer, to identify the digit being thought/seen, with possible values 0,1,2,3,4,5,6,7,8,9, or -1 for randomly captured signals not related to any of the digits.

[size]: an integer, to identify the size in number of values captured in the 2 seconds of this signal. Since the sampling rate of each device varies, in "theory" the value is close to 512 Hz for MW, 128 Hz for EP, 220 Hz for MU and 128 Hz for IN, for each of the 2 seconds.

[data]: a comma-separated set of numbers, with the time-series amplitude of the signal. Each device uses a different precision to identify the electrical potential captured from the brain: integers in the case of MW and MU, or real numbers in the case of EP and IN.

There are no headers in the files; every line is a signal, and the fields are separated by a tab."

How do I work with this data (plot it, train different models on it)? Should I convert it to another format? If yes, how? Dataset link: http://www.mindbigdata.com/opendb/MindBigData-MW-v1.0.zip

I have already used a CSV file for a similar ML project, but I have no idea how to use this one, since there is a separate heading before every signal's data. How do I extract these signals?

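The fields are tab-separated. You only need [code] (the digit), which is the 5th field, and [data], which is the 7th field (after extracting it, split it on commas).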
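A minimal sketch of that approach, assuming the seven-field, tab-separated layout quoted in the question. The helper names (`parse_line`, `load_signals`, `to_matrix`) and the two sample records are made up for illustration and are not part of the dataset or of any library. Because [size] can differ from line to line, the sketch truncates or zero-pads each signal to a fixed length so the result can be used as a feature matrix; that is one possible choice, not the only one.

```python
import io

import numpy as np


def parse_line(line):
    """Split one tab-separated record into its seven fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    rec_id, event, device, channel, code, size, data = fields
    # [data] is itself comma-separated; float() handles both the integer
    # (MW, MU) and real-valued (EP, IN) variants.
    signal = np.array([float(v) for v in data.split(",")])
    return {
        "id": int(rec_id),
        "event": int(event),
        "device": device,
        "channel": channel,
        "code": int(code),
        "size": int(size),
        "signal": signal,
    }


def load_signals(lines):
    """Return (labels, signals) from an iterable of text lines."""
    labels, signals = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = parse_line(line)
        labels.append(rec["code"])      # 5th field: the digit, or -1
        signals.append(rec["signal"])   # 7th field: the time series
    return np.array(labels), signals


def to_matrix(signals, length):
    """Truncate or zero-pad every signal to `length` samples."""
    X = np.zeros((len(signals), length))
    for i, s in enumerate(signals):
        n = min(len(s), length)
        X[i, :n] = s[:n]
    return X


# Two made-up records in the described layout (not real EEG data).
sample = io.StringIO(
    "1\t1\tMW\tFP1\t3\t4\t10,20,30,40\n"
    "2\t2\tMW\tFP1\t-1\t3\t5,6,7\n"
)
labels, signals = load_signals(sample)
X = to_matrix(signals, length=4)
```

For the real data, an open file object for the extracted text file can be passed to `load_signals` in place of `sample`. Each entry of `signals` is a 1-D array that can be plotted directly, and `X` together with `labels` has the (samples, features) shape that most machine learning libraries expect, so no conversion to another file format is strictly needed.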
