
Python supervised learning with data set classification

Original post: 2020-06-20 20:39:41 1 1 python/ deep-learning/ pattern-matching

I am new to deep learning and am currently researching a specific topic: machine learning methods for detecting anomalies in time series patterns, and their implementation in Python.

For example, I have a recording of my computer's CPU frequency over a certain time interval. I would like to implement a supervised learning algorithm that takes a time series of CPU frequencies as input and decides whether anything "unusual" happened during that time (unusual CPU usage, etc.).

EDIT:

My data sets look like this: the current CPU frequency is measured every 10 seconds. I have not fixed an exact number of data points per set, so the following is just for illustration, but I am expecting around 2500 data points per set:

Dataset_1: {1.2, 1.2, 1.6, 1.3, 1.5, 1.7, 1.6, 1.4, 1.5} -> Label: "good"

Dataset_2: {1.3, 1.2, 1.4, 1.3, 1.4, 1.5, 1.9, 2.1, 2.0} -> Label: "good"

Dataset_n: {1.3, 1.2, 3.6, 3.5, 1.4, 1.5, 3.3, 3.2, 1.2} -> Label: "bad"
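In Python, this layout could be held roughly as in the sketch below. The names `series` and `labels` are only illustrative and do not come from any library; the point is that there is one label per series, not one per measurement.

```python
# Illustrative layout: each sample is a whole time series with ONE label.
# `series` and `labels` are hypothetical names chosen for this sketch.
series = [
    [1.2, 1.2, 1.6, 1.3, 1.5, 1.7, 1.6, 1.4, 1.5],  # Dataset_1
    [1.3, 1.2, 1.4, 1.3, 1.4, 1.5, 1.9, 2.1, 2.0],  # Dataset_2
    [1.3, 1.2, 3.6, 3.5, 1.4, 1.5, 3.3, 3.2, 1.2],  # Dataset_n
]
labels = ["good", "good", "bad"]

# Pair every series with its single label and print a short summary.
for values, label in zip(series, labels):
    print(f"{len(values)} points, max {max(values):.1f} -> {label}")
```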

My understanding of supervised ML is that I have to provide training data sets. However, every tutorial that I have found so far labels each value in a data set. In my case that would not be possible, as I could only tell my ML algorithm:

a) this time series data set is normal

b) in this data set something is not normal

but I wouldn't be able to label each individual value, meaning I cannot say:

1.2 -> good

1.3 -> bad

1.4 -> good

As there are many different ML algorithms, it is hard for a beginner to determine which is a good one to use. So my question is:

Which algorithm (with a Python implementation) could I use as a start that accepts labels for entire data sets and does not expect each value to be labeled?

I hope this question makes sense. Edits are as welcome as your time. Thanks!

1 answer

For this application I would go with KNN (k-nearest neighbors). Tech with Tim has a great tutorial on KNN that explains it well and shows an implementation. Hope this helps.
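To make the idea concrete, here is a minimal sketch of KNN applied to whole series, written with only the standard library. It is not the tutorial's code: the function name `knn_predict` and the `query` series are made up for illustration, and it assumes all series have the same length and that plain Euclidean distance between them is meaningful. scikit-learn's `KNeighborsClassifier` offers a ready-made version of the same technique, with each series as one row of the input.

```python
import math
from collections import Counter


def knn_predict(train_series, train_labels, query, k=1):
    """Label `query` by majority vote among its k nearest training series."""
    # Euclidean distance between whole series (assumes equal lengths).
    distances = sorted(
        (math.dist(s, query), label)
        for s, label in zip(train_series, train_labels)
    )
    # Count the labels of the k closest series and return the most common.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Training data from the question: one label per entire series.
train_series = [
    [1.2, 1.2, 1.6, 1.3, 1.5, 1.7, 1.6, 1.4, 1.5],
    [1.3, 1.2, 1.4, 1.3, 1.4, 1.5, 1.9, 2.1, 2.0],
    [1.3, 1.2, 3.6, 3.5, 1.4, 1.5, 3.3, 3.2, 1.2],
]
train_labels = ["good", "good", "bad"]

# A made-up, unlabeled series with two frequency spikes.
query = [1.2, 1.3, 3.4, 3.6, 1.5, 1.4, 3.1, 3.3, 1.3]
print(knn_predict(train_series, train_labels, query, k=1))
```

With only three training series, `k=1` is the only sensible choice here. With more labeled recordings, a larger odd `k` would make the vote less sensitive to a single odd example. Comparing raw values position by position also means a spike at a different time looks "far away", so with around 2500 points per series it may work better to compare summary features (for example maximum, mean, and spread) instead.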