
Python supervised learning with data set classification

I am new to deep learning and am currently researching a certain topic. I am looking for machine learning methods that detect anomalies in time series patterns, and for their implementation in Python.

For example, I have a recording of the different CPU frequencies of my computer during a certain time interval. I would like to implement a supervised learning algorithm that takes a time series of CPU frequencies as input and decides whether anything "unusual" happened during that time (unusual CPU usage etc.).

EDIT:

My data sets look as follows: the current CPU frequency is measured every 10 seconds. I have not fixed an exact number of data points per set, so the following is just for illustration, but I expect around 2500 data points per set:

Dataset_1: {1.2, 1.2, 1.6, 1.3, 1.5, 1.7, 1.6, 1.4, 1.5} -> Label: "good"

Dataset_2: {1.3, 1.2, 1.4, 1.3, 1.4, 1.5, 1.9, 2.1, 2.0} -> Label: "good"

Dataset_n: {1.3, 1.2, 3.6, 3.5, 1.4, 1.5, 3.3, 3.2, 1.2} -> Label: "bad"
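As a rough sketch of this layout in Python, each recording can be kept as one sequence with exactly one label attached to the whole sequence. The names `datasets` and `labels` are hypothetical, and the values are only the illustrative readings listed above:

```python
# Hypothetical names; the readings are the illustrative values from the post.
# Each inner list is one complete recording (one reading every 10 seconds).
datasets = [
    [1.2, 1.2, 1.6, 1.3, 1.5, 1.7, 1.6, 1.4, 1.5],  # Dataset_1
    [1.3, 1.2, 1.4, 1.3, 1.4, 1.5, 1.9, 2.1, 2.0],  # Dataset_2
    [1.3, 1.2, 3.6, 3.5, 1.4, 1.5, 3.3, 3.2, 1.2],  # Dataset_n
]

# One label per recording, not one label per reading.
labels = ["good", "good", "bad"]

for series, label in zip(datasets, labels):
    print(f"{len(series)} readings -> {label}")
```

The point of the sketch is only that `labels` has as many entries as there are recordings, regardless of how many readings each recording holds.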

My understanding of a supervised ML algorithm is that I need labeled training data sets. However, every tutorial that I have found so far labels each individual value in a data set. In my case that would not be possible, as I could only tell my ML algorithm:

a) this time series data set is normal

b) in this data set something is not normal

but I would not be able to label each individual value, meaning I cannot say:

1.2 -> good

1.3 -> bad

1.4 -> good

As there are many different ML algorithms, it is hard for a beginner to determine which one is a good choice. So my question is:

Which algorithm (with a Python implementation) could I use as a start that accepts one label for an entire data set and does not expect each value to be labeled?

I hope this question makes sense. Edits are highly welcome, and thank you for your time!

For this application I would go with KNN (k-nearest neighbors). Tech with Tim has a great tutorial on KNN that explains it well and shows an implementation. Hope this helps.
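To make the KNN suggestion concrete, here is a minimal sketch using only the standard library (Python 3.8+). It is not taken from the tutorial mentioned above. It assumes each whole series is first reduced to a few summary statistics (mean, standard deviation, max, min), which is one possible choice among many and is what lets a single label apply to an entire series. The helper names `extract_features` and `knn_predict`, and the series `new_series` and `quiet_series`, are made up for illustration. In practice a library class such as scikit-learn's `KNeighborsClassifier` could take the place of the hand-written `knn_predict`.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def extract_features(series):
    """Summarise a whole series as a fixed-length feature vector (an assumed choice)."""
    return (mean(series), pstdev(series), max(series), min(series))


def knn_predict(train_features, train_labels, query, k=1):
    """Return the majority label among the k training series closest to `query`."""
    neighbours = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Training data: the illustrative series from the question, one label per series.
train_series = [
    [1.2, 1.2, 1.6, 1.3, 1.5, 1.7, 1.6, 1.4, 1.5],
    [1.3, 1.2, 1.4, 1.3, 1.4, 1.5, 1.9, 2.1, 2.0],
    [1.3, 1.2, 3.6, 3.5, 1.4, 1.5, 3.3, 3.2, 1.2],
]
train_labels = ["good", "good", "bad"]
train_features = [extract_features(s) for s in train_series]

# Made-up unlabeled recordings, for demonstration only.
new_series = [1.2, 1.3, 3.4, 3.6, 1.5, 1.4, 1.3, 3.1, 1.2]
quiet_series = [1.3, 1.4, 1.5, 1.4, 1.3, 1.5, 1.6, 1.4, 1.5]

prediction = knn_predict(train_features, train_labels, extract_features(new_series))
quiet_prediction = knn_predict(train_features, train_labels, extract_features(quiet_series))
print(prediction, quiet_prediction)
```

With only three training series, `k=1` is the only sensible setting here. With a realistic number of labeled recordings, a larger odd `k` and scaled features would usually be worth trying. Because the features are computed per series, recordings of different lengths (for example around 2500 readings each) can be compared without labeling individual values.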
