Linear Regression Model that improves as the user selects and trains data

Original post: 2019-10-08 12:06:39 | Tags: python, scikit-learn

I'm developing a script that detects peaks in signal data from a biological source. I want to create a semi-automated model that helps predict which peaks are the correct ones. The model should improve as the user manually selects a few of these peaks, teaching it which ones are correct.

The workflow I'm trying to achieve is this:

1. The user manually selects data.
2. The script takes the correct data and fits the model to it.
3. The model is used to predict the likelihood that a given peak is correct.
4. Hopefully, with enough data and training, it can be automated to run through the rest.

I also don't know the name of the general topic, and I'm struggling to work out what to google.

I've tried fitting a linear regression model in scikit-learn, but I don't have enough data (since it learns from the user's first intervention). Is what I'm doing possible?

2 Answers

Sorry for the generality of this answer, but the OP asked for general topics.

It sounds like semi-supervised learning may work (see here for scikit-learn, and here for more details).

There is no labeled data to start with. A manual process is started to obtain some labeled data. Soon, semi-supervised learning can kick in and take over, with a process measuring its accuracy. That matches your situation and is a good place to start.
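As a rough illustration of that hand-over, here is a minimal sketch using scikit-learn's `SelfTrainingClassifier`. The two peak features (height and width), the synthetic clusters, and the 0.9 confidence threshold are all assumptions made up for this example; they are not taken from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# Hypothetical peak features: [height, width]. "Correct" peaks are assumed
# to be taller and wider than spurious ones; real data will look different.
correct = rng.normal(loc=[5.0, 3.0], scale=0.5, size=(50, 2))
spurious = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(50, 2))
X = np.vstack([correct, spurious])
true_labels = np.array([1] * 50 + [0] * 50)

# The user has labeled only six peaks; -1 marks every unlabeled peak.
y = np.full(len(X), -1)
labeled_idx = [0, 1, 2, 50, 51, 52]
y[labeled_idx] = true_labels[labeled_idx]

# Self-training: fit on the labeled peaks, pseudo-label the unlabeled peaks
# the model is confident about (probability >= 0.9), refit, and repeat.
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
model.fit(X, y)

# Fraction of all peaks whose predicted label matches the synthetic truth.
agreement = (model.predict(X) == true_labels).mean()
print(f"agreement with ground truth: {agreement:.2f}")

# Estimated probability that a new peak is a correct one (class 1).
p_correct = model.predict_proba([[4.8, 2.9]])[0, 1]
print(f"P(correct) for the new peak: {p_correct:.2f}")
```

On real signals the features would come from the peak detector, and the ground truth used for `agreement` would only exist for the peaks the user has actually labeled.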

Eventually you may have "enough" correctly labeled data to investigate fitting a classic algorithm to predict the remainder. "Enough" is relative to how hard the problem is: it could be tens, hundreds, thousands, ...

Depending on other details of your situation, reinforcement learning may work. As you described the situation, it may not, but there may be other details in your environment that let you leverage this family of methods.

A word of warning: machine learning, and semi-supervised learning in particular, may not work well for every problem. Measure, measure, measure.
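One possible way to act on "measure" is to cross-validate on whatever peaks the user has labeled so far. The sketch below assumes synthetic two-feature data and a plain `LogisticRegression`; both are placeholders for illustration, not a recommendation for this particular signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Hypothetical user-labeled peaks: 30 correct (1) and 30 spurious (0).
X_labeled = np.vstack([
    rng.normal(loc=[5.0, 3.0], scale=1.0, size=(30, 2)),
    rng.normal(loc=[1.0, 1.0], scale=1.0, size=(30, 2)),
])
y_labeled = np.array([1] * 30 + [0] * 30)

# 5-fold cross-validation: each fold is held out once and scored by a
# model trained on the other four folds.
scores = cross_val_score(LogisticRegression(), X_labeled, y_labeled, cv=5)
mean_accuracy = scores.mean()
print(f"accuracy per fold: {scores}")
print(f"mean accuracy: {mean_accuracy:.2f}")
```

Re-running a check like this each time the user adds labels gives a simple signal for when the model might be trusted to run through the rest.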

Thank you everyone for all your help. I was talking to a colleague and he referred me to online machine learning. I think this is what I was looking for. Although I won't be handling time-series data or data streamed from an online source, I think the method is sufficient for my needs. It allows the model to be trained on data one example at a time rather than as a batch. As far as I can tell, scikit-learn is not built around online learning out of the box, although some of its estimators can be updated incrementally through `partial_fit`.

This, I think, gives a great rundown of the strengths of online machine learning (and also showcases the creme Python library).

Thanks again!