How do real-world machine learning production systems run?
Dear Machine Learning/AI Community,
I am just a budding and aspiring machine learning practitioner who has worked with open online data sets and some POCs built locally for my projects. I have built some models and converted them into pickle objects in order to avoid re-training.
And this question always puzzles me: how does a real production system work for ML algorithms?
Say I have trained my ML algorithm on some millions of rows of data and I want to move it to a production system or host it on a server. In the real world, is the model converted into a pickle object? If so, it would be a huge pickled file, wouldn't it? A model I trained locally on just 50,000 rows took 300 MB of disk space as a pickled object. I don't think this is the right approach.
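As a side note on file size: pickled models are usually loaded back into memory once, not shipped around per request, and they compress well because fitted parameters are highly regular. A minimal sketch using only the standard library (the dict below is a stand-in for a real fitted estimator, e.g. a scikit-learn model, which would typically be persisted with `joblib` instead):

```python
import gzip
import pickle

# Stand-in for a large fitted model: in practice this would be an estimator
# object holding big parameter arrays, not a plain dict.
model = {"weights": [0.0] * 100_000, "classes": ["cat", "dog"]}

raw = pickle.dumps(model)            # plain pickle bytes
compressed = gzip.compress(raw)      # gzip-compressed pickle, much smaller here

print(len(raw), len(compressed))

# Loading mirrors the save path: decompress, then unpickle.
restored = pickle.loads(gzip.decompress(compressed))
```

The real size savings depend on the model; tree ensembles and dense weight matrices compress less dramatically than this toy example.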
So how does it work, such that my ML algorithm avoids re-training and starts predicting on incoming data? And how do we actually make an ML algorithm a continuous online learner? For example, say I built an image classifier and started predicting on incoming images, but I want to train the algorithm again by adding the incoming online images to my previously trained data sets. Maybe not for every data point, but once a day I want to combine all the data received that day and re-train with the roughly 100 new images whose actual values can be compared against what my previously trained classifier predicted. And this approach shouldn't force my previously trained algorithm to stop predicting on incoming data, since the re-training may take time depending on computational resources and data volume.
I have Googled and read many articles, but couldn't find or understand an answer to the question above, and it puzzles me every day. Is manual intervention needed for production systems as well, or is there an automated approach?
Any leads or answers to the above questions would be highly helpful and appreciated. Please let me know if my questions don't make sense or are hard to understand.
I am not looking for anything project-specific, just a generic example of real-world production ML systems.
Thank you in advance!
Note that this is very broadly formulated, and your question should probably be put on hold, but I will try to give a brief summary of what you are asking:
For larger-scale systems, it also becomes important to scale your models so that they can perform the desired number of predictions/classifications per second. This is also mentioned in the TensorFlow deployment page I linked, and it mainly builds on top of cloud/distributed architectures such as Hadoop or (more recently) Kubernetes. Then again, for smaller products this is mostly overkill, but it serves the purpose of delivering enough resources at any arbitrary scale (and possibly on demand).
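Whatever the scale, the core serving pattern is the same: the serialized model is loaded into memory once at process startup, and each request only runs `predict`, so there is no per-request unpickling or retraining. A toy sketch of that pattern using only the Python standard library (the `model` function and the `/predict` route are made up for illustration; real systems would use TensorFlow Serving, Flask/FastAPI behind a WSGI server, etc.):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

def model(features):
    # Stand-in for a loaded model; in a real service you would unpickle once
    # at startup, e.g. model = pickle.load(open("model.pkl", "rb")).
    return {"label": "cat" if sum(features) > 0 else "dog"}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)["features"]
        result = model(features)          # model is already in memory
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):         # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

req = Request(f"http://127.0.0.1:{server.server_port}/predict",
              data=json.dumps({"features": [1.0, 2.0]}).encode(),
              headers={"Content-Type": "application/json"})
response = json.loads(urlopen(req).read())
print(response)
server.shutdown()
```

Scaling this up then mostly means running many copies of such a stateless serving process behind a load balancer, which is exactly what platforms like Kubernetes automate.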
As for the integration cycle of machine learning models, there is a nice overview in this article. I want to conclude by stressing that this is a heavily opinion-based question, so every answer might be different!