
Why do we need TFX if we have Airflow for orchestration?

I still don't get why we need TFX. TFX will convert your defined pipeline to an Airflow DAG and run it on Airflow, so I could just write my pipelines in Python and use Airflow's PythonOperator to build a pipeline directly, right? Why bother learning another wrapper on top of it? What else does TFX offer that cannot be done by just using Airflow + TF + Spark/Beam?

I could just write my pipelines in Python and use Airflow's PythonOperator to build a pipeline directly, right?

You can! It depends on how you define a pipeline, of course.

Here is the definition of TFX, from its guide:

"TFX is a Google-production-scale machine learning (ML) platform based on TensorFlow. It provides a configuration framework and shared libraries to integrate common components needed to define, launch, and monitor your machine learning system."

And this is what it takes to make a production ML system, according to engineers at TensorFlow:

[Image: the components of a production ML system]

So, if you can define a whole system that covers all of these steps in Airflow DAGs, then sure, you don't need TFX.
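As a rough illustration of what covering these steps yourself means, here is a plain-Python sketch (standard library only; `graphlib` needs Python 3.9+). It wires hypothetical `ingest`, `validate`, `transform`, `train`, `evaluate`, and `push` steps into a dependency graph and runs them in topological order, which is roughly what an Airflow DAG of PythonOperator tasks does. The step names loosely mirror TFX's standard components (ExampleGen, ExampleValidator, Transform, Trainer, Evaluator, Pusher), but the toy data and the one-parameter model are placeholders, not anything from TFX or Airflow.

```python
from graphlib import TopologicalSorter

# Hypothetical step functions. In Airflow each would be the python_callable of a
# PythonOperator task; here they share a dict, whereas real Airflow tasks would
# exchange data through XCom or external storage.
def ingest(ctx):
    ctx["rows"] = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}, {"x": None, "y": 0.0}]

def validate(ctx):
    # Drop rows with missing features (a stand-in for real data validation).
    ctx["rows"] = [r for r in ctx["rows"] if r["x"] is not None]

def transform(ctx):
    ctx["features"] = [(r["x"], r["y"]) for r in ctx["rows"]]

def train(ctx):
    # Toy model: fit y = w * x by least squares through the origin.
    ctx["w"] = (sum(x * y for x, y in ctx["features"])
                / sum(x * x for x, _ in ctx["features"]))

def evaluate(ctx):
    errors = [(ctx["w"] * x - y) ** 2 for x, y in ctx["features"]]
    ctx["mse"] = sum(errors) / len(errors)

def push(ctx):
    # Placeholder for exporting the model to a serving system.
    ctx["pushed"] = True

# Each step maps to the set of steps it depends on, like `a >> b` in Airflow.
DAG = {
    "validate": {"ingest"},
    "transform": {"validate"},
    "train": {"transform"},
    "evaluate": {"train"},
    "push": {"evaluate"},
}
STEPS = {f.__name__: f for f in (ingest, validate, transform, train, evaluate, push)}

def run_pipeline():
    """Run every step once, each after all of its dependencies."""
    ctx = {}
    order = list(TopologicalSorter(DAG).static_order())
    for name in order:
        STEPS[name](ctx)
    ctx["order"] = order
    return ctx

if __name__ == "__main__":
    result = run_pipeline()
    print(result["order"], result["w"], result["mse"])
```

Orchestration is the easy part here; everything inside the steps (validation rules, feature transforms, evaluation, deployment) and any record of which data produced which model is code you would write and maintain yourself, and that is the part TFX packages as reusable components.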


PS:

It comes down to the problem you are trying to solve. Here are some questions to think about.

  • Do you have the data you need at hand, and is it valuable?

  • Do you need to adjust it before feeding it to a model?

  • Which model should you use?

  • Are you going to re-train the model as you get new data? If so, what should the period of this process be?

  • When you are doing inference (serving your model), how are you going to use the predicted results?

  • What is your threshold for evaluating the success of your service? What metrics should you use?
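The last question is one place where TFX does concrete work: its Evaluator component "blesses" a model only if it meets configured metric thresholds, and the Pusher deploys only blessed models. With plain Airflow you would write that gate yourself. A minimal sketch, assuming mean squared error as the metric and a previously deployed baseline to compare against; the `Thresholds` class, `bless` function, and values are hypothetical, not TFX's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Thresholds:
    # Hypothetical values; choose metrics and bars that fit your service.
    max_mse: float         # absolute bar every candidate must clear
    max_regression: float  # how much worse than the baseline a candidate may be

def bless(candidate_mse: float, baseline_mse: Optional[float], t: Thresholds) -> bool:
    """Return True if the candidate model may be pushed to serving."""
    if candidate_mse > t.max_mse:
        return False
    # With no baseline (first deployment), only the absolute bar applies.
    if baseline_mse is not None and candidate_mse > baseline_mse + t.max_regression:
        return False
    return True
```

For example, with `Thresholds(max_mse=1.0, max_regression=0.05)`, a candidate at 0.5 MSE passes on a first deployment but is rejected against a baseline at 0.4, since it regresses by more than 0.05.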

To learn more, you can check here.
