Should I use Rails for consistency? (for ETL project)

CONTEXT

  • I'm new to Ruby and all that jazz, but I'm not new to dev.
  • I'm taking over a project based on two Rails/Puma repositories for web and APIs.
  • I'm building a new repository for a backend data-processing app, using Kiba, that will run through scheduled jobs.
  • Also, other devs will join me later on, so I'd like to make something maintainable by design.

MY QUESTION: Should I use Rails on that ETL project?

Using it means we can apply the same folder structure as the other repos, use RSpec all the same, etc. It also appeared to me that Rails changes the way classes like Hash behave.

At the same time, it seems to bring unnecessary complexity to a project that will run on the CLI and could consist of only a dozen files.

Kiba author here! This is an important question, thanks for asking it!

MY QUESTION: Should I use Rails on that ETL project?

By default, I would recommend starting with a separate project (a kind of "macro-service" approach), unless you have important things (more than just RSpec and ENV setup) to reuse from the Rails app.

If you expect significant coupling between the app and the ETL (e.g. by "scheduled jobs" you mean jobs triggered through Sidekiq to react to events, or you have classes shared between the two projects), then you can place the ETL in an etl subfolder of your Rails app, for instance. This provides a bit of separation and leaves the opportunity to split the code out later if that becomes the better path (this is a middle ground I'm using on some projects).

If that is not the case, though, and the data pipeline is expected to become large and live its own life, you can instead split it out into its own project.

Using it means we can apply the same folder structure as the other repos, use RSpec all the same, etc.

You can use RSpec or minitest from a dedicated (pure Ruby) ETL project too, introduce a notion of ETL_ENV (development, test, production), build your own ENV-based (or file-based) configuration with dotenv or similar, and support cron jobs from there too if you need that.
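To make the ETL_ENV idea a bit more concrete, here is a minimal sketch using only the standard library. The names EtlConfig, DATABASE_URL and BATCH_SIZE are hypothetical and only there for illustration; the real variables depend on your project.

```ruby
# config.rb -- hypothetical configuration object for a standalone ETL project.
# EtlConfig, DATABASE_URL and BATCH_SIZE are illustrative names, not a convention.
module EtlConfig
  VALID_ENVS = %w[development test production].freeze

  Settings = Struct.new(:env, :database_url, :batch_size, keyword_init: true)

  # Build settings from an ENV-like object (defaults to the real ENV).
  # Accepting a plain Hash keeps the loader easy to exercise in tests.
  def self.load(env = ENV)
    etl_env = env.fetch("ETL_ENV", "development")
    unless VALID_ENVS.include?(etl_env)
      raise ArgumentError, "Unknown ETL_ENV: #{etl_env.inspect}"
    end

    Settings.new(
      env: etl_env,
      # fetch without a default raises KeyError, so a missing variable fails fast.
      database_url: env.fetch("DATABASE_URL"),
      batch_size: Integer(env.fetch("BATCH_SIZE", "500"))
    )
  end
end

config = EtlConfig.load({ "ETL_ENV" => "test", "DATABASE_URL" => "postgres://localhost/etl_test" })
puts config.env         # prints "test"
puts config.batch_size  # prints 500
```

If you go with dotenv, a `require "dotenv/load"` early in your entry point would populate ENV from a `.env` file before `EtlConfig.load` runs. A cron entry can then set ETL_ENV=production ahead of whatever command starts the pipeline.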

Pure Ruby projects can be structured just like a Rails app, and there is usually less magic (things are more explicit), which is helpful.

It also appeared to me that Rails changes the way classes like Hash behave.

I would actually recommend being explicit about depending on those extensions. Today I prefer to "cherry-pick" the exact extensions I need, at the top of each file (as described here).

One last word: you can test Kiba ETL pipelines just as much as your individual ETL components, and I would recommend doing so (I will cover that in a future blog post), since it helps with moving things around, upgrading Ruby with ease, and generally scaling the team of developers (CI + tests).
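As a rough illustration of component-level testing, here is a sketch with minitest. It relies only on Kiba's convention that a class-based transform responds to `process(row)` and returns the row (or nil to dismiss it), so the component can be tested without loading Kiba at all. NormalizeEmail and run_transforms are hypothetical names made up for this example; run_transforms is not part of Kiba, just a small stand-in to exercise a chain of transforms in memory.

```ruby
require "minitest/autorun"

# Hypothetical transform following Kiba's convention for class-based
# transforms: respond to process(row), return the row to keep it, nil to drop it.
class NormalizeEmail
  def process(row)
    email = row[:email].to_s.strip.downcase
    return nil if email.empty? # rows without an email are dismissed

    row.merge(email: email)
  end
end

# Hypothetical helper, NOT part of Kiba: pushes each row through the
# transforms in order, dropping rows for which any transform returns nil.
def run_transforms(rows, transforms)
  rows.filter_map do |row|
    transforms.reduce(row) { |current, t| current && t.process(current) }
  end
end

class NormalizeEmailTest < Minitest::Test
  def test_normalizes_case_and_whitespace
    row = NormalizeEmail.new.process({ email: "  Bob@Example.COM " })
    assert_equal "bob@example.com", row[:email]
  end

  def test_drops_rows_without_email
    assert_nil NormalizeEmail.new.process({ email: nil })
  end

  def test_chain_keeps_only_valid_rows
    rows = [{ email: "A@b.c" }, { email: "" }]
    assert_equal [{ email: "a@b.c" }], run_transforms(rows, [NormalizeEmail.new])
  end
end
```

The same assertions translate directly to RSpec if you prefer to stay consistent with the Rails repos. Testing a full pipeline would follow the same spirit, with in-memory sources and destinations in place of real files or databases.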

I hope this provides enough guidance for you to make a decision on this. If not, please leave a comment!

From my point of view, using Rails for ETL projects is overhead. Take a look at dry-rb. Using https://dry-rb.org/gems/dry-system/0.12/ you can build a small application to process data. Also, there is a gem for building CLIs: https://dry-rb.org/gems/dry-cli/0.4/

Here is a list of all dry gems: https://dry-rb.org/gems/
