
How to fetch complex MongoDB Data from Kedro?

I'm trying to get hands-on with Kedro, but I don't understand how to build my data fetcher (which I used before).

My data is stored in a MongoDB instance across multiple "tables". One table holds my usernames. First, I want to fetch them. Then, based on the usernames I get, I would like to fetch data from three "tables" and merge them.

What is the best way to do this in Kedro?

Should I put everything in a custom dataset? Or fetch only the usernames and do the rest in a part of the pipeline?

So this is an interesting one: Kedro has been designed so that tasks have no knowledge of the IO required to provide or save the data. Your use case, for good reasons, requires you to cross this boundary.

My recommendation is to go down the custom dataset route, but potentially go a little further and make it return the 3 tables you need directly, i.e. do the username filter logic in this stage as well.
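The two-step fetch described here (usernames first, then the matching rows of each table) might look roughly like the sketch below. The function names, the `username` key, and the shape of the result are assumptions made for illustration; `db` only needs to behave like a pymongo `Database`, where `db[name]` gives a collection with `distinct()` and `find()`.

```python
from typing import Any, Dict, Iterable, List


def fetch_user_tables(
    db: Any,
    users_table: str,
    data_tables: Iterable[str],
    key: str = "username",  # assumed name of the field shared by all tables
) -> Dict[str, List[dict]]:
    """Fetch the usernames first, then the matching documents of each table."""
    # Step 1: collect the usernames from the users collection.
    usernames = db[users_table].distinct(key)
    # Step 2: query each data collection for documents of those users only.
    query = {key: {"$in": usernames}}
    return {name: list(db[name].find(query)) for name in data_tables}


def merge_by_user(
    tables: Dict[str, List[dict]], key: str = "username"
) -> Dict[str, Dict[str, List[dict]]]:
    """Group the documents of every table under the username they belong to."""
    merged: Dict[str, Dict[str, List[dict]]] = {}
    for table_name, docs in tables.items():
        for doc in docs:
            merged.setdefault(doc[key], {}).setdefault(table_name, []).append(doc)
    return merged
```

Whether the merge lives inside the dataset or in a downstream node is a design choice; keeping it as a plain function like `merge_by_user` makes it usable in either place.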

It is also perfectly fine to raise a NotImplementedError on save() if you're not going to do that.
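A read-only custom dataset along these lines could be sketched as follows. This is only an illustration under assumptions: the class name, constructor parameters, and the `username` key are hypothetical, the base class is spelled `AbstractDataSet` in older Kedro releases, and the `_load`/`_save`/`_describe` method names should be checked against the Kedro version in use. The `object` fallback exists only so the sketch can run without Kedro installed.

```python
from typing import Any, Dict, List, Sequence

try:
    from kedro.io import AbstractDataset  # spelled AbstractDataSet in older Kedro releases
except ImportError:
    AbstractDataset = object  # stand-in so the sketch runs without Kedro


class MongoUserTablesDataset(AbstractDataset):
    """Hypothetical read-only dataset: loads user-filtered tables, refuses to save."""

    def __init__(
        self,
        uri: str,
        database: str,
        users_table: str,
        data_tables: Sequence[str],
        key: str = "username",  # assumed field shared by all collections
    ):
        self._uri = uri
        self._database = database
        self._users_table = users_table
        self._data_tables = list(data_tables)
        self._key = key

    def _load(self) -> Dict[str, List[dict]]:
        from pymongo import MongoClient  # imported lazily; requires pymongo

        client = MongoClient(self._uri)
        try:
            db = client[self._database]
            # Usernames first, then only the documents belonging to them.
            usernames = db[self._users_table].distinct(self._key)
            query = {self._key: {"$in": usernames}}
            return {t: list(db[t].find(query)) for t in self._data_tables}
        finally:
            client.close()

    def _save(self, data: Any) -> None:
        # Saving is intentionally unsupported for this dataset.
        raise NotImplementedError("MongoUserTablesDataset is read-only")

    def _describe(self) -> Dict[str, Any]:
        # The URI is left out because it may contain credentials.
        return {
            "database": self._database,
            "users_table": self._users_table,
            "data_tables": self._data_tables,
            "key": self._key,
        }
```

A node that receives this dataset as input would then get a dict with one list of documents per table and can merge them as needed.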
