
How to fetch complex MongoDB data from Kedro?

Original post 2022-03-11 07:28:57 6 1 mongodb/ fetch/ kedro

I'm trying to get hands-on with Kedro, but I don't understand how to build my data fetcher (which I have used before).

My data is stored in a MongoDB instance across multiple “tables”. One table holds my usernames. First, I want to fetch them. Then, based on the usernames I get, I would like to fetch data from three “tables” and merge them.

What is the best way to do this in Kedro?

Should I put everything in a custom dataset? Or should I fetch only the usernames and do the rest in a part of the pipeline?

1 Answer

So this is an interesting one. Kedro has been designed so that the tasks have no knowledge of the IO required to provide or save the data. Your use case (for good reasons) requires you to cross this boundary.

My recommendation is to go down the custom dataset route, but potentially go a little further and make it return the 3 tables you need directly, i.e. do the username filter logic in this stage as well.

It is also perfectly fine to raise a NotImplementedError on save() if you're not going to do that.
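To make that a bit more concrete, here is a minimal sketch of what such a read-only dataset could look like. It is not a definitive implementation. The class name `MongoUserTablesDataset`, its constructor arguments, and the `client_factory` hook are all made up for illustration, and the sketch assumes every collection stores the username under the same field. In a real project the class would subclass Kedro's abstract dataset base class from `kedro.io`, which expects `_load`, `_save` and `_describe`; the base class is left out here so the snippet runs on its own.

```python
from typing import Any, Callable, Dict, List, Optional


class MongoUserTablesDataset:
    """Hypothetical read-only dataset returning several collections at once.

    Assumption: in a Kedro project this would inherit from Kedro's abstract
    dataset base class; it is kept as a plain class so the sketch is runnable.
    """

    def __init__(
        self,
        uri: str,
        database: str,
        users_collection: str,
        data_collections: List[str],
        username_field: str = "username",
        client_factory: Optional[Callable[[str], Any]] = None,
    ) -> None:
        self._uri = uri
        self._database = database
        self._users_collection = users_collection
        self._data_collections = list(data_collections)
        self._username_field = username_field
        # Hypothetical hook: lets a fake client be injected for testing.
        self._client_factory = client_factory

    def _connect(self) -> Any:
        if self._client_factory is not None:
            return self._client_factory(self._uri)
        # Imported lazily so the sketch can be exercised without pymongo.
        from pymongo import MongoClient

        return MongoClient(self._uri)

    def _load(self) -> Dict[str, List[dict]]:
        db = self._connect()[self._database]
        # Step 1: fetch the usernames.
        usernames = [
            doc[self._username_field]
            for doc in db[self._users_collection].find({})
        ]
        # Step 2: fetch only the matching documents from each collection.
        query = {self._username_field: {"$in": usernames}}
        return {
            name: list(db[name].find(query))
            for name in self._data_collections
        }

    def _save(self, data: Any) -> None:
        # Saving is intentionally unsupported, as suggested above.
        raise NotImplementedError("This dataset is read-only.")

    def _describe(self) -> Dict[str, Any]:
        return {
            "database": self._database,
            "users_collection": self._users_collection,
            "data_collections": self._data_collections,
        }
```

The dataset hands back a dictionary with one list of documents per collection, so the merge itself can live in an ordinary node that never needs to know about MongoDB.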