I'm trying to get hands-on with Kedro, but I don't understand how to build the data fetcher I used before.
My data is stored in a MongoDB instance across multiple “tables”. One table holds my usernames. First, I want to fetch them. Then, based on the usernames I get, I would like to fetch data from three “tables” and merge them.
What is the best way to do this in Kedro?
Should I put everything in a custom dataset? Or fetch only the usernames there and do the rest in a part of the pipeline?
So this is an interesting one. Kedro is designed so that tasks (nodes) have no knowledge of the IO required to load or save their data. Your use case, for good reasons, requires you to cross this boundary.
My recommendation is to go down the custom dataset route, but potentially go a little further and make it return the 3 tables you need directly, i.e. do the username filter logic in this stage as well.
It is also perfectly fine to raise a NotImplementedError on save() if you're not going to write data back.
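A minimal sketch of what such a read-only custom dataset could look like is shown here. Everything named in it is an assumption for illustration: the class name `MongoUserTablesDataset`, the helper `fetch_user_tables`, the constructor arguments, and the `username` field are hypothetical, and the three tables are assumed to share that field. The base class is spelled `AbstractDataset` in recent Kedro releases (`AbstractDataSet` in older ones), and depending on your version the methods to implement may be `load`/`save` rather than `_load`/`_save`, so check the documentation for the version you have installed.

```python
from typing import Any, Dict, List

try:
    # Recent Kedro releases; older ones spell it AbstractDataSet.
    from kedro.io import AbstractDataset
except ImportError:
    # Fallback so the sketch can be read and run without Kedro installed.
    AbstractDataset = object


def fetch_user_tables(db, users_collection, data_collections,
                      username_field="username"):
    """Fetch all usernames, then the matching documents from each table.

    `db` only needs to support db[name].find(filter, projection), which a
    pymongo Database does.
    """
    user_docs = db[users_collection].find({}, {username_field: 1})
    usernames = [doc[username_field] for doc in user_docs]
    query = {username_field: {"$in": usernames}}
    return {name: list(db[name].find(query)) for name in data_collections}


class MongoUserTablesDataset(AbstractDataset):
    """Hypothetical read-only dataset returning one list of documents per table."""

    def __init__(self, uri: str, database: str, users_collection: str,
                 data_collections: List[str], username_field: str = "username"):
        self._uri = uri
        self._database = database
        self._users_collection = users_collection
        self._data_collections = data_collections
        self._username_field = username_field

    def _load(self) -> Dict[str, List[Dict[str, Any]]]:
        # Imported lazily so the module can be imported without pymongo.
        from pymongo import MongoClient

        client = MongoClient(self._uri)
        try:
            return fetch_user_tables(
                client[self._database],
                self._users_collection,
                self._data_collections,
                self._username_field,
            )
        finally:
            client.close()

    def _save(self, data: Any) -> None:
        # Read-only source: saving is deliberately unsupported.
        raise NotImplementedError("MongoUserTablesDataset is read-only.")

    def _describe(self) -> Dict[str, Any]:
        # Leave the URI out, since it may contain credentials.
        return {
            "database": self._database,
            "users_collection": self._users_collection,
            "data_collections": self._data_collections,
        }
```

With this split, the dataset hands a node a dictionary keyed by table name, and the merge itself can live in an ordinary node that knows nothing about MongoDB.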