
can I define data filters with intake catalogs?

Original post · 2022-04-28 21:44:25 · Tags: python, intake

I would like to use intake not only to link to published datasets, but also to filter them in the catalog itself. Filtering is trivial to do in Python once the data is open, but that means handing the user code, beyond the catalog metadata, just to give some guidance.

Motivation: the user is often less familiar with the dataset than its producer, and it would be nice to do some preprocessing for them, rather than having them add a series of filtering steps in Python.

E.g., if we have already opened a CSV, we can filter with df[df['rain'] > 70], but I don't see any argument to read_csv, in either pandas or dask, that does this.
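For reference, a minimal runnable version of the filter in the question: load the whole file, then index with a boolean mask. The CSV contents and the station column are made up for illustration; only the rain column and the threshold of 70 come from the question.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a published dataset.
CSV_TEXT = """station,rain
A,12
B,85
C,70
D,102
"""

# Load everything first, then keep only rows where rain is strictly above 70.
df = pd.read_csv(io.StringIO(CSV_TEXT))
heavy = df[df["rain"] > 70]
print(heavy)
```

Note that the comparison is strict, so a row with rain equal to 70 is dropped.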

1 answer

There is, indeed, no way to pass a row filter to pandas' or dask's read_csv functions, and therefore this is not an option supported by Intake's CSV driver.
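If the concern is memory rather than convenience, the closest plain pandas gets is reading in chunks and filtering each chunk as it arrives. This is a sketch of that workaround, not an Intake feature; read_csv_filtered is a hypothetical helper name, and the tiny chunksize is only there to show several chunks on toy data.

```python
import io

import pandas as pd

CSV_TEXT = "station,rain\nA,12\nB,85\nC,70\nD,102\nE,71\n"


def read_csv_filtered(source, column, threshold, chunksize=2):
    """Read a CSV chunk by chunk, keeping rows where column > threshold.

    Only one unfiltered chunk is held in memory at a time.
    """
    kept = []
    # With chunksize set, read_csv returns an iterator of DataFrames
    # (usable as a context manager in pandas >= 1.2).
    with pd.read_csv(source, chunksize=chunksize) as reader:
        for chunk in reader:
            kept.append(chunk[chunk[column] > threshold])
    # Renumber the index so it runs 0..n-1 across all chunks.
    return pd.concat(kept, ignore_index=True)


result = read_csv_filtered(io.StringIO(CSV_TEXT), "rain", 70)
print(result)
```

The filter still runs after parsing, so every row is read; it just avoids materializing the full unfiltered table.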

However, Intake does support dataset transforms (see https://intake.readthedocs.io/en/latest/transforms.html). This means that you can operate on the output of one data source and assign the result to a new catalogue entry. The transform is computed on every access; the filtered dataset is not stored anywhere (unless you also use the persist functionality).
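A sketch of how such a filtered entry might look, assuming the derived-source drivers in Intake's intake.source.derived module from the 0.x series that the transforms page above describes (DataFrameTransform with targets, transform and transform_kwargs arguments). The module name rain_filters, the function filter_above, the entry names and the path rain.csv are all hypothetical. The transform function is plain Python and is what the catalog points at; the catalog text is kept as a string so the block runs without Intake installed.

```python
import pandas as pd


def filter_above(df, column, threshold):
    """Keep rows where df[column] > threshold.

    Uses only boolean-mask indexing, so it works on a pandas or a
    dask DataFrame alike (DataFrameTransform may hand it a dask one).
    """
    return df[df[column] > threshold]


# Hypothetical catalog.yaml contents. "rain_filters" stands for whatever
# importable module holds filter_above; "rain.csv" is a placeholder path.
CATALOG_YAML = """\
sources:
  rain:
    driver: csv
    args:
      urlpath: "rain.csv"
  heavy_rain:
    description: rows of the rain source with rain > 70
    driver: intake.source.derived.DataFrameTransform
    args:
      targets:
        - rain
      transform: rain_filters.filter_above
      transform_kwargs:
        column: rain
        threshold: 70
"""

if __name__ == "__main__":
    # Exercise the transform locally on a small in-memory frame.
    sample = pd.DataFrame({"station": ["A", "B", "C"], "rain": [12, 85, 70]})
    print(filter_above(sample, "rain", 70))
```

With Intake installed, the YAML saved as catalog.yaml and rain_filters importable, something like intake.open_catalog("catalog.yaml").heavy_rain.read() should return only the filtered rows, rerunning the filter on each access. The user never sees the filtering code, only the heavy_rain entry and its description.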

Related questions

partitioning intake data sources

How can i use intake library to create one catalog.yaml file to reference nested directories?

vscode-python - Can I use the data viewer's filters to apply 'greater-than' and 'less-than' filters simultaneously?

How do I define custom filters for minimal use of django templates?

using numerical filters to define what print list is used (I assume if filters)

How can I define a data structure with dependency on something not defined yet?

How can I define algebraic data types in Python?

How can I create one data frame if I have multiple filters?

How can I run multiple filters in pandas?

How do I efficiently crossmatch two ASCII catalogs?


The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint, please indicate the site URL or the original address. For any questions, contact yoyou2525@163.com.




Accepted answer (score: 0), posted 2022-04-29 14:00:01
