
In Foundry Code Repositories, how do I iterate over all datasets in a directory?

I'm trying to read all (or multiple) datasets from a single directory in a single PySpark transform. Is it possible to iterate over all the datasets in a path, without hardcoding individual datasets as inputs?

I'd like to dynamically fetch different columns from multiple datasets without having to hardcode individual input datasets.

Dynamically discovering inputs at build time doesn't work, because you would get inconsistent results every time you run CI. It would also break TLLV (transforms level logic versioning) by making it impossible to tell when the logic has actually changed, and therefore when a dataset should be marked stale.

You will have to write out the logical path of each dataset you wish to transform, even if those paths are just passed into a generated transform. There must be at least some consistent record of which datasets were targeted by which commit.
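Concretely, this usually means listing the paths in code and generating one transform per entry. Here is a minimal sketch of that transform-generation pattern; the dataset names, paths, and the added `source_dataset` column are hypothetical:

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output

# Hypothetical list of dataset names. Every path still appears
# explicitly in the repository, so there is a consistent record
# of which datasets each commit targets.
SOURCES = ["dataset_a", "dataset_b", "dataset_c"]


def transform_generator(sources):
    transforms = []
    for source in sources:
        @transform_df(
            Output(f"/Project/output/{source}_cleaned"),
            source_df=Input(f"/Project/input/{source}"),
        )
        def compute(source_df, source=source):  # default arg binds the loop variable
            # Tag each row with the dataset it came from.
            return source_df.withColumn("source_dataset", F.lit(source))
        transforms.append(compute)
    return transforms


TRANSFORMS = transform_generator(SOURCES)
```

The generated transforms would then be registered on the repository's pipeline (e.g. `my_pipeline.add_transforms(*TRANSFORMS)` in `pipeline.py`), so each one shows up as an ordinary build target.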

Another tactic for achieving what you're looking for is to build a single long dataset that is the unpivoted version of all the source datasets. That way you can simply append new rows / files to this one dataset, which lets you accept arbitrary inputs, assuming your transform is written to handle them (see the sketch below). My rule of thumb is this: if you need dynamic schemas or a dynamic number of datasets, then you're better off using a dynamic set of files / rows inside a single dataset.
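A rough sketch of what such a transform might look like, assuming the appended files are CSVs and assuming hypothetical input/output paths: it lists the files inside one input dataset and unpivots each into a fixed `(source_file, column, value)` schema, so new files with new schemas need no code change.

```python
import functools

from pyspark.sql import DataFrame, functions as F
from transforms.api import transform, Input, Output


@transform(
    output=Output("/Project/output/long_table"),      # hypothetical path
    combined=Input("/Project/input/combined_files"),  # hypothetical path
)
def compute(ctx, output, combined):
    fs = combined.filesystem()
    long_parts = []
    # Each appended CSV becomes a set of (source_file, column, value) rows,
    # collapsing arbitrary per-file schemas into one fixed, long schema.
    # Assumes the dataset contains at least one matching file.
    for status in fs.ls(glob="*.csv"):
        df = (
            ctx.spark_session.read
            .option("header", "true")
            .csv(fs.hadoop_path + "/" + status.path)
        )
        kv = F.explode(
            F.array(*[
                F.struct(
                    F.lit(c).alias("column"),
                    F.col(c).cast("string").alias("value"),
                )
                for c in df.columns
            ])
        ).alias("kv")
        long_parts.append(
            df.select(F.lit(status.path).alias("source_file"), kv)
              .select("source_file", "kv.column", "kv.value")
        )
    output.write_dataframe(functools.reduce(DataFrame.unionByName, long_parts))
```

Casting every value to string is the price of accepting arbitrary schemas; downstream consumers re-cast the columns they care about.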
