
How to append an Excel file to an existing dataset without losing the additional columns in Foundry?

Context: Our business users receive Excel sheets (.xlsx) by email and want to import them into Foundry. We agreed on a fixed structure and naming convention for the files and tabs so that users can simply drag and drop them into a specific folder and append them to the existing dataset. Each change to this dataset then triggers a pipeline (raw -> clean -> ontology).

Issue: We use the "Additional Columns" (_filePath, _byteOffset, _importedAt) to clean up the data and apply some logic based on them, but every time a new Excel file is appended, the schema seems to be reset and the "Additional Columns" are unticked.

(Screenshot: "Additional Columns" unticked in "Edit Schema")
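As an illustration of the kind of logic these columns make possible, the sketch below keeps, for each source file, only the rows from its most recent import. This matters when a corrected workbook with the same name is appended again. It is plain Python over a list of dicts rather than a Foundry transform. The function name `latest_import_per_file` and the deduplication rule are hypothetical, and `_importedAt` values are assumed to be mutually comparable (for example `datetime` objects or ISO-8601 strings).

```python
def latest_import_per_file(rows):
    """Hypothetical cleaning step: keep only each file's most recent import.

    `rows` is a list of dicts carrying the additional columns
    _filePath, _byteOffset and _importedAt.
    """
    # First pass: find the newest import timestamp for every file path
    latest = {}
    for row in rows:
        path, ts = row["_filePath"], row["_importedAt"]
        if path not in latest or ts > latest[path]:
            latest[path] = ts

    # Second pass: drop rows that belong to older imports of the same file
    kept = [r for r in rows if r["_importedAt"] == latest[r["_filePath"]]]

    # Order by file, then by position within the file
    return sorted(kept, key=lambda r: (r["_filePath"], r["_byteOffset"]))
```

Sorting on `_byteOffset` within each file is one way to keep rows in their original file order after the filter.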

Is there a way to keep the "Additional Columns" after importing and appending an Excel sheet to an existing dataset?

Unfortunately, imports through the drag-and-drop interface always replace the existing schema, which is why you are losing the additional columns. If you can produce the files as CSVs instead of XLSX, you can append them and keep the existing schema, including the additional columns. Another, albeit indirect, approach is to add a step between raw and clean that calls the metadata API to add the optional columns.

For the metadata API approach, you'd want to set these textParserParams arguments:

textParserParams["addFilePath"] = True
textParserParams["addByteOffset"] = True
textParserParams["addImportedAt"] = True
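The step between raw and clean could then patch the schema before writing it back. A minimal sketch, assuming the schema is available as a Python dict and that the parser settings sit under `customMetadata` → `textParserParams` (an assumption about the schema layout, not something stated in this answer). Fetching and saving the schema through the metadata API is left out because the exact endpoint is not given here, and `enable_additional_columns` is a hypothetical helper name. The sketch only sets the three flags; it does not touch the schema's field list.

```python
import copy

# The three flags listed above; each one switches on an optional column
OPTIONAL_COLUMN_FLAGS = ("addFilePath", "addByteOffset", "addImportedAt")


def enable_additional_columns(schema):
    """Return a copy of `schema` with the additional-column flags set to True.

    Assumption: the parser settings live under
    schema["customMetadata"]["textParserParams"]; check this against the
    schema JSON of your own dataset before relying on it.
    """
    patched = copy.deepcopy(schema)  # leave the caller's dict untouched
    params = (
        patched.setdefault("customMetadata", {})
        .setdefault("textParserParams", {})
    )
    for flag in OPTIONAL_COLUMN_FLAGS:
        params[flag] = True
    return patched
```

Returning a copy rather than mutating in place makes it easy to diff the old and new schema before saving.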

