
How to get the file name inside a DoFn in Apache Beam when reading an input file pattern

I am processing a large number of files that are all in one directory. I want to add the file name to the metadata of each processed output record, so that if something goes wrong during processing we can check which input file a given record came from.

Is there a way to get the file name inside my DoFn? I am using Apache Beam version 2.19.0.

Input file location - gs://bucket/extracted-files/*

You can use the transforms available in the FileIO class for this purpose.

Specifically, you can use FileIO.match() followed by FileIO.readMatches(), which produces a PCollection of ReadableFile objects. For each ReadableFile you have access to a byte channel for reading the contents, as well as a Metadata object (via getMetadata()) that contains the name of the file.
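A minimal sketch of that approach in the Java SDK, assuming the file pattern from the question and that each file is small enough to read fully into memory as UTF-8 text. The class names FileNameExample and TagWithFileName are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example class; not part of Beam itself.
public class FileNameExample {

  /** Emits one (file name, file contents) pair per matched file. */
  static class TagWithFileName extends DoFn<FileIO.ReadableFile, KV<String, String>> {
    @ProcessElement
    public void processElement(
        @Element FileIO.ReadableFile file,
        OutputReceiver<KV<String, String>> out) throws IOException {
      // Full resource path, e.g. gs://bucket/extracted-files/<name>.
      String fileName = file.getMetadata().resourceId().toString();
      // Assumption: the file fits in memory. For large files, read from
      // file.open() (a ReadableByteChannel) incrementally instead.
      String contents = file.readFullyAsUTF8String();
      out.output(KV.of(fileName, contents));
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<KV<String, String>> tagged =
        pipeline
            // Expand the pattern into one metadata element per matching file.
            .apply(FileIO.match().filepattern("gs://bucket/extracted-files/*"))
            // Turn each match into a ReadableFile that can be opened in a DoFn.
            .apply(FileIO.readMatches())
            .apply(ParDo.of(new TagWithFileName()));

    pipeline.run().waitUntilFinish();
  }
}
```

Downstream transforms can then carry the key along with each record. If only the last path component is wanted rather than the full path, resourceId().getFilename() can be used instead of resourceId().toString().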

