
How to get multiple files as Apache Beam input?

I am working on this scenario: in Google Cloud Storage, my files are stored in this structure:

PS: the 2 files are in the same folder (it was an indentation mistake).

[enter image description here]

What I want to do is:

1] Read the 2 files "client_info.csv" and "client_events.csv" from each day

2] Join them on a common column present in each file to get 1 PCollection

3] Apply transformations

4] Load the data into BigQuery

I wrote code that reads from only 1 date and it works well, but I couldn't solve the part that iterates over all dates.

If you have any suggestions, please share them.

One solution is to build a pipeline that merges two branches: each branch reads one input file separately, and the two branches are then joined.
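As a rough sketch of that two-branch idea in the Beam Python SDK, the join could be done with `beam.CoGroupByKey`. The assumptions here are mine, not from the question: the common column is the first column of each CSV, both files have a header row, and the names `parse_keyed`, `merge_rows`, `build_join` and the `client_id` field are hypothetical.

```python
import csv


def parse_keyed(line, key_index=0):
    """Parse one CSV line into (key, remaining_fields).

    Assumes the common column sits at key_index (hypothetical default: 0).
    """
    fields = next(csv.reader([line]))
    key = fields[key_index]
    rest = fields[:key_index] + fields[key_index + 1:]
    return key, rest


def merge_rows(element):
    """Flatten one CoGroupByKey result into one dict per (info, event) pair.

    This behaves like an inner join: keys missing on either side yield nothing.
    """
    key, grouped = element
    for info in grouped["info"]:
        for event in grouped["events"]:
            yield {"client_id": key, "info": list(info), "event": list(event)}


def build_join(pipeline, info_path, events_path):
    """Build two branches, one per input file, and join them by key."""
    # Imported here so the pure helpers above can be used without Beam installed.
    import apache_beam as beam

    info = (
        pipeline
        | "ReadInfo" >> beam.io.ReadFromText(info_path, skip_header_lines=1)
        | "KeyInfo" >> beam.Map(parse_keyed)
    )
    events = (
        pipeline
        | "ReadEvents" >> beam.io.ReadFromText(events_path, skip_header_lines=1)
        | "KeyEvents" >> beam.Map(parse_keyed)
    )
    return (
        {"info": info, "events": events}
        | "Join" >> beam.CoGroupByKey()
        | "Merge" >> beam.FlatMap(merge_rows)
    )
```

The PCollection returned by `build_join` can then go through further transformations and a BigQuery sink. If clients without events should be kept, `merge_rows` would need to emit a row when `grouped["events"]` is empty.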

Please check out the illustration and the sample code available here
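To cover every date rather than a single one, one possible approach is to read with a glob pattern and keep the file path, so that the date folder becomes part of the join key. This is only a sketch under assumptions that the question does not confirm: a layout like `gs://<bucket>/<date>/client_info.csv`, the common column in first position, and a header row in each file. The helper names and the example bucket are hypothetical.

```python
import csv
import posixpath


def date_from_path(path):
    """Return the name of the folder that directly contains the file."""
    return posixpath.basename(posixpath.dirname(path))


def daily_pattern(bucket_root, file_name):
    """Build a glob matching file_name inside every date folder (assumed layout)."""
    return f"{bucket_root.rstrip('/')}/*/{file_name}"


def key_by_date_and_id(element, key_index=0):
    """Turn (file_path, csv_line) into ((date, key), remaining_fields)."""
    path, line = element
    fields = next(csv.reader([line]))
    rest = fields[:key_index] + fields[key_index + 1:]
    return (date_from_path(path), fields[key_index]), rest


def read_all_days(pipeline, bucket_root, file_name, label):
    """Read one kind of file across all date folders, keyed by (date, key)."""
    # Imported here so the pure helpers above can be used without Beam installed.
    import apache_beam as beam

    return (
        pipeline
        # ReadFromTextWithFilename emits (file_path, line) tuples.
        | f"Read{label}" >> beam.io.ReadFromTextWithFilename(
            daily_pattern(bucket_root, file_name), skip_header_lines=1)
        | f"Key{label}" >> beam.Map(key_by_date_and_id)
    )
```

Calling `read_all_days` once for "client_info.csv" and once for "client_events.csv" gives two PCollections keyed by `(date, key)`, which can be joined with `beam.CoGroupByKey` so that rows from different days never get mixed.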
