I am working on this scenario: in Google Cloud Storage, my files are stored in this structure:
PS: the 2 files are in the same folder (the indentation was a mistake)
What I want to do is:
1] read the 2 files "client_info.csv" + "client_events.csv" for each day
2] join them on a column common to both files to get 1 PCollection
3] apply transformations
4] load the data into BigQuery
I wrote code that reads from only 1 date and it works well, but I couldn't solve the part that iterates over all dates.
If you have any suggestions, please share them.
One solution may be a pipeline that merges two branches: each branch reads one input file separately, and the two branches are then joined.
Please check out the illustration and the sample code available here
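To make the two-branch idea concrete, here is a minimal sketch using the Apache Beam Python SDK. It is not the code from the linked sample, and several things in it are assumptions: that each date is a folder directly under a common root (so `<root>/*/client_info.csv` matches every day), that both CSV files have a header row, and that the common column is called `client_id`. The names `bucket_root`, `table`, `rows_from_csv`, `join_grouped` and `run` are hypothetical. Each branch matches its file across all date folders with a glob, keys every row by (date, client_id), and the branches are merged with `CoGroupByKey`.

```python
import csv
import io


def date_from_path(path):
    """Return the name of the folder that directly contains the file."""
    return path.rstrip("/").split("/")[-2]


def rows_from_csv(path, text, key_column="client_id"):
    """Yield ((date, key), row_dict) for every data row of one CSV file.

    key_column is an assumed name for the column shared by both files.
    """
    date = date_from_path(path)
    for row in csv.DictReader(io.StringIO(text)):
        yield (date, row[key_column]), dict(row)


def join_grouped(element):
    """Inner-join the CoGroupByKey output for one (date, key) pair."""
    (date, _), grouped = element
    for info in grouped["info"]:
        for event in grouped["events"]:
            # On a column name clash, the value from the events file wins.
            yield {"date": date, **info, **event}


def run(bucket_root, table, pipeline_args=None):
    # Imported here so the helpers above can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio

    def read_branch(pipeline, label, filename):
        # The glob matches the same file name in every date folder.
        return (
            pipeline
            | f"Match {label}" >> fileio.MatchFiles(f"{bucket_root}/*/{filename}")
            | f"Open {label}" >> fileio.ReadMatches()
            | f"Parse {label}" >> beam.FlatMap(
                lambda f: rows_from_csv(f.metadata.path, f.read_utf8()))
        )

    with beam.Pipeline(argv=pipeline_args) as p:
        info = read_branch(p, "info", "client_info.csv")
        events = read_branch(p, "events", "client_events.csv")
        (
            {"info": info, "events": events}
            | "Join" >> beam.CoGroupByKey()
            | "Flatten" >> beam.FlatMap(join_grouped)
            # Your own transformations (step 3) would go here.
            | "Write" >> beam.io.WriteToBigQuery(
                table,
                # Assumes the destination table already exists.
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )
```

You would call it with something like `run("gs://<your-bucket>/<your-root>", "<project>:<dataset>.<table>")`. Note that `read_utf8()` loads a whole file into memory, which is fine for modest daily files but not for very large ones. Keying by the date as well as the client id keeps each day's join separate; if you want to join across days, key by the client id alone.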