How to get multiple files as Apache Beam input?
I am working on the following scenario. In Google Cloud Storage, my files are stored in this structure:
PS: the two files are in the same folder (the indentation was a mistake).
What I want to do is:
1] Read the two files "client_info.csv" and "client_events.csv" for each day.
2] Join the two files on a column they have in common to get one PCollection.
3] Apply transformations.
4] Load the data into BigQuery.
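Step 2 is a keyed join. In Beam this is usually done by keying both PCollections by the shared column and applying `beam.CoGroupByKey()`, then pairing up the grouped rows. The plain-Python sketch below mimics that grouping locally so the join logic can be checked without a runner. It is not Beam code; the column name `client_id` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def parse_csv(text):
    """Parse CSV text that has a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def co_group(key, *tables):
    """Group rows from several tables by the value of `key`.

    Result shape: {key_value: ([rows from table 0], [rows from table 1], ...)},
    similar in spirit to what CoGroupByKey yields per key.
    """
    grouped = defaultdict(lambda: tuple([] for _ in tables))
    for i, rows in enumerate(tables):
        for row in rows:
            grouped[row[key]][i].append(row)
    return dict(grouped)


def inner_join(grouped):
    """Emit one merged dict per (info row, event row) pair sharing a key.

    Keys that appear in only one table produce no output (inner join).
    """
    for _key_value, (infos, events) in grouped.items():
        for info in infos:
            for event in events:
                yield {**info, **event}


if __name__ == "__main__":
    # Hypothetical sample contents of client_info.csv and client_events.csv.
    info = parse_csv("client_id,name\n1,Alice\n2,Bob\n")
    events = parse_csv("client_id,event\n1,login\n1,logout\n3,login\n")
    for row in inner_join(co_group("client_id", info, events)):
        print(row)
```

In a real pipeline, the per-key pairing in `inner_join` would typically live in a `beam.FlatMap` applied after `CoGroupByKey`, and step 4 would follow with `beam.io.WriteToBigQuery`.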
I wrote code that reads the files for a single date, and it works well, but I couldn't solve the part that iterates over all dates.
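One way to avoid looping over dates is to read every date folder at once with a wildcard pattern and recover the date from each file's path. The Beam Python SDK's `ReadFromTextWithFilename` emits `(filename, line)` pairs, so the date can become part of the join key, for example `(date, client_id)`, which keeps rows from different days from being joined together. The helpers below are a plain-Python sketch of that per-element keying step. They assume a layout like `gs://my-bucket/2021-01-01/client_info.csv` (bucket name and date format are hypothetical, since the actual structure is not shown) and that the shared column is at a hypothetical position `key_index`.

```python
import csv
import posixpath


def date_from_path(path):
    """Return the name of the folder directly containing the file.

    Assumes (hypothetically) that this folder name is the date,
    e.g. 'gs://my-bucket/2021-01-01/client_info.csv' -> '2021-01-01'.
    """
    return posixpath.basename(posixpath.dirname(path))


def key_line(filename, line, key_index=0):
    """Turn one (filename, CSV line) pair into ((date, key), fields).

    csv.reader handles quoted fields that contain commas;
    key_index is the assumed position of the common column.
    """
    fields = next(csv.reader([line]))
    return (date_from_path(filename), fields[key_index]), fields


if __name__ == "__main__":
    print(key_line("gs://my-bucket/2021-01-01/client_info.csv", "42,Alice"))
```

In a pipeline, each file type would be read with its own pattern (for example `gs://my-bucket/*/client_info.csv`), mapped through `key_line` with `beam.MapTuple`, and the two keyed PCollections combined with `beam.CoGroupByKey()`. Header lines can be dropped with `skip_header_lines=1`. This wiring is a suggestion and has not been tested against a runner.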
If you have any suggestions, please share them.