
Merge the csv files in GCP

The dataset I am working on in GCP is in csv format, with a separate headerless csv file for each feature. There are around 20 files, and I want to create a single file with headers that contains all these variables. However, I only have access to the data bucket; when I try to open Vertex AI Workbench, it shows that I don't have permission for that. Is there any way to combine all these files?

I wrote an article on how to use BigQuery to work with CSV files. I didn't mention how to merge files side by side (appending columns to the right), but typically I would do a join on a row number, something like this:

-- ROW_NUMBER() OVER () has no ORDER BY, so the row order is not guaranteed.
WITH table_left AS (SELECT *, ROW_NUMBER() OVER () AS row_id FROM <tableLeft>),
     table_right AS (SELECT *, ROW_NUMBER() OVER () AS row_id FROM <tableRight>)
SELECT tl.* EXCEPT(row_id), tr.* EXCEPT(row_id)
FROM table_left tl
JOIN table_right tr ON tl.row_id = tr.row_id
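
If BigQuery is not an option, the same side-by-side merge can be sketched outside it. The following is only a minimal illustration using Python's standard library, not the method from the article mentioned above. It assumes the files have already been copied from the bucket to a place where Python can read them, that each file holds exactly one column, and that row N in every file belongs to the same record. The function name `merge_feature_files` and the header names in the usage note are hypothetical.

```python
import csv
from contextlib import ExitStack
from itertools import zip_longest

# Sentinel used to detect that one input file ran out of rows before the others.
_MISSING = object()


def merge_feature_files(paths, headers, out_path):
    """Paste headerless CSV files side by side and write one CSV with a header row.

    Assumption: each input file contains a single column, so one header per file.
    """
    if len(paths) != len(headers):
        raise ValueError("need exactly one header per input file")

    with ExitStack() as stack:
        # Open every input file and the output file; ExitStack closes them all.
        readers = [
            csv.reader(stack.enter_context(open(path, newline="")))
            for path in paths
        ]
        writer = csv.writer(stack.enter_context(open(out_path, "w", newline="")))

        writer.writerow(headers)
        # Read one row from each file at a time, so memory use stays small.
        for rows in zip_longest(*readers, fillvalue=_MISSING):
            if any(row is _MISSING for row in rows):
                raise ValueError("input files have different numbers of rows")
            # Concatenate the fields of the aligned rows into one output row.
            writer.writerow([field for row in rows for field in row])
```

A call such as `merge_feature_files(["age.csv", "height.csv"], ["age", "height"], "merged.csv")` would then write one file with a header row followed by the aligned values. Unlike the `ROW_NUMBER() OVER ()` join, this relies on the physical line order of each file, so it is only appropriate when that order is meaningful.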


 