Split CSV file in bash into multiple files based on condition

My CSV file has multiple rows of data, and I want to split it into multiple files based on one attribute.

beeline -u jdbc:hive2:<MYHOST> -n <USER> -p <PASSWORD> --silent=true --outputformat=csv2 -f <SQL FILE> > result_+%Y%m%d_%H%M%S.csv

The SQL, which includes ORDER BY ID, is run through beeline and produces a single CSV file.

cat sql.csv
"attr;attr;ID;attr"
"data;data;XXXX;date"
"data;data;XXXX;date"
"data;data;YYYYY;date"
"data;data;YYYYY;date"
"data;data;BBBBB;date"
"data;data;BBBBB;date"

The desired result is to start a new file each time a new ID is encountered, and to use that ID in the filename.

file_1_ID_XXXX_+%Y%m%d_%H%M%S:

attr   attr    ID  attr
data    data    XXXX    date
data    data    XXXX    date

file_2_ID_YYYYY_+%Y%m%d_%H%M%S:

attr   attr    ID  attr
data    data    YYYYY   date
data    data    YYYYY   date

If I understand your question, you can take the CSV file produced by the SQL and split it into the 3 files you show simply by using a few variables, string concatenation, and redirection to the output files, e.g.

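awk -v n=1 -v dt="$(date '+%Y%m%d_%H%M%S')" '
    FNR == 1 { hdg = $0; next }          # save the header line
    a != $3 {                            # ID in field 3 changed: start a new file
        if (name != "") close(name)      # release the previous output file
        a = $3; name = "file_" n "_ID_" a "_" dt; n++
        print hdg > name
    }
    { print $0 > name }
' sqldata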

Example Input File

Where your sqldata file contains:

$ cat sqldata
attr    attr    ID  attr
data    data    XXXX    date
data    data    XXXX    date
data    data    YYYYY   date
data    data    YYYYY   date
data    data    BBBBB   date
data    data    BBBBB   date

Example Use/Output Files

Simply copying the awk script and middle-mouse pasting it into the terminal, with the correct input filename, would produce the following three output files:

$ cat file_1_ID_XXXX_20190805_033514
attr    attr    ID  attr
data    data    XXXX    date
data    data    XXXX    date

$ cat file_2_ID_YYYYY_20190805_033514
attr    attr    ID  attr
data    data    YYYYY   date
data    data    YYYYY   date

$ cat file_3_ID_BBBBB_20190805_033514
attr    attr    ID  attr
data    data    BBBBB   date
data    data    BBBBB   date
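
The raw sql.csv shown in the question differs from the whitespace-separated sqldata file used above: every line is wrapped in double quotes and the fields are separated by semicolons. The following is only a sketch of how the same approach might be adapted to that format, assuming the ID is always the third field and that no field contains a quote or semicolon of its own. The sample file, the tab-separated output, and the variable names (dt, id, hdg, name) are illustrative choices, not something the question specifies.

```shell
# Hypothetical sample in the question's raw format: quoted lines, ';' separated.
cat > sql.csv <<'EOF'
"attr;attr;ID;attr"
"data;data;XXXX;date"
"data;data;XXXX;date"
"data;data;YYYYY;date"
"data;data;YYYYY;date"
"data;data;BBBBB;date"
"data;data;BBBBB;date"
EOF

dt=$(date '+%Y%m%d_%H%M%S')   # one timestamp shared by all output files

awk -F';' -v OFS='\t' -v dt="$dt" '
    { gsub(/"/, ""); $1 = $1 }           # strip the quotes, rebuild the record with tabs
    FNR == 1 { hdg = $0; next }          # remember the header line
    $3 != id {                           # ID changed: start the next output file
        if (name != "") close(name)      # release the previous output file
        id = $3
        name = "file_" (++n) "_ID_" id "_" dt
        print hdg > name
    }
    { print > name }
' sql.csv
```

As with the script above, this relies on the rows already being grouped by ID (the ORDER BY ID in the query). Each output file starts with the header line followed by the rows for one ID.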

Look things over and let me know if this is what you intended. If not, let me know and I'm happy to help further.
