Splitting csv into multiple files with header using awk
I am attempting to split a csv file into multiple files, one per unique column value, using awk. I am able to split the csv successfully with
awk -F\, '{print > $2".csv"}' example.csv
however, it is omitting the header row from the new files.
For example:
example.csv
Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
Green,Ford,Explorer
Black,Ford,F-150
Result:
Kia.csv
Red,Kia,Spectra
Orange,Kia,Sportage
___________________
Ford.csv
Green,Ford,Explorer
Black,Ford,F-150
My desired output:
Kia.csv
Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
___________________
Ford.csv
Color,Car,Make
Green,Ford,Explorer
Black,Ford,F-150
To get the header row into the new files, I tried something like this:
awk -F'|' 'FNR==1{hdr=$0;next} {if (!seen[$1]++) print hdr>$2; print>$2}' example.csv
but unfortunately this did not have the intended result.
You are almost there. Would you please try:
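awk -F, '
FNR==1 {header = $0; next}                  # remember the header line and skip it
!seen[$2]++ {print header > ($2".csv")}     # first row for this car: write the header first
{print > ($2".csv")}                        # write every data row to the file for its car
' example.csv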
If you have many different car values, a "too many open files" error may occur. In such cases, please close the files as shown in @RavinderSingh13's answer.
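A minimal sketch of the closing idea that needs no sorting, assuming the same example.csv layout with the car name in column 2. It is not taken from either answer: it closes each output file right after writing to it and appends with >>, so only one file is open at any time. The scratch directory and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory (illustrative only) so generated files stay isolated.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input, deliberately not grouped by the Car column.
cat > example.csv <<'EOF'
Color,Car,Make
Red,Kia,Spectra
Green,Ford,Explorer
Orange,Kia,Sportage
Black,Ford,F-150
EOF

awk -F, '
FNR == 1 { header = $0; next }     # remember the header line and skip it
{
    out = $2 ".csv"
    if (!seen[$2]++) {
        print header > out         # first row for this value: create/truncate, write header
        close(out)
    }
    print >> out                   # append the data row
    close(out)                     # never keep more than one file open
}
' example.csv
```

Opening and closing a file for every row is slower than keeping files open, so this trades speed for a bounded number of open file descriptors. Rows keep their input order within each output file.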
1st solution: With your shown samples, please try the following awk code.
awk -F, '
FNR==NR{
header=$0
next
}
{
outputFile=$2".csv"
}
prev!=$2".csv" || !prev{
close(prev)
print header > (outputFile)
}
{
print $0 > (outputFile)
prev=outputFile
}
' <(head -1 Input_file) <(tail -n +2 Input_file | sort -t, -k2)
2nd solution: Adding a solution in which awk reads only one input, with the header passed in as a variable.
awk -F, -v header="$(head -1 Input_file)" '
{
outputFile=$2".csv"
}
prev!=$2".csv" || !prev{
close(prev)
print header > (outputFile)
}
{
print $0 > (outputFile)
prev=outputFile
}
' <(tail -n +2 Input_file | sort -t, -k2)
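The solutions above rely on process substitution, which needs bash or a similar shell. As a hedged sketch, assuming a plain POSIX sh and the same Input_file name, the same sort-then-split idea can be written so that the file is opened only once: read consumes the header line and sort receives the rest of the same stream. The scratch directory and sample rows are made up for illustration, and the awk body is a condensed rewrite rather than the answer's exact code.

```shell
#!/bin/sh
# Work in a scratch directory (illustrative only) so generated files stay isolated.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > Input_file <<'EOF'
Color,Car,Make
Red,Kia,Spectra
Green,Ford,Explorer
Orange,Kia,Sportage
Black,Ford,F-150
EOF

{
    # Consume only the header line; the remaining rows stay on stdin.
    IFS= read -r header
    # Group rows by column 2, then split them with one output file open at a time.
    sort -t, -k2,2 | awk -F, -v header="$header" '
    {
        outputFile = $2 ".csv"
        if (outputFile != prev) {          # a new group starts here
            if (prev != "") close(prev)    # finished with the previous file
            print header > outputFile
            prev = outputFile
        }
        print > outputFile
    }'
} < Input_file
```

Note that rows inside each output file follow the sort order, not the original input order, and that awk's -v processes backslash escapes, so a header containing backslashes would need different handling.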