Splitting csv into multiple files with header using awk

I am attempting to split a csv file into multiple files based on unique column values using awk. I am able to split the csv successfully with awk -F\, '{print > $2".csv"}' example.csv, however it is omitting the header row from the new files.

For example:

example.csv

Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
Green,Ford,Explorer
Black,Ford,F-150

Result:

Kia.csv

Red,Kia,Spectra
Orange,Kia,Sportage
___________________
Ford.csv

Green,Ford,Explorer
Black,Ford,F-150

My desired output:

Kia.csv

Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
___________________
Ford.csv

Color,Car,Make
Green,Ford,Explorer
Black,Ford,F-150

To get the header row into the new files, I attempted something like awk -F'|' 'FNR==1{hdr=$0;next} {if (;seen[$1]++) print hdr>$2. print>$2}' example.csv but unfortunately this did not have the intended result.

You are almost there. Would you please try:

awk -F, '
    FNR==1 {header = $0; next}
    !seen[$2]++ {print header > $2".csv"}
    {print > $2".csv"}
' example.csv

If you have many distinct car makes, a "too many open files" error may occur. In that case, close each file when you are done with it, as in @RavinderSingh13's answer.
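As a minimal sketch of that idea without sorting the input first, the variant below appends with >> and closes the output file after every row, so at most one output file is open at any time. It assumes the sample data from the question; the variable name out is only illustrative. Because >> appends, output left over from an earlier run has to be removed first.

```shell
# Recreate the sample input from the question.
printf '%s\n' 'Color,Car,Make' 'Red,Kia,Spectra' 'Orange,Kia,Sportage' \
  'Green,Ford,Explorer' 'Black,Ford,F-150' > example.csv

# ">>" appends, so clear any output from a previous run.
rm -f Kia.csv Ford.csv

awk -F, '
    FNR==1 {header = $0; next}               # remember the header row
    {
        out = $2 ".csv"                      # illustrative variable name
        if (!seen[$2]++) print header >> out # first row for this value: write the header
        print >> out
        close(out)                           # never keep more than one output file open
    }
' example.csv
```

Reopening a file for every row is slower than keeping it open, so this trade-off mainly pays off when the number of distinct values is large and sorting the input is not desirable.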

1st solution: With your shown samples, please try the following awk code.

awk -F, '
FNR==NR{
  header=$0
  next
}
{
  outputFile=$2".csv"
}
prev!=$2".csv" || !prev{
  close(prev)
  print header > (outputFile)
}
{
  print $0 > (outputFile)
  prev=outputFile
}
' <(head -1 Input_file) <(tail -n +2 Input_file | sort -t, -k2,2)


2nd solution: A variant that passes the header in as an awk variable, so awk itself reads only the sorted data rows.

awk -F, -v header="$(head -1 Input_file)" '
{
  outputFile=$2".csv"
}
prev!=$2".csv" || !prev{
  close(prev)
  print header > (outputFile)
}
{
  print $0 > (outputFile)
  prev=outputFile
}
' <(tail -n +2 Input_file | sort -t, -k2,2)
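The <(...) process substitution used above is a bash/ksh/zsh feature and is not available in plain POSIX sh. As a hedged sketch of the same sort-then-split idea using only a pipeline, the header and the sorted rows can be sent to awk on standard input. It assumes the sample data from the question; the variable names out and prev are illustrative.

```shell
# Recreate the sample input from the question.
printf '%s\n' 'Color,Car,Make' 'Red,Kia,Spectra' 'Orange,Kia,Sportage' \
  'Green,Ford,Explorer' 'Black,Ford,F-150' > example.csv

# Emit the header first, then the data rows sorted on column 2 only,
# so rows with the same value arrive as one contiguous group.
{ head -n 1 example.csv; tail -n +2 example.csv | sort -t, -k2,2; } |
awk -F, '
    NR==1 {header = $0; next}        # first line of the stream is the header
    NR==2 || $2 != prev {            # a new group starts here
        if (out != "") close(out)    # done with the previous group file
        out = $2 ".csv"
        print header > out
        prev = $2
    }
    {print > out}
'
```

Since each group is contiguous, every output file is opened exactly once, which is why truncating with > is safe here. Rows inside each file follow the sort order rather than the original input order.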
