Using awk to split a csv into multiple files with a header

Question

I am attempting to split a csv file into multiple files, one per unique value of a column, using awk. I am able to split the csv successfully with awk -F\, '{print > $2".csv"}' example.csv, however it is omitting the header row from the new files.

For example:

example.csv

Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
Green,Ford,Explorer
Black,Ford,F-150

Result:

Kia.csv

Red,Kia,Spectra
Orange,Kia,Sportage
___________________
Ford.csv

Green,Ford,Explorer
Black,Ford,F-150

My desired output:

Kia.csv

Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
___________________
Ford.csv

Color,Car,Make
Green,Ford,Explorer
Black,Ford,F-150

To get the header row passed to the new files, I attempted something like awk -F'|' 'FNR==1{hdr=$0;next} {if (!seen[$1]++) print hdr>$2; print>$2}' example.csv but unfortunately this did not have the intended result.

Answer 1

You are almost there. Would you please try:

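awk -F, '
    FNR==1 {header = $0; next}                  # save the header row, skip to the next line
    !seen[$2]++ {print header > ($2 ".csv")}    # first row for this value of column 2: write the header
    {print > ($2 ".csv")}                       # write every data row to the file named after column 2
' example.csv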

If you have many different car makes, a "too many open files" error may occur. In such cases, please close the files as shown in @RavinderSingh13's answer.
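One way to stay under the open-file limit without sorting the input first is to close each output file right after writing to it and reopen it in append mode for later rows. The following is only a minimal sketch of that idea, not code from either answer. It assumes the sample example.csv from the question, recreated here in a scratch directory (the workdir and out names are illustrative) so the snippet can be run as is.

```shell
# Work in a scratch directory (an assumption, so no existing files are touched).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample input from the question.
cat > example.csv <<'EOF'
Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
Green,Ford,Explorer
Black,Ford,F-150
EOF

awk -F, '
    FNR == 1 { header = $0; next }      # remember the header row
    {
        out = $2 ".csv"
        if (!seen[out]++) {             # first row for this value of column 2
            print header > out          # ">" truncates any older file of that name
            close(out)
        }
        print >> out                    # ">>" appends the data row
        close(out)                      # at most one output file is open at a time
    }
' example.csv
```

Opening and closing a file for every row is slower than keeping it open, so this trade-off is probably only worth making when the number of distinct values is large enough to hit the limit.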

Answer 2

1st solution: With your shown samples, please try the following awk code.

awk -F, '
FNR==NR{                        # first input: the header line only
  header=$0
  next
}
{
  outputFile=$2".csv"
}
prev!=$2".csv" || !prev{        # column 2 changed: close the old file, start a new one
  close(prev)
  print header > (outputFile)
}
{
  print $0 > (outputFile)
  prev=outputFile
}
' <(head -1 Input_file) <(tail -n +2 Input_file | sort -t, -k2)

2nd solution: The same approach, but awk reads only a single input; the header is passed in as an awk variable.

awk -F, -v header="$(head -1 Input_file)" '
{
  outputFile=$2".csv"
}
prev!=$2".csv" || !prev{
  close(prev)
  print header > (outputFile)
}
{
  print $0 > (outputFile)
  prev=outputFile
}
' <(tail -n +2 Input_file | sort -t, -k2)
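Both solutions above rely on process substitution, which plain POSIX sh does not provide. The same sort-then-split idea can be written with an ordinary pipeline. What follows is a sketch under stated assumptions rather than the answerer's code: it uses the question's example.csv (recreated in a scratch directory), restricts the sort key to column 2 with -k2,2, and sets LC_ALL=C so the ordering is reproducible.

```shell
# Work in a scratch directory (an assumption, so no existing files are touched).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample input from the question.
cat > example.csv <<'EOF'
Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
Green,Ford,Explorer
Black,Ford,F-150
EOF

# Drop the header, group the rows by column 2, then split the stream.
tail -n +2 example.csv | LC_ALL=C sort -t, -k2,2 |
awk -F, -v header="$(head -n 1 example.csv)" '
    NR == 1 || $2 != prev {             # column 2 changed: start a new file
        if (NR > 1) close(prev ".csv")  # finished with the previous group
        print header > ($2 ".csv")
        prev = $2
    }
    { print > ($2 ".csv") }             # data row goes to the current file
'
```

Because the rows pass through sort, the lines inside each output file come out in sorted order rather than in their original order. If the original order matters, the unsorted single-pass approach from Answer 1 may be the better fit.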

Answer 1
2 Accepted 2022-03-17 03:07:24

Answer 2
2 2022-03-17 04:31:31
