
How to convert space separated key value data into CSV format in bash?

I am working on some data files in which the data consists of key-value pairs separated by spaces.

The data in the files is inconsistent: not every key and value is always present. However, the keys will always be Table, count and size.

The example below has table_name, count and size information:

cat sample1.txt
Table SCOTT.TABLE1 count 3889 size 300
Table SCOTT.TABLE2 count 7744
Table SCOTT.TABLE3 count 2622
Table SCOTT.TABLE4 size 2773 count 22
Table SCOTT.TABLE5 size 21

The file below has just table_name, with no count or size data:

cat sample2.txt
Table SCOTT.TABLE1
Table SCOTT.TABLE2
Table SCOTT.TABLE3
Table SCOTT.TABLE4
Table SCOTT.TABLE5

So I am trying to convert these files into CSV format using the following:

cat <file_name> | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS=","  } NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'

cat sample1.txt | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS=","  }
NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }
{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'

Table,Count,Size
SCOTT.TABLE1,3889,300
,,
,,
,,

And for the second sample:

cat sample2.txt | awk -F' ' 'BEGIN { RS="\n"; print"Table,Count,Size";OFS=","  } NR > 1 { print a["Table"], a["count"], a["size"]; delete a; next }{ a[$1]=$2 }{ a[$3]=$4 }{ a[$5]=$6 }'
Table,Count,Size
SCOTT.TABLE1,,
,,
,,
,,

But the expected output is as follows:

For sample1.txt:

TABLE,count,size
SCOTT.TABLE1,3889,300
SCOTT.TABLE2,7744,
SCOTT.TABLE3,2622,
SCOTT.TABLE4,22,2773
SCOTT.TABLE5,,21

For sample2.txt:

Table,Count,Size
SCOTT.TABLE1,,
SCOTT.TABLE2,,
SCOTT.TABLE3,,
SCOTT.TABLE4,,
SCOTT.TABLE5,,

Thanks in advance.

Here is an inelegant but fast and comprehensible solution:

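awk 'BEGIN{OFS=",";print "TABLE,count,size"}
  {
    t=$2              # the table name always follows the "Table" key
    if($3=="count"){  # count comes first: count in $4, size (if any) in $6
      c=$4
      s=$6
    }
    else{             # size comes first, or neither key is present
      s=$4
      c=$6
    }
    print t,c,s
  }' 1.txt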

Output:

TABLE,count,size
SCOTT.TABLE1,3889,300
SCOTT.TABLE2,7744,
SCOTT.TABLE3,2622,
SCOTT.TABLE4,22,2773
SCOTT.TABLE5,,21
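
The positional checks above rely on count or size being the third field. Below is a minimal sketch of a variant that looks the keys up by name instead, so the order of the pairs on a line does not matter. It is close to the array approach attempted in the question, except that the array is filled and printed within the same record. The function name `kv_to_csv` is made up for this illustration, and the column order is hard-coded to the three keys named in the question.

```shell
# kv_to_csv is a hypothetical helper name, not part of the original answer.
# Reads "key value key value ..." lines from the given files (or stdin)
# and prints them as CSV with the fixed columns Table,count,size.
kv_to_csv() {
  awk 'BEGIN { OFS = ","; print "Table,count,size" }
  {
    split("", a)                  # portable way to empty the array for each line
    for (i = 1; i < NF; i += 2)   # walk the fields pairwise: key, then value
      a[$i] = $(i + 1)
    print a["Table"], a["count"], a["size"]   # missing keys print as empty fields
  }' "$@"
}

# Example: pairs in either order, or missing entirely.
printf 'Table SCOTT.TABLE4 size 2773 count 22\nTable SCOTT.TABLE5 size 21\n' | kv_to_csv
```

Called as `kv_to_csv sample1.txt`, it should read the file directly instead of standard input.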

awk to the rescue!

$ awk -v OFS=',' '{for(i=1;i<NF;i+=2) 
                     {if(!($i in c)){c[$i];cols[++k]=$i};
                      v[NR,$i]=$(i+1)}} 
               END{for(i=1;i<=k;i++) printf "%s", cols[i] OFS; 
                   print ""; 
                   for(i=1;i<=NR;i++) 
                     {for(j=1;j<=k;j++) printf "%s", v[i,cols[j]] OFS;
                      print ""}}' file

Table,count,size,
SCOTT.TABLE1,3889,300,
SCOTT.TABLE2,7744,,
SCOTT.TABLE3,2622,,
SCOTT.TABLE4,22,2773,
SCOTT.TABLE5,,21,

If you have gawk, you can simplify it further with sorted_in (PROCINFO["sorted_in"]).

UPDATE: For the revised question, the header needs to be known in advance, since the keys might be completely missing from a file. This simplifies the problem, and the following script should do the trick.

$ awk -v header='Table,count,size' \
      'BEGIN{OFS=","; n=split(header,h,OFS); print header} 
            {for(i=1; i<NF; i+=2) v[NR,$i]=$(i+1)} 
         END{for(i=1; i<=NR; i++) 
               {printf "%s", v[i,h[1]]; 
                for(j=2; j<=n; j++) printf "%s", OFS v[i,h[j]]; 
                print ""}}' file
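
As a rough usage sketch, the header-driven script could be wrapped in a shell function so the header is passed as an argument. The function name `kv_to_csv_header` and the sample input lines are assumptions for illustration; the awk body is the script above, only re-indented. Because the columns come from the header, changing the header should also change the column order.

```shell
# Hypothetical wrapper (the name is an assumption, not from the answer).
# $1 is the comma-separated header; remaining arguments are input files
# (stdin is read when none are given).
kv_to_csv_header() {
  header=$1
  shift
  awk -v header="$header" '
    BEGIN { OFS = ","; n = split(header, h, OFS); print header }
    { for (i = 1; i < NF; i += 2) v[NR, $i] = $(i + 1) }   # store each value under (line number, key)
    END {
      for (i = 1; i <= NR; i++) {
        printf "%s", v[i, h[1]]                            # first column, no leading separator
        for (j = 2; j <= n; j++) printf "%s", OFS v[i, h[j]]
        print ""
      }
    }' "$@"
}

# Lines with missing keys still yield three columns.
printf 'Table SCOTT.TABLE1\nTable SCOTT.TABLE2 count 7744\n' | kv_to_csv_header 'Table,count,size'
```

Since the values are buffered until END, this sketch keeps the whole input in memory, which is likely fine for small files like the samples.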
