Show detailed differences for two csv files with bash or awk

I need your advice on comparing two CSV files in bash:

file1.csv

300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000043.000|15|0|49300|1|42|4
300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|31583000|89|43|4
300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000210.000|14|0|49300|1|43|4
300000493|300000323|300000323|300000000|16|0|12619|0|0|+000000000000014.000|16|0|49300|89|42|4
300146897|300146897|300000394|300000000|609|1|12619|0|0|+000000000000020.000|1|0|14689700|7|36|4

file2.csv

300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000053.000|1|0|49300|1|42|4
300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|49300|89|43|4
300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000219.000|14|0|49300|1|43|5

The diff -y file1.csv file2.csv command shows output similar to what I'm looking for:

300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000043.000|15|0|49300|1|42|4       |    300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000053.000|1|0|49300|1|42|4
300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|31583000|89|43|4   |    300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|49300|89|43|4
300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000210.000|14|0|49300|1|43|4      |    300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000219.000|14|0|49300|1|43|5
300000493|300000323|300000323|300000000|16|0|12619|0|0|+000000000000014.000|16|0|49300|89|42|4     <
300146897|300146897|300000394|300000000|609|1|12619|0|0|+000000000000020.000|1|0|14689700|7|36|4   <

However, I'm trying to get a more advanced output that marks each cell that differs between the two files with an asterisk * and, if a whole row does not exist on one of the sides, puts a dash - in every cell of that row. Finally, I want to create one output file per side (because afterwards I'm going to convert each output csv to html in order to embed them in an html file), something like:

file1.out.csv

300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000043.000*|15|0|49300|1|42|4
300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|31583000*|89|43|4
300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000210.000*|14|0|49300|1|43|4*
300000493|300000323|300000323|300000000|16|0|12619|0|0|+000000000000014.000|16|0|49300|89|42|4
300146897|300146897|300000394|300000000|609|1|12619|0|0|+000000000000020.000|1|0|14689700|7|36|4

file2.out.csv

300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000053.000*|1|0|49300|1|42|4
300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|49300*|89|43|4
300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000219.000*|14|0|49300|1|43|5*
-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-

Hopefully you can help me here. Thanks!

I think a possible solution would be to use:

paste -d '\n' file1.csv file2.csv > pasted.csv

And then read the output file to generate what I need.
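The paste idea above could be taken one step further along these lines. This is only a rough sketch, not a tested solution for the real data: it assumes rows are matched purely by position (line N of one file against line N of the other), that neither file contains blank lines (paste emits an empty line where the shorter file has run out, and the script relies on that), and that the file names end in .csv. The function name mark_diffs and the helper dashes are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: mark per-cell differences between two pipe-delimited files.
# Assumptions: rows are paired by line number, no blank lines in the
# inputs, and outputs are named <input>.out.csv (hypothetical naming).
mark_diffs() {
    local left=$1 right=$2
    # Interleave the files: odd lines come from $left, even lines from $right.
    paste -d '\n' "$left" "$right" |
    awk -v OFS='|' \
        -v out1="${left%.csv}.out.csv" \
        -v out2="${right%.csv}.out.csv" '
        # Build a row of n dashes to stand in for a missing row.
        function dashes(n,    i, s) {
            s = "-"
            for (i = 2; i <= n; i++) s = s OFS "-"
            return s
        }
        NR % 2 == 1 { a = $0; next }       # remember the left-hand row
        {
            b = $0                         # right-hand row of the pair
            na = split(a, fa, "|")
            nb = split(b, fb, "|")
            # A whole row is missing on one side: emit dashes there.
            if (a == "") { print dashes(nb) > out1; print b > out2; next }
            if (b == "") { print a > out1; print dashes(na) > out2; next }
            n = (na > nb) ? na : nb
            ra = rb = ""
            for (i = 1; i <= n; i++) {
                ca = fa[i]; cb = fb[i]
                # Appending "" forces a string comparison, so 1 and 1.0
                # are reported as different cells.
                if ((ca "") != (cb "")) { ca = ca "*"; cb = cb "*" }
                ra = ra (i > 1 ? OFS : "") ca
                rb = rb (i > 1 ? OFS : "") cb
            }
            print ra > out1
            print rb > out2
        }
    '
}
```

Calling mark_diffs file1.csv file2.csv would write file1.out.csv and file2.out.csv in the current directory. Note that this marks every cell that differs, so with the sample data it would also flag the 11th column of the first row (15 versus 1), which the desired output shown earlier leaves unmarked.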
