AWK: Compare two CSV files
I have two CSV files, and I want to compare them with AWK and generate a new file.
file1.csv:
"no","loc"
"abc121","C:/pro/in"
"abc122","C:/pro/abc"
"abc123","C:/pro/xyz"
"abc124","C:/pro/in"
file2.csv:
"no","loc"
"abc121","C:/pro/in"
"abc122","C:/pro/abc"
"abc125","C:/pro/xyz"
"abc126","C:/pro/in"
output.csv:
"file1","file2","Diff"
"abc121","abc121","Match"
"abc122","abc122","Match"
"abc123","","Unmatch"
"abc124","","Unmatch"
"","abc125","Unmatch"
"","abc126","Unmatch"
I didn't use awk alone, but if I understand the gist of what you're asking correctly, I think this long one-liner will do it...
join -t, -a 1 -a 2 -o 1.1 2.1 1.2 2.2 file1.csv file2.csv | awk -F, '{ if ( $3 == $4 ) var = "\"Match\""; else var = "\"Unmatch\"" ; print $1","$2","var }' | sed -e '1d' -e 's/^,/"",/' -e 's/,$/,""/' -e 's/,,/,"",/g'
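Explanation:
The join part takes the two CSV files, joins them on the first column (join's default behavior) and outputs all four fields (-o 1.1 2.1 1.2 2.2), making sure to include lines from either file that have no match in the other (-a 1 -a 2).
The awk part takes that output and, depending on whether columns 3 and 4 (the two loc values) actually match, replaces them with "Match" or "Unmatch". I had to make an assumption about this behavior based on your example.
The sed part deletes the "no","loc" header from the output (-e '1d') and fills empty fields with an opening/closing pair of quotes (-e 's/^,/"",/' -e 's/,$/,""/' -e 's/,,/,"",/g'). This last part may not be necessary for you.
Edit: As tripleee pointed out, the above fails if the two input files are not sorted. Here is an updated command that fixes this: it strips off the header lines and sorts each file before passing it to join (the <( ... ) process substitutions need bash or a similar shell).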
join -t, -a 1 -a 2 -o 1.1 2.1 1.2 2.2 <( sed 1d file1.csv | sort ) <( sed 1d file2.csv | sort ) | awk -F, '{ if ( $3 == $4 ) var = "\"Match\""; else var = "\"Unmatch\"" ; print $1","$2","var }' | sed -e 's/^,/"",/' -e 's/,$/,""/' -e 's/,,/,"",/g'
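To see what each stage of the updated command contributes, here is a small sketch that recreates the sample files from the question in a temporary directory and runs first the join stage alone, then the awk and sed stages on its output. The temporary directory and the variable names joined and final are illustrative assumptions, not part of the answer. It assumes bash for the <( ) process substitutions, and it writes the -o list as the single comma-separated argument 1.1,2.1,1.2,2.2, which is the form POSIX specifies.

```bash
#!/usr/bin/env bash
# Sketch: rebuild the sample data, then run (1) only the join stage and
# (2) the awk + sed stages on its output, to see what each stage produces.
# Assumes bash for the <( ) process substitutions.
set -eu
export LC_ALL=C            # sort and join must agree on collation order

dir=$(mktemp -d)           # temporary directory (illustrative)
trap 'rm -rf "$dir"' EXIT  # remove the sample files when the script exits

cat > "$dir/file1.csv" <<'EOF'
"no","loc"
"abc121","C:/pro/in"
"abc122","C:/pro/abc"
"abc123","C:/pro/xyz"
"abc124","C:/pro/in"
EOF

cat > "$dir/file2.csv" <<'EOF'
"no","loc"
"abc121","C:/pro/in"
"abc122","C:/pro/abc"
"abc125","C:/pro/xyz"
"abc126","C:/pro/in"
EOF

# (1) join stage only: fields 1.1, 2.1, 1.2, 2.2; unpaired lines from
#     either file (-a 1 -a 2) get empty fields for the missing side.
joined=$(join -t, -a 1 -a 2 -o 1.1,2.1,1.2,2.2 \
  <(sed 1d "$dir/file1.csv" | sort) <(sed 1d "$dir/file2.csv" | sort))

# (2) awk compares the two loc columns, sed fills empty fields with ""
final=$(printf '%s\n' "$joined" |
  awk -F, '{ if ($3 == $4) var = "\"Match\""; else var = "\"Unmatch\""; print $1","$2","var }' |
  sed -e 's/^,/"",/' -e 's/,$/,""/' -e 's/,,/,"",/g')

printf '%s\n' "$joined" "" "$final"
```

The unpaired rows leave join with empty fields (for example "abc123",,"C:/pro/xyz",), and those empty fields are exactly what the sed substitutions turn into "". Note that the updated command does not print the "file1","file2","Diff" header from the desired output; if you need it, echo it before running the pipeline.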
An awk-only approach (save the script below as script.awk):
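BEGIN {
    FS = ","                # input fields are comma-separated
}
# First file (NR==FNR), header skipped (NR>1): remember each key ($1) with its loc ($2)
NR>1 && NR==FNR {
    a[$1] = $2
    next
}
# Second file, header skipped: a key already seen in file1 is a Match,
# otherwise it exists only in file2
FNR>1 {
    print ($1 in a) ? $1 FS $1 FS "Match" : "\"\"" FS $1 FS "Unmatch"
    delete a[$1]            # drop matched keys so END only sees file1-only keys
}
# Keys left in a[] exist only in file1 (for-in order is unspecified)
END {
    for (x in a) {
        print x FS "\"\"" FS "Unmatch"
    }
}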
$ awk -f script.awk file1.csv file2.csv
"abc121","abc121",Match
"abc122","abc122",Match
"","abc125",Unmatch
"","abc126",Unmatch
"abc124","",Unmatch
"abc123","",Unmatch
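The script above matches the desired output.csv only loosely: Match/Unmatch are unquoted, there is no header row, and the rows printed from the END loop come out in for (x in a) order, which awk leaves unspecified (hence abc124 before abc123 above). Below is a minimal sketch of a variant that addresses those three points, assuming you still only want to compare the no column. compare_csv is a hypothetical helper name, and the sample files are recreated in a temporary directory so the block runs on its own.

```bash
#!/usr/bin/env bash
# Sketch of an awk-only variant (an assumption, not the script above):
# prints the header, quotes Match/Unmatch, and keeps a fixed row order.
# Like the script above, it compares only the first ("no") column.
set -eu

# compare_csv is a hypothetical helper name: compare_csv FILE1 FILE2
compare_csv() {
  awk -F, -v OFS=, '
    FNR == 1 { next }                              # skip the header of each file
    NR == FNR { k1[++n1] = $1; in1[$1] = 1; next } # file1: keys in input order
    { k2[++n2] = $1; in2[$1] = 1 }                 # file2: keys in input order
    END {
      print "\"file1\"", "\"file2\"", "\"Diff\""
      for (i = 1; i <= n1; i++)                    # every file1 key, in order
        if (k1[i] in in2) print k1[i], k1[i], "\"Match\""
        else              print k1[i], "\"\"", "\"Unmatch\""
      for (i = 1; i <= n2; i++)                    # then keys found only in file2
        if (!(k2[i] in in1)) print "\"\"", k2[i], "\"Unmatch\""
    }' "$1" "$2"
}

# Recreate the sample files from the question in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/file1.csv" <<'EOF'
"no","loc"
"abc121","C:/pro/in"
"abc122","C:/pro/abc"
"abc123","C:/pro/xyz"
"abc124","C:/pro/in"
EOF

cat > "$dir/file2.csv" <<'EOF'
"no","loc"
"abc121","C:/pro/in"
"abc122","C:/pro/abc"
"abc125","C:/pro/xyz"
"abc126","C:/pro/in"
EOF

result=$(compare_csv "$dir/file1.csv" "$dir/file2.csv")
printf '%s\n' "$result"
```

Storing the keys in arrays indexed by input position (k1, k2) is what fixes the row order: file1 rows print in file1 order, followed by the file2-only rows in file2 order, which reproduces output.csv for this sample. Unlike the join version, it needs no sorting.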