Compare 2 Unix Files and Output Matching Lines to a New File?
I have 2 *nix files. All of the data is on a single line in each file. Each value is separated by a null character. Some of the values in the data match.
How would I parse this data into a new file listing only the matching values?
I figure I could use sed to change the null characters into newlines? From there on I'm not really sure...
Any ideas?
Use tr, sort and comm:
Convert nulls into newlines, and sort the result:
$ tr '\000' '\n' < file1 | sort > file1.txt
$ tr '\000' '\n' < file2 | sort > file2.txt
Then use comm to get the lines that are common to both files:
$ comm -1 -2 file1.txt file2.txt
<lines shown here are the common lines between file1.txt and file2.txt>
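To see the two steps end to end, here is a minimal sketch with made-up sample data. The fruit names, the scratch directory and the output file name matches.txt are only assumptions for illustration; file1 and file2 stand in for your real files.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: NUL-separated values, all on one "line".
printf 'pear\000apple\000fig\000kiwi\000' > file1
printf 'kiwi\000plum\000apple\000' > file2

# Turn each NUL into a newline, then sort, because comm expects sorted input.
tr '\000' '\n' < file1 | sort > file1.txt
tr '\000' '\n' < file2 | sort > file2.txt

# -1 and -2 suppress the lines unique to each file, leaving only the common ones.
comm -1 -2 file1.txt file2.txt > matches.txt
cat matches.txt
```

With this sample data, matches.txt should contain apple and kiwi, one per line.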
If there are no duplicate values within file1 or file2, you can do this:
( tr '\0' '\n' < file1; tr '\0' '\n' < file2 ) | sort | uniq -c | grep -Ev '^ *1 '
This counts how often each value occurs across the two files and keeps only the values that occur more than once, i.e. those present in both files.
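If the order of the fields is important, you can do this (note that comm expects sorted input, so this is reliable only when the values in both files are already in sorted order):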
comm -1 -2 <(tr '\0' '\n' < file1) <(tr '\0' '\n' < file2)
This approach is not portable: it requires the 'process substitution' feature of Bash.
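If process substitution is not available, or the values are not already sorted, one possible alternative is awk with intermediate files. This is a different technique from comm, not a drop-in equivalent: it keeps the order in which values appear in file1 and needs no sorting. The sample data and the names file1.lines, file2.lines and matches.txt below are assumptions for illustration.

```shell
# Work in a scratch directory with hypothetical sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'pear\000kiwi\000fig\000apple\000' > file1
printf 'apple\000plum\000kiwi\000' > file2

# Convert NULs to newlines without sorting, so the original order survives.
tr '\000' '\n' < file1 > file1.lines
tr '\000' '\n' < file2 > file2.lines

# While reading the first argument (NR == FNR), remember every value of file2.
# While reading the second argument, print each file1 value that was remembered.
awk 'NR == FNR { seen[$0] = 1; next } $0 in seen' file2.lines file1.lines > matches.txt
cat matches.txt
```

Here the result is kiwi followed by apple, i.e. the order used in file1. Two caveats: a value repeated in file1 is printed once per occurrence, and the NR == FNR idiom assumes that file2.lines is not empty.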
This might work for you:
parallel 'tr "\000" "\n" <{} | sort -u' ::: file{1,2} | sort | uniq -d
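If GNU parallel is not installed, roughly the same pipeline can be sketched with a plain command group. The sample data and the variable name common are assumptions for illustration; sort -u removes duplicates within each file first, so uniq -d only reports values that occur in both.

```shell
# Work in a scratch directory with hypothetical sample data that contains
# duplicates inside each file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\000b\000a\000c\000' > file1
printf 'b\000c\000c\000d\000' > file2

# De-duplicate each file on its own, merge the two lists, then keep only
# the values that now appear more than once (i.e. in both files).
common=$(
  {
    tr '\000' '\n' < file1 | sort -u
    tr '\000' '\n' < file2 | sort -u
  } | sort | uniq -d
)
printf '%s\n' "$common"
```

With this sample data the values b and c are printed, even though a is repeated within file1 and c within file2.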