Non-matching words from file1 in file2
I have two files, file1 and file2. file1 contains only words, for example:
ABC
YUI
GHJ
I8O
.................. .....................
file2 contains many paragraphs:
dfghjo ABC kll njjgg bla bla
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi
................... ...................
I am using the command below to get the lines of file2 that contain a word from file1:
grep -Ff file1 file2
(This prints the lines of file2 in which words from file1 are found.)
I also need the words from file1 that are not found in file2, and I have not been able to work out how to get them.
Can anyone help me get the output below?
YUI
I8O
I am looking for a one-liner (via grep, awk, or sed), because I am running this through pssh and can't use while or for loops.
You can print only the matched parts with -o.
$ grep -oFf file1 file2
ABC
GHJ
Use that output as a list of patterns for a search in file1. Process substitution <(cmd) simulates a file containing the output of cmd. With -v you can print the lines that did not match. If file1 contains two lines such that one line is a substring of another, you may want to add -x (match whole lines only) to prevent false positives.
$ grep -vxFf <(grep -oFf file1 file2) file1
YUI
I8O
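As a hedged variation on the command above, the sketch below feeds the matched words through a pipe instead of process substitution, which may help where the remote shell is plain sh rather than bash. It assumes a grep that accepts "-f -" to read patterns from standard input (GNU grep does), and the scratch directory and the file3 sample are made up for illustration. The second part shows that -o without -w also matches inside longer tokens, so adding -w may be worth considering if whole words are what you mean.

```shell
# Scratch copies of the question's sample data (paths are illustrative).
tmp=$(mktemp -d)
printf '%s\n' ABC YUI GHJ I8O > "$tmp/file1"
printf '%s\n' 'dfghjo ABC kll njjgg bla bla' \
              'GHJ njhjckhv chasjvackvh ..' \
              'ihbjhi hbhibb jh jbiibi' > "$tmp/file2"

# Matched words go through a pipe; "-f -" reads the pattern list from stdin.
missing=$(grep -oFf "$tmp/file1" "$tmp/file2" | grep -vxFf - "$tmp/file1")
printf '%s\n' "$missing"

# Hypothetical extra input: ABC appears only inside the longer token XABCX.
printf '%s\n' 'XABCX YUI text' > "$tmp/file3"

# Without -w, ABC counts as found; with -w, only the whole word YUI does.
loose=$(grep -oFf "$tmp/file1" "$tmp/file3" | grep -vxFf - "$tmp/file1")
strict=$(grep -owFf "$tmp/file1" "$tmp/file3" | grep -vxFf - "$tmp/file1")

rm -rf "$tmp"
```

Here missing holds YUI and I8O, loose holds GHJ and I8O, and strict holds ABC, GHJ and I8O.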
Using Perl, with both matched and non-matched words in the same one-liner:
$ cat sinw.txt
ABC
YUI
GHJ
I8O
$ cat sin_in.txt
dfghjo ABC kll njjgg bla bla
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi
$ perl -lne '
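BEGIN { %x = map { chomp; $_ => 1 } qx(cat sinw.txt); $w = "\\b(?:" . join("|", map { quotemeta } keys %x) . ")\\b" }
# Group the alternation so \b applies to every word, and scan all matches on each line.
while (/$w/g) { print $&; delete $x{$&} }
END { print "\nnon-matched\n" . join("\n", sort keys %x) }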
' sin_in.txt
ABC
GHJ
non-matched
I8O
YUI
$
Getting only the non-matched words:
$ perl -lne '
BEGIN {
%x = map { chomp; $_=>1 } qx(cat sinw.txt);
$w = "\\b(?:" . join("|", map { quotemeta } keys %x) . ")\\b"
}
# Remove every word seen on the line, not just the first match.
delete $x{$&} while /$w/g;
END { print "\nnon-matched\n" . join("\n", sort keys %x) }
' sin_in.txt
non-matched
I8O
YUI
$
Note that even a single use of the $& variable used to be very expensive for the whole program in Perl versions prior to 5.20.
Assuming the "words" in file1 are on more than one line:
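while read -r line
do
    # $line is left unquoted on purpose so it splits into words
    for word in $line
    do
        # -F treats the word as a fixed string; quoting guards against globbing
        if ! grep -qF -- "$word" file2
        then echo "$word not found"
        fi
    done
done < file1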
For non-matching words, here's one GNU awk solution:
awk 'NR==FNR{a[$0];next} !($1 in a)' RS='[ \n]' file2 file1
YUI
I8O
Or !($0 in a); it's the same, since I set RS='[ \n]', which makes every space a record separator too.
Note that I read file2 first, and then file1.
If file2 could be empty, replace NR==FNR with a different file check, such as ARGIND==1 for GNU awk, FILENAME=="file2", or FILENAME==ARGV[1]. The index-based checks assume file2 is the first command-line operand; in the command above the RS=... assignment comes first and occupies ARGV[1], so either use index 2 there or pass RS with -v.
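The empty-file caveat can be reproduced with a small sketch. It assumes an awk that accepts a regular expression as RS (GNU awk does; this is not required by POSIX), and the scratch directory and variable names are illustrative only. RS is passed with -v in the second command so that file2 is ARGV[1].

```shell
# Scratch data: the word list from the question and a deliberately empty file2.
tmp=$(mktemp -d)
printf '%s\n' ABC YUI GHJ I8O > "$tmp/file1"
: > "$tmp/file2"

# NR==FNR stays true while file1 is read, so every word is stored and skipped.
broken=$(awk 'NR==FNR{a[$0];next} !($0 in a)' RS='[ \n]' "$tmp/file2" "$tmp/file1")

# Comparing FILENAME with ARGV[1] does not depend on file2 having any records.
fixed=$(awk -v RS='[ \n]' 'FILENAME==ARGV[1]{a[$0];next} !($0 in a)' "$tmp/file2" "$tmp/file1")

printf 'NR==FNR version printed: [%s]\n' "$broken"
printf '%s\n' "$fixed"
rm -rf "$tmp"
```

With an empty file2 the NR==FNR version prints nothing, while the FILENAME version reports all four words as unmatched.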
The same mechanism works for only the matched words too:
awk 'NR==FNR{a[$0];next} $0 in a' RS='[ \n]' file2 file1
ABC
GHJ
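The same set-difference idea (split file2 into tokens, then keep the words of file1 that are not among them) can also be sketched with sort and comm, which avoids the regex RS extension. This is only an illustration under assumptions: file1 holds one word per line, tokens in file2 are separated by single spaces or newlines, and the scratch paths and intermediate file names are made up. Note that the result comes out sorted rather than in file1 order.

```shell
# Scratch copies of the question's sample data (paths are illustrative).
tmp=$(mktemp -d)
printf '%s\n' ABC YUI GHJ I8O > "$tmp/file1"
printf '%s\n' 'dfghjo ABC kll njjgg bla bla' \
              'GHJ njhjckhv chasjvackvh ..' \
              'ihbjhi hbhibb jh jbiibi' > "$tmp/file2"

# One token per line, sorted and de-duplicated, for each file.
LC_ALL=C sort -u "$tmp/file1" > "$tmp/words1"
tr -s ' ' '\n' < "$tmp/file2" | LC_ALL=C sort -u > "$tmp/words2"

# comm -23 keeps the lines that appear only in the first input.
only_in_file1=$(LC_ALL=C comm -23 "$tmp/words1" "$tmp/words2")
printf '%s\n' "$only_in_file1"
rm -rf "$tmp"
```

For the sample data this prints I8O and then YUI. Swapping -23 for -12 would give the words common to both files instead.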