Non-matching words from file1 to file2

I have two files, file1 and file2. file1 contains only words, for example:

ABC
YUI
GHJ
I8O

..................

file2 contains many paragraphs:

dfghjo ABC kll njjgg bla bla 
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi

...................

I am using the command below to get the lines in file2 that contain a word from file1:

 grep -Ff file1 file2
(This prints the lines of file2 in which words from file1 are found.)

I also need the words from file1 that are not found in file2, and I have been unable to find a way to get these non-matching words.

Can anyone help me get the output below?

YUI
I8O

I am looking for a one-liner (using grep, awk, or sed), because I am running this through pssh and can't use while or for loops.

You can print only the matched parts with -o.

$ grep -oFf file1 file2
ABC
GHJ

Use that output as a list of patterns for a search in file1. Process substitution <(cmd) simulates a file containing the output of cmd. With -v you can print the lines that did not match. If file1 contains two lines such that one is a substring of the other, you may want to add -x (match whole lines only) to prevent false positives.

$ grep -vxFf <(grep -oFf file1 file2) file1
YUI
I8O
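Process substitution is a bash feature, so it may not be available in every shell that a remote command runs under. A minimal sketch of the same idea using a plain pipe is shown here, assuming a grep that accepts `-f -` to read patterns from standard input (GNU grep does). It also shows the effect of adding `-w`, since `-F` alone matches substrings and would count YUI as found inside a longer token. The scratch directory, the `xxYUIxx` token, and the variable names are made up for illustration.

```shell
# Scratch directory so the sample files do not clobber anything real.
tmpdir=$(mktemp -d)
printf '%s\n' ABC YUI GHJ I8O > "$tmpdir/file1"
printf '%s\n' 'dfghjo ABC kll' 'GHJ xxYUIxx njhjckhv' > "$tmpdir/file2"

# Inner grep lists the words found in file2; the outer grep reads them as
# patterns from stdin (-f -) and prints the file1 lines that are not among them.
# With plain -F, YUI counts as found because it occurs inside xxYUIxx.
unmatched=$(grep -oFf "$tmpdir/file1" "$tmpdir/file2" | grep -vxFf - "$tmpdir/file1")

# Adding -w to the inner grep only accepts whole-word occurrences,
# so YUI is now reported as missing as well.
unmatched_words=$(grep -owFf "$tmpdir/file1" "$tmpdir/file2" | grep -vxFf - "$tmpdir/file1")

printf 'substring semantics:\n%s\n' "$unmatched"
printf 'whole-word semantics:\n%s\n' "$unmatched_words"

rm -rf "$tmpdir"
```

Whether `-w` is wanted depends on what counts as a "word" in file2; for the sample data in the question both variants print YUI and I8O.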

Using Perl: both matched and non-matched words in the same one-liner

$ cat sinw.txt
ABC
YUI
GHJ
I8O

$ cat sin_in.txt
dfghjo ABC kll njjgg bla bla
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi

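$ perl -lne '
    # Group the alternation so \b applies to every word; escape regex metacharacters.
    BEGIN { %x = map { chomp; $_ => 1 } qx(cat sinw.txt); $w = "\\b(?:" . join("|", map { quotemeta($_) } keys %x) . ")\\b" }
    # /g visits every match on the line, not just the first one.
    print $& and delete $x{$&} while /$w/g;
    END { print "\nnon-matched\n" . join("\n", sort keys %x) }
' sin_in.txt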

ABC
GHJ

non-matched
I8O
YUI

$

Getting only the non-matched words

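$ perl -lne '
    BEGIN {
        %x = map { chomp; $_ => 1 } qx(cat sinw.txt);
        # Group the alternation so \b applies to every word; escape regex metacharacters.
        $w = "\\b(?:" . join("|", map { quotemeta($_) } keys %x) . ")\\b"
    }
    # /g visits every match on the line, not just the first one.
    delete $x{$&} while /$w/g;
    END { print "\nnon-matched\n" . join("\n", sort keys %x) }
' sin_in.txt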

non-matched
I8O
YUI

$

Note that even a single use of the $& variable used to be very expensive for the whole program in Perl versions prior to 5.20.

Assuming the "words" in file1 are spread over more than one line:

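  while read -r line
  do
    # $line is left unquoted on purpose so it splits into words
    for word in $line
    do
       # -F: treat the word as a fixed string; quoting protects special characters
       if ! grep -qF -- "$word" file2
         then echo "$word not found"
       fi
    done
  done < file1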

For the non-matching words, here's one GNU awk solution:

awk 'NR==FNR{a[$0];next} !($1 in a)' RS='[ \n]' file2 file1
YUI
I8O

Using !($0 in a) instead gives the same result. Since I set RS='[ \n]', every space acts as a record separator too.

Note that I read file2 first, and then file1.

If file2 could be empty, you should replace NR==FNR with a different file check, such as ARGIND==1 for GNU awk, FILENAME=="file2", or FILENAME==ARGV[1].
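The empty-file case can be checked with a small experiment. The sketch below is not the answer's exact command: instead of a regex RS (a GNU awk extension) it loops over the fields of each file2 line, which should work in any POSIX awk. The array name `seen`, the scratch directory, and the variable names are chosen for illustration only.

```shell
# Scratch directory with the question's word list and a deliberately empty file2.
tmpdir=$(mktemp -d)
printf '%s\n' ABC YUI GHJ I8O > "$tmpdir/file1"
: > "$tmpdir/file2"

# FILENAME==ARGV[1] is true only while awk is really reading the first file,
# so with an empty file2 every word of file1 is reported as non-matching.
robust=$(awk 'FILENAME==ARGV[1]{for(i=1;i<=NF;i++) seen[$i]; next} !($1 in seen)' \
    "$tmpdir/file2" "$tmpdir/file1")

# NR==FNR stays true for file1 because the empty file2 contributed no records,
# so file1 is mistaken for the word source and nothing is printed.
naive=$(awk 'NR==FNR{for(i=1;i<=NF;i++) seen[$i]; next} !($1 in seen)' \
    "$tmpdir/file2" "$tmpdir/file1")

printf 'FILENAME check:\n%s\n' "$robust"
printf 'NR==FNR check:\n%s\n' "$naive"

rm -rf "$tmpdir"
```

With the FILENAME check all four words are listed as missing, which is the expected answer when file2 has no content; the NR==FNR version silently prints nothing.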

The same mechanism works for only the matched words:

awk 'NR==FNR{a[$0];next} $0 in a' RS='[ \n]' file2 file1
ABC
GHJ
