Comparing two files in AWK

I have two .txt files and I want to check whether the contents of one file are present in the other. My Book1.txt contents are:

PATX248
PATX216
PATX203
PATX219B
PATX212
PATX248
PATX211
PATX190
PATX222
PATX241
B8025
B1003
B8063
B8032
C0999
C1035
B1011

My InventorySheet2finaloutput.txt is:

B8061P3 366-L4/26/2017 1
PATX-148 P3 4
 1003P4 M#1N-L1/19/2017
B1011P5 330-L2/23/2017 1
B8032P3 336-L3/10/2017 1
B1011P5 329-L2/14/2017 1
PATX-60P5 279-L2/8/2017 1
PATX-70 P3 5
B1573P6 1R-R8/10/2017 1
B8025 P4 5
B8025 P5 1
 1061P3 372-R4/26/2017
 2078 P4M#1RR-R8/25/2017
C0999 P5 4
B8078 P4M#1N-R8/25/2017 2
C-1008 P4 1
PATX-55 P4 4
B1003P5 325-R3/3/2017 1
PATX-45P4 266-L2/14/2017 1
B8032P4 384-R4/26/2017 1
C-1035 P3 1
B8032P3 340-R3/17/2017 1

Expected output:

B1003P5 325-R3/3/2017 1
B8032P3 336-L3/10/2017 1
B8032P4 384-R4/26/2017 1
B8032P3 340-R3/17/2017 1
C0999 P5 4
C-1035 P3 1
B1011P5 330-L2/23/2017 1
B1011P5 329-L2/14/2017 1

I have tried every solution I could find on Google. They all run, but none of them prints any result. The solutions I tried are:

  1. grep -v -F -x -f Book1.txt InventorySheet2finaloutput.txt (tried all forms of the grep flags)
  2. awk 'NR == FNR {Book1[$0]++; next} ($0 in Book1)' Book1.txt InventorySheet2finaloutput.txt
  3. awk 'NR==FNR{a[$1];next}$1 in a{print $1}' Book1.txt InventorySheet2finaloutput.txt
  4. grep "$(cat Book1.txt)" InventorySheet2finaloutput.txt

I want to find out whether the contents of Book1 are present in InventorySheet or not.

Oh, I get it now: the contents of Book1 are supposed to be the prefix (with, it seems, an optional hyphen) of the lines of InventorySheet. So, given B1003 in Book1, we match the B1003P5 line in InventorySheet. Likewise, C1035 matches C-1035.

grep -Ef <(sed -E 's/^/^/; s/([[:alpha:]])([[:digit:]])/\1-?\2/' Book1) InventorySheet

That uses sed to generate extended regular expressions from the Book1 file, and the process substitution allows us to hand grep a "pseudo-filename".

Given your sample files, the grep command outputs:

B1011P5 330-L2/23/2017 1
B8032P3 336-L3/10/2017 1
B1011P5 329-L2/14/2017 1
B8025 P4 5
B8025 P5 1
C0999 P5 4
B1003P5 325-R3/3/2017 1
B8032P4 384-R4/26/2017 1
C-1035 P3 1
B8032P3 340-R3/17/2017 1
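
To see what the sed step actually hands to grep, it can help to run it on its own. The sketch below feeds three codes taken from Book1 through the same sed expression; the `patterns` variable and the `printf` input are only illustrative stand-ins for the real file.

```shell
# Three sample codes from Book1, supplied inline instead of reading the file.
patterns=$(printf '%s\n' B1003 C1035 PATX219B |
    sed -E 's/^/^/; s/([[:alpha:]])([[:digit:]])/\1-?\2/')

# Each code becomes an anchored pattern with an optional hyphen
# between the first letter-digit pair.
printf '%s\n' "$patterns"
# ^B-?1003
# ^C-?1035
# ^PATX-?219B
```

The leading `^` anchors each pattern to the start of the line, and `-?` is what lets C1035 match the C-1035 line.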

In awk, the same prefix match would be:

awk '
    NR==FNR {book[$1]; next}
    { 
        key=$1
        gsub(/-/, "", key)
        for (b in book) 
            if (key ~ "^"b) {print; break}
    }
' Book1 InventorySheet
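
A possible variation on the awk script above, not part of the original answer: since the Book1 entries are plain codes rather than patterns, the prefix test can be written with `index()` so that nothing in Book1 is ever interpreted as a regular expression. The miniature input files below are hypothetical, built from a few lines of the question's data so the sketch runs on its own.

```shell
# Hypothetical miniature inputs; the real files are Book1.txt and
# InventorySheet2finaloutput.txt.
tmp=$(mktemp -d)
printf '%s\n' B1003 C1035 PATX248 > "$tmp/book"
printf '%s\n' 'B1003P5 325-R3/3/2017 1' 'C-1035 P3 1' \
    'PATX-148 P3 4' ' 1003P4 M#1N-L1/19/2017' > "$tmp/inv"

result=$(awk '
    NR==FNR { book[$1]; next }          # first file: remember every code
    {
        key = $1
        gsub(/-/, "", key)              # C-1035 becomes C1035
        for (b in book)
            if (index(key, b) == 1) {   # literal prefix test, no regex
                print
                break
            }
    }
' "$tmp/book" "$tmp/inv")

printf '%s\n' "$result"
# B1003P5 325-R3/3/2017 1
# C-1035 P3 1
rm -rf "$tmp"
```

`index(key, b) == 1` is true only when `b` occurs at the very start of `key`, which is the same condition as `key ~ "^"b` for codes that contain no regex metacharacters.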

As best I can tell, this does what you say you want, and the expected output posted in your question is wrong (it omits the two B8025 lines, even though B8025 is in Book1):

$ cat tst.awk
{
    key=$1
    gsub(/[^[:alnum:]]/,"",key)
    match(key,/^[[:upper:]]+[[:digit:]]+/)
    key = substr(key,RSTART,RLENGTH)
}
NR==FNR { keys[key]; next }
key in keys

$ awk -f tst.awk Book1.txt Inventory.txt
B1011P5 330-L2/23/2017 1
B8032P3 336-L3/10/2017 1
B1011P5 329-L2/14/2017 1
B8025 P4 5
B8025 P5 1
C0999 P5 4
B1003P5 325-R3/3/2017 1
B8032P4 384-R4/26/2017 1
C-1035 P3 1
B8032P3 340-R3/17/2017 1
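
To illustrate the normalisation that tst.awk applies to the first field of every line in both files, here is a small sketch that prints just the derived key. The three input lines are samples from the question, and the sketch assumes an awk that supports POSIX character classes (gawk, for example).

```shell
keys=$(printf '%s\n' 'B1003P5 325-R3/3/2017 1' 'C-1035 P3 1' 'PATX219B' |
awk '{
    key = $1
    gsub(/[^[:alnum:]]/, "", key)              # drop hyphens and other punctuation
    match(key, /^[[:upper:]]+[[:digit:]]+/)    # leading letters followed by digits
    print substr(key, RSTART, RLENGTH)         # keep only that leading part
}')

printf '%s\n' "$keys"
# B1003
# C1035
# PATX219
```

Because both files are reduced to the same letters-then-digits key, the final `key in keys` test is an exact lookup rather than a prefix or regex match.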
