
Bash script to compare 2 files with different length strings

I have two files and I am trying to compare their strings line by line. File1 contains only 6-character string prefixes, while File2 contains 12-character strings. How can I loop through File2 to find the strings that start with one of the 6-character prefixes from File1 and write them to a file?

File1

002379
005964

File2

002379ED6212
003354EB4591
004679BB2185
005964AB3379
005964DB5496

awk may be able to do this:

awk 'NR == FNR {a[$0]; next};substr($0, 1, 6) in a' File1 File2

This awk one-liner does what you want:

awk 'NR==FNR{a[$0];next}{for(i in a)if(substr($0,1,6)==i)print}' file1 file2

NR==FNR is true only while the first file is being read. Each line of file1 is stored as a key in the array a, and next skips the second block. For each record in the second file, the second block loops through the keys in a and compares each one with the first 6 characters of the record. If they are the same, the record is printed.

Output:

002379ED6212
005964AB3379
005964DB5496
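
The question also asks for the matches to be written to a file, and the inner loop is not strictly needed, since awk can test array membership directly with the in operator. The following is a minimal sketch under those assumptions, not the only way to do it. The scratch directory, the output name matches.txt, and the variable len are illustrative choices that do not come from the original posts.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data from the question.
printf '%s\n' 002379 005964 > File1
printf '%s\n' 002379ED6212 003354EB4591 004679BB2185 005964AB3379 005964DB5496 > File2

# len is an illustrative variable holding the prefix length (6 in the question).
# While reading File1 (NR == FNR): remember each non-empty line as an array key.
# While reading File2: print lines whose first len characters are a known key.
awk -v len=6 '
    NR == FNR { if ($0 != "") seen[$0]; next }
    substr($0, 1, len) in seen
' File1 File2 > matches.txt

cat matches.txt
```

With the sample data, matches.txt should end up holding the same three lines as the output shown above. Note that the NR == FNR idiom assumes the first file is not empty.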

grep -f <(sed 's/^/^/' file1) file2

It would be nice to just use grep -f to find all the lines in file2 that match a regex in file1, but the regexes from file1 need to be anchored to the beginning of the line. The command above does that by preprocessing the strings with sed, which prepends a ^ anchor to each one.
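One detail worth guarding against: an empty line in file1 would turn into the bare pattern ^, which matches every line of file2. The sketch below is a cautious variant of the same grep approach that skips empty lines and redirects the result to a file. It assumes Bash (process substitution is not available in plain sh), and the scratch directory and the name matches.txt are illustrative. Because the prefixes are used as regular expressions, it also assumes they contain no regex metacharacters, which holds for the digit-only sample data.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so nothing in the current directory is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data from the question.
printf '%s\n' 002379 005964 > file1
printf '%s\n' 002379ED6212 003354EB4591 004679BB2185 005964AB3379 005964DB5496 > file2

# sed -n '/./s/^/^/p' prints only non-empty lines, each prefixed with a ^ anchor,
# so grep receives the patterns ^002379 and ^005964.
grep -f <(sed -n '/./s/^/^/p' file1) file2 > matches.txt

cat matches.txt
```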

For a pure-Bash solution, assuming you're using Bash 4.0 or later (the first version with associative arrays), you can first populate an associative array whose keys are the lines of File1:

declare -A prefixes
while IFS= read -r prefix ; do
    prefixes[$prefix]=1
done < File1

# Now ${prefixes[002379]} is 1, and ${prefixes[005964]} is 1, but
# ${prefixes[anything-else]} is undefined.

And then check the first six characters of each line of File2 to see whether they are a key in this associative array:

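while IFS= read -r word ; do
    # Take the first six characters of the line as the candidate prefix.
    prefix="${word:0:6}"
    # A non-empty value means the prefix was read from File1.
    if [[ "${prefixes[$prefix]}" ]] ; then
        echo "$word"
    fi
done < File2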
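
Put together, the two loops might look like the sketch below, which also sends the matches to a file as the question asks. This is one possible arrangement rather than the answer's exact script. The scratch directory and the output name matches.txt are illustrative, and the checks for empty lines are an added assumption, since Bash reports a bad array subscript when an associative array is indexed with an empty key.

```shell
#!/usr/bin/env bash
# Requires Bash 4.0 or later for associative arrays (declare -A).
cd "$(mktemp -d)" || exit 1

# Sample data from the question.
printf '%s\n' 002379 005964 > File1
printf '%s\n' 002379ED6212 003354EB4591 004679BB2185 005964AB3379 005964DB5496 > File2

# Store every non-empty line of File1 as a key.
declare -A prefixes
while IFS= read -r prefix; do
    if [[ -n $prefix ]]; then
        prefixes[$prefix]=1
    fi
done < File1

# Print each File2 line whose first six characters are a stored key.
while IFS= read -r word; do
    prefix=${word:0:6}
    [[ -n $prefix ]] || continue      # skip empty lines
    if [[ -n ${prefixes[$prefix]} ]]; then
        printf '%s\n' "$word"
    fi
done < File2 > matches.txt

cat matches.txt
```

Redirecting the whole loop once (done < File2 > matches.txt) avoids reopening the output file for every matching line.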
