Bash - compare two files and extract the line numbers where lines are identical

I have two files:

File1:

water
doggy
moors

File2:

water1234
forge4432
moors5432

I need to compare the first 5 characters of each line in File2 with the line at the same position in File1, in order to find out which lines in File1 are identical to the same lines in File2.

So, to illustrate, water and moors are shared by both File1 and File2 , so my expected output should be:

1
3

Meaning line 1 and line 3 are shared by both files.

This is my attempt using awk , but it does not work:

awk '/substr($1,1,5)/{ print NR; exit }' File2 File1

My logic was to extract the first 5 characters of every line in File2 and then print the line where it exists in File1 .

For this question, it is assumed that every line in File1 is 5 characters long.

Your approach of using substr in awk seems the right way to do this. Note, though, that you have to "play" with FNR and NR and also store the values for a later comparison:

$ awk 'FNR==NR{a[NR]=substr($0,1,5); next} a[FNR]==$1 {print FNR}' f2 f1
1
3

Explanation

This reads file2 first and then file1. While reading the first file, it stores the first 5 characters of each line in the array a[], using the line number as the index. While reading the second file, it compares each line with the value stored for the same line number and prints the line number when they match. Note that awk string positions start at 1, so the start index passed to substr is 1.

  • FNR==NR {...} is true only while reading the first file, so {...} runs just for that file.
  • In this case, {a[NR]=substr($0,1,5); next} gets the first 5 characters and stores them in the a[] array. Then it moves on to the next line.
  • a[FNR]==$1 {print FNR} runs while reading the second file: it compares the first field of the line with what was stored in the array a[] for this line number. If they match, it prints the line number.
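
To try this end to end, here is a minimal sketch that recreates the two sample files from the question in a temporary directory and wraps the comparison in a function. The function name matching_lines, the array name prefix and the temporary directory are assumptions made for illustration. The sketch also compares the whole line ($0) rather than the first field ($1), which is equivalent for the single-word lines in the question.

```shell
# Sketch: print the line numbers where the first 5 characters of a line in
# the first file equal the whole line at the same position in the second file.
# matching_lines is a hypothetical helper name.
matching_lines() {
    # $1 = file with the long lines (File2), $2 = file with the 5-char lines (File1)
    awk '
        # First file only: remember the 5-character prefix of each line.
        FNR == NR { prefix[NR] = substr($0, 1, 5); next }
        # Second file: compare with the prefix stored for the same line number.
        prefix[FNR] == $0 { print FNR }
    ' "$1" "$2"
}

# Recreate the sample data from the question.
tmpdir=$(mktemp -d)
printf '%s\n' water doggy moors > "$tmpdir/File1"
printf '%s\n' water1234 forge4432 moors5432 > "$tmpdir/File2"

matching_lines "$tmpdir/File2" "$tmpdir/File1"   # prints 1 and 3, one per line
rm -rf "$tmpdir"
```

The file with the longer lines is passed first, because its prefixes have to be stored before the other file is read.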

Another approach, using sort, cut and uniq with bash process substitution:

sort <(cat -n <(cut -b 1-5 file1)) <(cat -n <(cut -b 1-5 file2)) | uniq -d | cut -b 1-6

Output:

     1
     3
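
The leading spaces in this output come from cat -n, which right-aligns each line number before a tab. As a rough sketch of the same idea that prints bare numbers, the variant below takes the first whitespace-separated field with awk instead of cutting fixed byte positions, and feeds both numbered streams to a single sort so that no process substitution is needed. The function name shared_lines and the temporary files are assumptions for illustration.

```shell
# Sketch: number the 5-character prefixes of both files, keep the numbered
# lines that appear in both, and print only the line numbers.
# shared_lines is a hypothetical helper name.
shared_lines() {
    {
        cut -b 1-5 "$1" | cat -n   # "     1<TAB>water", ...
        cut -b 1-5 "$2" | cat -n
    } | sort |
        uniq -d |                  # lines present twice: same number, same prefix
        awk '{ print $1 }' |       # keep the number, drop padding and prefix
        sort -n                    # numeric order, also beyond 9 lines
}

# Recreate the sample data from the question.
tmpdir=$(mktemp -d)
printf '%s\n' water doggy moors > "$tmpdir/File1"
printf '%s\n' water1234 forge4432 moors5432 > "$tmpdir/File2"

shared_lines "$tmpdir/File1" "$tmpdir/File2"   # prints 1 and 3, one per line
rm -rf "$tmpdir"
```

Because each line number occurs once per file, a numbered line can only be duplicated when both files have the same prefix at that position, so the argument order does not matter here.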
