
awk finds first match, not all matches in line

Background info:

I am trying to search for a pattern (string) in a file. I want to print the line and the position in the line, where the pattern was found.

So far, I am able to find only the first appearance of the first letter of my pattern.

But I want to find all occurrences of the whole pattern.

Code (search.sh):

#!/bin/bash
file=$1
awk -v s="$2" 'i=index($0, s){print "line: " NR, "pos: " i}' "$file"

Command-line call:

$ ./search.sh test.txt GA

test.txt

GAGAGAGAGA
CTCTCTCTCT
TATATATATA
CGCGCGCGCG
CCCCCCCCCC
GGGGGGGGGG
AAAAAAAAAA
TTTTTTTTTT
TGATTTTTTT
CCCCCCCCGA

When I run the above command-line call with test.txt, the following is printed:

result:

line: 1 pos: 1
line: 4 pos: 2
line: 6 pos: 1
line: 9 pos: 2
line: 10 pos: 9

This is obviously only the first match in each line, and only of G.

Is there any way to slightly modify my awk command or am I thinking in a totally wrong direction?

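The following awk script may help. It calls match() repeatedly on the remainder of each line and collects the position of every hit. Note that match() treats the search text as a regular expression.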

cat search.sh
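Input_file="$1"
text_to_be_searched="$2"
awk -v var="$text_to_be_searched" '{
  line = $0      # work on a copy so $0 stays intact
  offset = 0     # number of characters already consumed from the line
  val = ""
  # RLENGTH > 0 guards against looping forever on an empty match
  while (match(line, var) && RLENGTH > 0) {
    pos = offset + RSTART            # position in the original line
    val = val ? val "," pos : "Line:" NR " pos:" pos
    offset += RSTART + RLENGTH - 1
    line = substr(line, RSTART + RLENGTH)
  }
  if (val) print val
}' "$Input_file"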

./search.sh test.txt GA

Output will be as follows.

Line:1 pos:1,3,5,7,9
Line:9 pos:2
Line:10 pos:9
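
Since the question used index(), which compares plain strings rather than regular expressions, a literal-string variant may be closer to what was asked. The following is only a sketch under a few assumptions: find_all_literal is a hypothetical helper name, the search text is non-empty, and matches are counted without overlap. The text is handed to awk through the environment (ENVIRON) so that backslashes in it are not interpreted as escapes.

```shell
#!/bin/bash
# find_all_literal FILE TEXT   (hypothetical helper, not from the answers above)
# Prints "Line:N pos:p1,p2,..." for every line that contains TEXT,
# treating TEXT as a literal string, not as a regular expression.
find_all_literal() {
  needle="$2" awk '
    BEGIN { s = ENVIRON["needle"]; n = length(s) }
    n == 0 { next }                      # nothing to search for
    {
      line = $0; offset = 0; val = ""
      # index() returns the 1-based position of s in line, or 0 if absent
      while ((i = index(line, s)) > 0) {
        pos = offset + i                 # position in the original line
        val = (val == "") ? "Line:" NR " pos:" pos : val "," pos
        offset += i + n - 1              # characters consumed so far
        line = substr(line, i + n)       # continue after this match
      }
      if (val != "") print val
    }' "$1"
}
```

It could be called as find_all_literal test.txt GA. Searching for a character such as . then finds only literal dots instead of matching every character.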

With grep

test.txt

GAGAGAGAGA
CTCTCTCTCT
TATATATATA
CGCGCGCGCG
CCCCCCCCCC
GGGGGGGGGG
AAAAAAAAAA
TTTTTTTTTT
TGATTTTTTT
CCCCCCCCGA

search.sh

#!/bin/bash
# IFS= keeps leading whitespace, so the reported offsets stay correct
while IFS= read -r line; do
    ((++i))
    printf '%s\n' "$line" | grep -bon -- "$2" | sed -E "s@^([0-9]+):([0-9]+):.*@Line: $i, Position: \2@"
done < "$1"

Output

darby@Debian:~/Scrivania$ bash search.sh test.txt GA
Line: 1, Position: 0
Line: 1, Position: 2
Line: 1, Position: 4
Line: 1, Position: 6
Line: 1, Position: 8
Line: 9, Position: 1
Line: 10, Position: 8

NOTE

Positions are zero-based byte offsets, as reported by grep -b.
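
If one-based positions are preferred, so that the numbers line up with the awk output, the offset can be incremented before printing. This is a minimal sketch, assuming a grep that supports -b and -o (GNU grep does). one_based_positions is a hypothetical helper name. The values are still byte offsets, so they equal character positions only for single-byte text.

```shell
#!/bin/bash
# one_based_positions FILE PATTERN   (hypothetical helper name)
# Prints "Line: N, Position: P" with P counted from 1.
one_based_positions() {
  local i=0 line
  while IFS= read -r line; do
    ((++i))
    # grep -bo prints "offset:match" for each match, offset counted from 0
    printf '%s\n' "$line" | grep -bo -- "$2" |
      awk -F: -v n="$i" '{ print "Line: " n ", Position: " ($1 + 1) }'
  done < "$1"
}
```

For example, one_based_positions test.txt GA reports positions 1, 3, 5, 7 and 9 for the first line of the sample file.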

With perl

$ perl -lne 'while(/GA/g){print "line: $. pos: $-[0]"}' ip.txt
line: 1 pos: 0
line: 1 pos: 2
line: 1 pos: 4
line: 1 pos: 6
line: 1 pos: 8
line: 9 pos: 1
line: 10 pos: 8

$ perl -lne 'while(/GA/g){print "line: $. pos: ", $-[0]+1}' ip.txt
line: 1 pos: 1
line: 1 pos: 3
line: 1 pos: 5
line: 1 pos: 7
line: 1 pos: 9
line: 9 pos: 2
line: 10 pos: 9

From perldoc:

$-[0] is the offset of the start of the last successful match.

$. is the current line number for the last filehandle accessed.

while(/GA/g) iterates over all matches in the line.


To pass the pattern in through an environment variable:

$ s='GAT' perl -lne 'while(/$ENV{s}/g){print "line: $. pos: $-[0]"}' ip.txt
line: 9 pos: 1


See also: How can I find the location of a regex match in Perl?
