
awk pattern matching using variable with xml value

Here's my awk script. It's in a file called mAwk.awk:

#!/usr/bin/awk -f
BEGIN {
    FS = "."
    artifactPattern = "/<artifactId>artifactName1|artifactName2<\\/artifactId>/"
    # print "-------------" artifactPattern
}
{
    toPrint = 1
    if ($0 ~ /<dependencies>/) {
        matches = 1000
    }
    else if ($0 ~ /<dependency>/) {
        matches += 100
    }
    else if ($0 ~ /<\/dependency>/) {
        matches = 1000
    }
    else if ($0 ~ /<groupId>(com.group1.*)|(com.group2.*)|(com.group3.*)<\/groupId>/) {
        matches += 10
    }
    # else if ($0 ~ /<artifactId>artifactName1|artifactName2<\/artifactId>/) {
    else if ($0 ~ artifactPattern) {
        matches += 1
    }
    else if ($0 ~ /<version>[0-9]+\.[0-9]+\.[0-9]+<\/version>/) {
        print "debugging: matched 1 -", matches
        if (matches == 1111) {
            lastPart = "0-SNAPSHOT</version>"
            print $1 "." $2+1 "." lastPart
            matches -= 11
            toPrint = 0
        }
    }
    else if ($0 ~ /<\/dependencies>/) {
        matches = 0
    }
    if (toPrint == 1) {
        print $0
    }
}
END {
}

Here's the structure of the XML file (it's a pom.xml), just in case:

<project>
  <random tags/>

  <dependencies>
    <dependency>
      <groupId>data</groupId>
      <artifactId>data</artifactId>
      <version>1.2.3</version>
    </dependency>
      ... repeat...
  </dependencies>
</project>

The problem is, if I use the line:

# else if($0 ~ /<artifactId>payment-common|test2-common<\/artifactId>/){

instead of the one just below it, it matches just fine, but when I put the value inside a variable, it fails. What's going on here?

My final aim is to call this through a shell script like...

awk -v pattern="$(cat ./artifactPatterns.txt)" -f mAwk.awk myFile.xml

and artifactPatterns.txt will look like what the variable holds in the awk file, for example:

/<artifactId>artifactName1|artifactName2<\\/artifactId>/

I've tried a bunch of things and nothing seems to work, thank you for your time!

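Take out the / delimiters around the value of artifactPattern. Slashes are the syntax for regexp literals; they don't belong in strings. When the right-hand side of the ~ operator is a string, awk already treats its contents as a regular expression, so the slashes inside your string are matched as literal characters.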

And since / isn't a delimiter, you don't need to escape it inside the value.

artifactPattern="<artifactId>artifactName1|artifactName2</artifactId>"
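
To see the difference, here is a minimal sketch that runs the same sample line through three variants: the string with slashes, the string without them, and the pattern read from a file and passed with -v. The sample line, the temporary directory, and the variable names with_slashes, without_slashes and from_file are made up for the demonstration; only artifactPattern and artifactPatterns.txt come from the question.

```shell
#!/bin/sh
# Sample pom.xml line; the artifact names are the placeholders from the question.
line='      <artifactId>artifactName1</artifactId>'

# 1. Slashes kept inside the string: awk looks for a literal "/" in the input.
with_slashes=$(printf '%s\n' "$line" | awk '
  BEGIN { p = "/<artifactId>artifactName1|artifactName2<\\/artifactId>/" }
  { print (($0 ~ p) ? "match" : "no match") }')

# 2. Slashes removed: the string content is the regular expression.
without_slashes=$(printf '%s\n' "$line" | awk '
  BEGIN { p = "<artifactId>artifactName1|artifactName2</artifactId>" }
  { print (($0 ~ p) ? "match" : "no match") }')

# 3. Same pattern stored in a file and passed with -v.
#    The file is created in a temporary directory just for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' '<artifactId>artifactName1|artifactName2</artifactId>' \
  > "$tmpdir/artifactPatterns.txt"
from_file=$(printf '%s\n' "$line" |
  awk -v artifactPattern="$(cat "$tmpdir/artifactPatterns.txt")" '
  { print (($0 ~ artifactPattern) ? "match" : "no match") }')
rm -rf "$tmpdir"

echo "with slashes:    $with_slashes"
echo "without slashes: $without_slashes"
echo "from file:       $from_file"
```

One thing to keep in mind: | has the lowest precedence in a regular expression, so this pattern as written matches either <artifactId>artifactName1 or artifactName2</artifactId>. If both names should sit between the tags, grouping them as <artifactId>(artifactName1|artifactName2)</artifactId> is probably closer to what was intended.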

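Also, $0 ~ /pattern/ can be simplified to just /pattern/ -- when a regexp literal appears by itself, it is matched against the whole line ($0). This shortcut applies only to literals; a pattern held in a variable still needs the explicit $0 ~ variable form.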
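
A small sketch of that equivalence, counting matching lines in a made-up four-line input. The variable names (xml, explicit, implicit, bare_var, with_tilde) and the awk variable p are only for illustration. The third variant shows why a variable needs the explicit ~: on its own it is tested as a non-empty string, which is true on every line.

```shell
#!/bin/sh
# Made-up input: four lines, only one of which contains "<dependency>".
xml='<dependencies>
<dependency>
</dependency>
</dependencies>'

# Explicit match against $0.
explicit=$(printf '%s\n' "$xml" | awk '$0 ~ /<dependency>/ { n++ } END { print n + 0 }')

# Bare regexp literal: implicitly matched against $0.
implicit=$(printf '%s\n' "$xml" | awk '/<dependency>/ { n++ } END { print n + 0 }')

# Bare variable: NOT a match, just a truth test on a non-empty string.
bare_var=$(printf '%s\n' "$xml" | awk -v p='<dependency>' 'p { n++ } END { print n + 0 }')

# Variable used with ~: behaves like the regexp literal.
with_tilde=$(printf '%s\n' "$xml" | awk -v p='<dependency>' '$0 ~ p { n++ } END { print n + 0 }')

echo "explicit=$explicit implicit=$implicit bare_var=$bare_var with_tilde=$with_tilde"
```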
