unix - awk unexpected behaviour

I have the below code in a bash file called 'findError.sh':

#!/bin/bash
filename="$1"
formatindicator="\"|\""
echo "$formatindicator"
formatarg="\$1"
echo "$formatarg"
count=`awk -F$formatindicator '{print $formatarg}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l `
command="awk -F$formatindicator '{print $formatarg}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l"
echo $command
echo $count

I then run it at the command line like so: sh findError.sh test.dat

But it gives me a different count than running the command that was echoed. How is this possible?

That is, the $command that is echoed back is:

awk -F"|" '{print $1}' test.dat | perl -ane '{ if(m/ERROR/) { print } }' | wc -l

But the $count that is echoed back is:

3

However, if I run just this one line at the command line (not through the script), the result is 0:

awk -F"|" '{print $1}' test.dat | perl -ane '{ if(m/ERROR/) { print } }' | wc -l

Sample input file (test.dat):

sid|storeNo|latitude|longitude
2|1|-28.03720000
9|2
10
jgn352|1|-28.03ERROR720000
9|2|fdERRORkjhn422-405
0000543210|gfERRORdjk39

Notes: Using SunOS with bash version 4.0.17

You're being too careful with your quotes around the format delimiter.

When you type:

awk -F"|" ...

The program (awk) sees -F| as its first argument; the shell strips the double quotes.

When you have:

formatindicator="\"|\""
echo "$formatindicator"
formatarg="\$1"
echo "$formatarg"
count=`awk -F$formatindicator ...`

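You have preserved the double quotes in $formatindicator, so awk receives -F"|" (quotes included) as its argument. The field separator is then the three-character string "|", which a POSIX awk treats as a regular expression meaning "a double quote or a double quote", so it uses the double quote as the delimiter.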
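A small sketch of that effect, assuming a POSIX-style awk (gawk, mawk or similar) and a made-up input line that contains both a double quote and a pipe. The variable names are only for illustration.

```shell
#!/bin/sh
# Hypothetical one-line input containing a double quote and a pipe.
line='a"b|c'

# The separator is the three characters "|" : awk reads that as a regex
# that matches a double quote, so the line is split at the quote.
with_quotes=`printf '%s\n' "$line" | awk -F'"|"' '{print $1}'`

# The separator is the single character | : awk splits at the pipe.
without_quotes=`printf '%s\n' "$line" | awk -F'|' '{print $1}'`

echo "separator with quotes:    $with_quotes"
echo "separator without quotes: $without_quotes"
```

With the quotes embedded in the separator the first field ends at the double quote; with the plain pipe it ends at the pipe.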

Use:

formatindicator="|"
echo "$formatindicator"
formatarg="\$1"
echo "$formatarg"
count=`awk -F"$formatindicator" ...`

The difference is that the shell strips the quotes off -F"$formatindicator" but doesn't do that when $formatindicator itself contains the double quotes.
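One way to check what a program actually receives is to hand the same argument to printf instead of awk. This is only a sketch; the variable names are made up for the illustration, and back-quotes are used so it also runs under an old Bourne shell.

```shell
#!/bin/sh
quoted_value="\"|\""   # the value itself contains two double quotes
plain_value="|"        # the value is just the pipe

# What the program receives when you type -F"|" at the prompt.
arg_typed=`printf '%s' -F"|"`

# Quotes stored inside the variable are ordinary data and survive.
arg_embedded=`printf '%s' -F$quoted_value`

# Quotes written around the expansion are syntax and are stripped.
arg_fixed=`printf '%s' -F"$plain_value"`

printf 'typed:    [%s]\n' "$arg_typed"
printf 'embedded: [%s]\n' "$arg_embedded"
printf 'fixed:    [%s]\n' "$arg_fixed"
```

The first and third lines show [-F|]; the second shows [-F"|"] with the quotes still attached.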

(NB: edited to retain back-quotes instead of the $(...) notation that is (a) preferred and (b) was used in the first version of this answer. The $(...) notation was not recognized by the SunOS /bin/sh which was, I believe, being used to execute the script. Both bash and ksh recognize the $(...) notation, but the basic Bourne shell, /bin/sh, on Solaris 10 (SunOS 5.10) and earlier (I've not laid hands on Solaris 11) does not recognize $(...).)

I note that any of perl, awk or grep could be used to find the count of the error lines on its own, so the triplet of awk piped to perl piped to wc is not very efficient.

awk -F"|" '$1 ~ /ERROR/ { count++ } END { print count + 0 }' "$filename"

grep -c ERROR "$filename"           # simple: ERROR anywhere on the line
grep -c '^[^|]*ERROR' "$filename"   # accurate: ERROR in the first field only

perl -anF'\|' -e '$count++ if $F[0] =~ m/ERROR/; END { print $count + 0, "\n"; }' "$filename"

It's Perl, so TMTOWTDI; take your pick...


Side discussion

In the comments, we have concerns over how various parts of the script are being interpreted.

formatindicator="|"
formatarg="\$1"

count=`awk -F$formatindicator '{print $formatarg}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l `

Let's simplify this to (using part of my main answer):

count=`awk -F"$formatindicator" '{print $formatarg}' $filename`

The intention is to have the delimiter specified on the command line via the -F option, and that works. The question, I expect, is "why does $formatarg get expanded inside single quotes?", and the answer is "does it?". I think not. What actually happens is that awk sees the script {print $formatarg}. Since the awk variable formatarg is never assigned a value, it is equivalent to 0, so the script prints $0, which is the entire input line. Perl is quite happy to echo the line if it matches ERROR anywhere on the line, and wc couldn't care less about what's in the lines, so the result is approximately what was expected. The only time there'd be a discrepancy is when a line from $filename contains ERROR somewhere other than the first pipe-delimited field. Such a line is counted by the script when it should not be. All three ERROR lines in the sample test.dat are of this kind, which accounts for the script reporting 3 while the typed command reports 0.
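The reasoning above can be checked with a short sketch. It assumes mktemp is available and recreates the sample test.dat from the question in a temporary file; the second awk in each pipeline is just a portable stand-in for the perl and wc stages.

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file.
datafile=`mktemp`
cat > "$datafile" <<'EOF'
sid|storeNo|latitude|longitude
2|1|-28.03720000
9|2
10
jgn352|1|-28.03ERROR720000
9|2|fdERRORkjhn422-405
0000543210|gfERRORdjk39
EOF

# Inside single quotes the shell leaves $formatarg alone, so awk sees an
# unset awk variable named formatarg; $formatarg is then $0, the whole line.
whole_line_count=`awk -F"|" '{print $formatarg}' "$datafile" |
    awk '/ERROR/ { n++ } END { print n + 0 }'`

# With a literal $1 only the first field is printed, and no first field
# in the sample contains ERROR.
first_field_count=`awk -F"|" '{print $1}' "$datafile" |
    awk '/ERROR/ { n++ } END { print n + 0 }'`

echo "print \$formatarg: $whole_line_count"
echo "print \$1:         $first_field_count"
rm -f "$datafile"
```

On the sample data the first count is 3 and the second is 0, matching what the question reports for the script and for the typed command.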

The problem is with using external (shell) variables in awk. If you wish to use a shell variable inside an awk program, define an awk variable with the -v option and assign the shell variable to it.

The line -

count=`awk -F$formatindicator '{print $formatarg}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l `

should be -

count=`awk -v fi="$formatindicator" -v fa="$formatarg" 'BEGIN {FS=fi}{print fa}' "$1" | perl -ane '{ if(m/ERROR/) { print } }' | wc -l `

Update:

As stated in the comments, $formatarg contains the value $1. What you need to do is store just 1 and then pass it as -

count=`awk -v fi="$formatindicator" -v fa="$formatarg" 'BEGIN {FS=fi}{print $fa}' "$1" | perl -ane '{ if(m/ERROR/) { print } }' | wc -l`

[jaypal:~/Temp] echo $formatindicator
|
[jaypal:~/Temp] echo $formatarg
1
[jaypal:~/Temp] awk -v fi="$formatindicator" -v fa="$formatarg" 'BEGIN {FS=fi}{print $fa}' data.file
sid
2
9
10
jgn352
9
0000543210

Script:

#!/bin/bash
filename="$1"
formatindicator="|"
echo "$formatindicator"
formatarg="1"
echo "$formatarg"
count=`awk -v fa="$formatarg" -v fi="$formatindicator" 'BEGIN{FS=fi}{print $fa}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l `
command="awk -F$formatindicator '{print $formatarg}' $filename | perl -ane '{ if(m/ERROR/) { print } }' | wc -l"
echo $command
echo $count
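As a possible follow-up, the field selection and the ERROR match can be folded into a single awk call, so perl and wc are not needed. This is only a sketch: count_errors is a hypothetical helper name, the delimiter and field number are passed with -v as above, and the sample data from the question is recreated in a temporary file (assuming mktemp is available).

```shell
#!/bin/sh
# count_errors FILE DELIMITER FIELD_NUMBER   (hypothetical helper)
# Prints how many lines of FILE have ERROR in the chosen field.
count_errors() {
    awk -v fs="$2" -v field="$3" '
        BEGIN { FS = fs }            # set the delimiter before reading input
        $field ~ /ERROR/ { n++ }     # test only the selected field
        END { print n + 0 }          # n + 0 prints 0 when nothing matched
    ' "$1"
}

# Sample input from the question.
datafile=`mktemp`
cat > "$datafile" <<'EOF'
sid|storeNo|latitude|longitude
2|1|-28.03720000
9|2
10
jgn352|1|-28.03ERROR720000
9|2|fdERRORkjhn422-405
0000543210|gfERRORdjk39
EOF

field1=`count_errors "$datafile" "|" 1`
field2=`count_errors "$datafile" "|" 2`
field3=`count_errors "$datafile" "|" 3`
echo "field 1: $field1"
echo "field 2: $field2"
echo "field 3: $field3"
rm -f "$datafile"
```

On the sample file this reports 0 for field 1, 1 for field 2 and 2 for field 3.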
