
awk Print Line Issue

I'm experiencing some issues with an awk command right now. The original script was developed using awk on macOS and was then ported to Linux, where awk behaves differently.

What I want to do is count the occurrences of each string listed in /tmp/test.uniq.txt within the file /tmp/test.txt.

awk '{print $1, system("cat /tmp/test.txt | grep -o -c " $1)}' /tmp/test.uniq.txt

macOS delivers the expected output:

  test1 2 
  test2 1

Each result is on one line: the string and its number of occurrences, separated by a space.

Linux delivers an output like:

2
test1 1
test2 

The output is not on one line, and the output of the system command is printed first.

Sample input: test.txt looks like:

test1 test test 
test1 test test
test2 test test

test.uniq.txt looks like:

test1
test2

As the comments suggested, calling grep, cat, etc. through the system function is not recommended, because awk is a complete language that can perform most of these tasks itself.

You can use the following awk command to replace your cat | grep functionality:

awk 'FNR == NR { a[$1] = 0; next }                      # first file: one counter per string
     { for (i = 1; i <= NF; i++) if ($i in a) a[$i]++ }  # second file: count matching fields
     END { for (i in a) print i, a[i] }' test.uniq.txt test.txt

test1 2
test2 1

Note that this output doesn't match the count of 5 stated in your question, probably because your sample data is different.
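If the output of an external command really is needed inside awk, it helps to remember that system() returns the command's exit status, while the command itself writes straight to stdout; that is likely why the ordering differs between awk implementations. A minimal sketch of the usual alternative, reading the output with "cmd | getline", is shown below. It assumes the sample files from the question, recreated in a temporary directory, and search strings without quotes or other shell metacharacters:

```shell
# Recreate the sample files from the question in a temporary directory.
dir=$(mktemp -d)
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$dir/test.txt"
printf 'test1\ntest2\n' > "$dir/test.uniq.txt"

# "cmd | getline var" reads the first line of the command's output into var,
# so awk can print the string and the count together on one line.
# Assumption: $1 contains no quotes or shell metacharacters (naive quoting).
result=$(awk -v file="$dir/test.txt" '{
    cmd = "grep -c -F -- \"" $1 "\" \"" file "\""
    count = 0
    cmd | getline count   # grep -c prints the number of matching lines
    close(cmd)            # close the pipe before building the next command
    print $1, count
}' "$dir/test.uniq.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

This still starts one grep process per string, so the pure awk version above remains the better choice for larger inputs.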


It looks to me as if you're trying to count the number of lines containing each unique string in the uniq file. But the way you're doing it is ... awkward, and as you've demonstrated, inconsistent between versions of awk.

The following might work a little better:

$ awk '
  NR==FNR {
    a[$1]
    next
  }
  {
    for (i in a) {
      if ($1~i) {
        a[i]++
      }
    }
  }
  END {
    for (i in a)
      printf "%6d\t%s\n",a[i],i
  }
' test.uniq.txt test.txt
         2  test1
         1  test2

This loads your uniq file into an array, then for every line in your text file, steps through the array and counts each entry that matches the first field.

Note that these are being compared as regular expressions, without word boundaries, so test1 will also be counted as part of test12.
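To see the difference, here is a small sketch that compares the unanchored regex match with plain string equality on the field. The one-line input containing test12 is a made-up example, not part of the question's data:

```shell
# Hypothetical input: a longer word that merely starts with "test1".
input='test12 test test'

# Unanchored regex match, as in the script above: "test1" matches inside "test12".
regex_hit=$(printf '%s\n' "$input" | awk -v pat='test1' '$1 ~ pat { n++ } END { print n + 0 }')

# Plain string equality on the field: only an exact "test1" would count.
exact_hit=$(printf '%s\n' "$input" | awk -v pat='test1' '$1 == pat { n++ } END { print n + 0 }')

echo "regex: $regex_hit, exact: $exact_hit"
```

If whole-field matches are what you want, swapping ~ for == in the loop is one way to avoid the false positive.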

Another way might be to use grep + sort + uniq:

grep -o -w -F -f test.uniq.txt test.txt | sort | uniq -c

It's a pipeline, but a short one.

From man grep:

  • -F, --fixed-strings, --fixed-regexp: Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX, --fixed-regexp is an obsoleted alias, please do not use it in new scripts.)
  • -f FILE, --file=FILE: Obtain patterns from FILE, one per line. The empty file contains zero patterns and therefore matches nothing. (-f is specified by POSIX.)
  • -o, --only-matching: Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
  • -w, --word-regexp: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
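Putting these options together, the pipeline can be tried on the question's sample data as sketched below. The temporary directory and the final awk step are additions for illustration only; the awk step just swaps the columns and removes the padding, which varies between uniq implementations:

```shell
# Recreate the sample files from the question in a temporary directory.
dir=$(mktemp -d)
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$dir/test.txt"
printf 'test1\ntest2\n' > "$dir/test.uniq.txt"

# -o prints every whole-word match on its own line, so sort | uniq -c can tally them.
counts=$(grep -o -w -F -f "$dir/test.uniq.txt" "$dir/test.txt" | sort | uniq -c | awk '{ print $2, $1 }')

printf '%s\n' "$counts"
rm -rf "$dir"
```

Unlike grep -c, which counts matching lines, this counts every occurrence, so a string that appears twice on one line is counted twice.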

