
AWK - string containing required fields

I thought it would be easy to define a string such as "1 2 3" and use it within AWK (GAWK) to extract the required fields. How wrong I was.

I have tried creating AWK arrays, BASH arrays, splitting, string substitution, etc., but could not find any method to use the resulting 'chunks' (i.e. the column/field numbers) in a print statement.

I believe Akshay Hegde has provided an excellent solution with the get_cols function, here

but it was over 8 years ago, and I am really struggling to work out 'how it works', namely, what this is doing; s = length(s)? s OFS $(C[i]): $(C[i])

I am unable to post a comment asking for clarification due to my lack of reputation (and it is an old post). Is someone able to explain how the solution works?

NB: I don't think I need the sub, as I am using the following to clean up the input (replace every run of non-numeric characters with a comma, i.e. a separator, and sort numerically):

Columns=$(echo $Input_string | sed 's/[^0-9]\+/,/g')
Columns=$(echo $Columns | xargs -n1 | sort -n | xargs)

(Using this string, the awk in the given solution would be executed as awk -v cols=$Columns -f test.awk infile.)


Given the informative answer from @Ed Morton, with a nice worked example, I have attempted to remove the need for a function (and for an additional awk program file). The intention is to have this within a shell script, which I would rather keep self-contained, and also to investigate further how it works.

Fields="1 2 3"
echo $Fields | awk -F "," '{n=split($0,Column," "); for(i=1;i<=n;i++) s = length(s) ? s OFS $(Column[i]) : $(Column[i])}END{print "s="s " arr1="Column[1]" arr2="Column[2]" arr3="Column[3]}'

The results have surprised me (taking note of my Comment to Ed)

s=1 2 3 arr1=1 arr2=2 arr3=3

The above clearly shows that the split into the array has worked, but I thought s would include a $ for each ternary operator concatenation, i.e. "$1 $2 $3".

Moreover, I was hoping to append the actual file name to the above command; I have found that awk accepts the form echo $string | awk '{program}' file.name

NB it is a little insulting that my question has been marked as -1 indicating little research effort, as I have spent days trying to work this out.

Taking all the information above, I think s ends up as "1 2 3", but print does not treat it the way it does when it is called from the function; it simply tries to 'print 1 2 3' against the file, which seems to be how all my efforts have ended up. This really confuses me, as Ed's 'diagonal' example works from the command line, indicating that the concept of 'print s' is absolutely fine when used with a file name input. Can anyone suggest how this (example below) can work?

I don't know if using echo pipe and appending the file name is strictly allowed, but it appears to work (???!?!)

(failed result) echo $Fields | awk -F "," '{n=split($0,Column," "); for(i=1;i<=n;i++) s = length(s)? s OFS $(Column[i]): $(Column[i])}END{print s}' myfile.txt

This appears to go through myfile.txt and output all lines containing many comma-separated values, i.e. the whole file (I haven't included the values; the lines below are for illustration only):

,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

what this is doing; s = length(s)? s OFS $(C[i]): $(C[i])

You have encountered the ternary operator, which has the following syntax

condition ? valueiftrue : valueiffalse
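As a minimal sketch of that syntax on its own (the three input lines are made up for illustration, the second one being empty), the condition below is length($0), so each line is labelled according to whether it contains any characters:

```shell
# Hypothetical input: three lines, the second one empty.
# length($0) is 0 (false) for the empty line and non-zero (true) otherwise.
result=$(printf '%s\n' 'abc' '' 'x' |
    awk '{ print NR ": " (length($0) ? "non-empty" : "empty") }')
printf '%s\n' "$result"
```

The parentheses around the ternary expression keep it separate from the string concatenation in front of it.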

The length function, when given a single argument, returns the number of characters in it. In GNU AWK the number 0 is considered false and any other number is considered true, so here length(s) acts as a "not empty" check. When s is not empty, s is concatenated with the output field separator (OFS, a space by default) and the value of the C[i]-th field, and the result is assigned back to s. When s is empty (which includes the case where it has not been initialized yet, as GNU AWK treats an unset variable as an empty string), s is simply set to the value of the C[i]-th field. Applied repeatedly, this builds a string of values separated by OFS. Consider the following simple example: say you want to get the diagonal of a 2D matrix stored in file.txt with the following content

1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25

then you might do

awk '{s = length(s) ? s OFS $(NR) : $(NR)}END{print s}' file.txt

which gives the output

1 7 13 19 25

Explanation: NR is the number of the current row (record), so for the 1st row $(NR) is the 1st field, for the 2nd row it is the 2nd field, for the 3rd row it is the 3rd field, and so on.

(tested in GNU Awk 5.0.1)
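The same pattern can be applied to a list of field numbers without a function or a separate program file. The following is only a sketch under assumptions: sample.csv and its contents are made up, the variable name cols and the array name C are arbitrary choices, and the data is assumed to be comma-separated as in the question. The list is passed in with -v and split once in BEGIN, and s is reset for every line so that each line is printed with just the chosen fields:

```shell
# Hypothetical sample data: four comma-separated columns per line.
printf '%s\n' 'a,b,c,d' 'e,f,g,h' > sample.csv

Fields="1 3 4"

result=$(awk -F ',' -v cols="$Fields" '
BEGIN { n = split(cols, C, " ") }   # C[1]="1", C[2]="3", C[3]="4"
{
    s = ""                          # start with an empty string on every line
    for (i = 1; i <= n; i++)        # append each requested field in turn
        s = length(s) ? s OFS $(C[i]) : $(C[i])
    print s
}' sample.csv)
printf '%s\n' "$result"
```

With the sample data this prints a c d and e g h. Note that $(C[i]) yields the contents of field number C[i], not the text "$1", which is why s holds field values rather than field references. Passing the list with -v instead of through echo and a pipe also matters: when a file name is given as an argument, awk reads that file and does not read standard input, so a piped string never reaches the program.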
