
awk: select first column and value in column after matching word

I have a .csv file where each row corresponds to a person (first column), followed by attribute/value pairs that are available for that person. I want to extract the names and the values of a particular attribute for the persons where that attribute is available. The file is structured as follows:

name,attribute1,value1,attribute2,value2,attribute3,value3
joe,height,5.2,weight,178,hair,
james,,,,,,
jesse,weight,165,height,5.3,hair,brown
jerome,hair,black,breakfast,donuts,height,6.8

I want a file that looks like this:

name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8

Using this earlier post, I've tried a few different awk methods, but I'm still having trouble getting both the first column and whichever column holds the value of the desired attribute (say, height). For example, the following returns everything:

awk -F "height," '{print $1 "," FS$2}' file.csv

I could grep only the rows with height in them, but I'd prefer to do everything in a single line if I can.

You may use this awk script:

cat attrib.awk

BEGIN {
   FS=OFS=","
   print "name,attribute,value"
}
NR > 1 && match($0, k "[^,]+") {
   print $1, substr($0, RSTART+1, RLENGTH-1)
}

# then run it as
awk -v k=',height,' -f attrib.awk file

name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8

# or this one
awk -v k=',weight,' -f attrib.awk file

name,attribute,value
joe,weight,178
jesse,weight,165
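
To see why the substr() offsets in attrib.awk work, here is a minimal sketch of what match() reports for a single row. The helper name show_match is hypothetical and exists only for this illustration; the key and the sample row are the ones used above.

```shell
# Hypothetical helper: print RSTART, RLENGTH and the extracted text for one row.
show_match() {
    # match() sets RSTART (1-based start of the match) and RLENGTH (its length).
    # The key starts with a comma, so RSTART+1 and RLENGTH-1 drop that comma.
    printf '%s\n' "$1" | awk -v k=',height,' '
    match($0, k "[^,]+") {
        print RSTART, RLENGTH, substr($0, RSTART + 1, RLENGTH - 1)
    }'
}

show_match 'joe,height,5.2,weight,178,hair,'
```

For the joe row this prints 4 11 height,5.2: the match starts at the comma in position 4 and spans 11 characters. Because the pattern ends in [^,]+, an attribute with an empty value (such as joe's hair) does not match and produces no output.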

With your shown samples, please try the following awk code, written and tested in GNU awk. In short: it sets RS (the record separator) to the regex ^[^,]*,height,[^,]* and then prints RT, the text that matched RS, to produce the expected output.

awk -v RS='^[^,]*,height,[^,]*' 'RT{print RT}' Input_file

One awk idea:

awk -v attr="height" '
BEGIN  { FS=OFS="," }
FNR==1 { print "name", "attribute", "value"; next }
       { for (i=2;i<=NF;i+=2)                         # loop through even-numbered fields
             if ($i == attr) {                        # if field value is an exact match to the "attr" variable then ...
                print $1,$i,$(i+1)                    # print current name, current field and next field to stdout
                next                                  # no need to check rest of current line; skip to next input line
             }
       }
' file.csv

NOTE: this assumes the input value (height in this example) exactly matches a field in the file, including capitalization.

This generates:

name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
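
If the capitalization in the file is not reliable, one possible variant compares lowercased copies of both sides. This is only a sketch: the function name attr_ci is hypothetical, and it assumes the same layout as above (name in field 1, attributes in the even-numbered fields).

```shell
# Hypothetical helper: attr_ci ATTRIBUTE FILE
attr_ci() {
    awk -v attr="$1" '
    BEGIN  { FS = OFS = ","; attr = tolower(attr) }
    FNR==1 { print "name", "attribute", "value"; next }
           { for (i = 2; i <= NF; i += 2)           # attributes sit in even-numbered fields
                 if (tolower($i) == attr) {         # compare case-insensitively
                     print $1, $i, $(i+1)           # fields are printed with their original case
                     next
                 }
           }
    ' "$2"
}
```

It would be called as attr_ci Height file.csv. tolower() is part of POSIX awk, so this should not depend on GNU awk.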

I'd suggest a sed one-liner:

sed -n 's/^\([^,]*\).*\(,height,[^,]*\).*/\1\2/p' file.csv
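
The one-liner above prints only the data rows, not the header requested in the question. A sketch that also emits the header, assuming the first line of the file is always the header row; the function name extract_attr_sed is hypothetical, and the attribute name is assumed to contain no regex metacharacters:

```shell
# Hypothetical helper: extract_attr_sed ATTRIBUTE FILE
extract_attr_sed() {
    # Line 1: replace the original header with the three-column header.
    # Lines 2..$: keep the name plus the ",ATTRIBUTE,value" part, if present.
    sed -n \
        -e '1s/.*/name,attribute,value/p' \
        -e "2,\$s/^\([^,]*\).*\(,$1,[^,]*\).*/\1\2/p" \
        "$2"
}
```

It would be called as extract_attr_sed height file.csv. Note that [^,]* also matches an empty value, so a row such as joe's with hair, would still be printed for the hair attribute.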

With a Perl one-liner:

$ perl -lne '
    print "name,attribute,value" if $.==1;
    print "$1,$2" if /^(\w+).*(height,\d+\.\d+)/
' file

Output:

name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8

awk accepts variable-value arguments following a -v flag before the script. Thus, the name of the required attribute can be passed into an awk script using the general pattern:

awk -v attr=attribute1 ' {} ' file.csv

Inside the script, the value of the passed variable is referenced by the variable name, in this case attr.

Your criteria are to print column 1 (the name), the column whose header matches the required attribute, and the column immediately after it (holding the corresponding values).

Thus, the following script lets you fish out the column headed "attribute1" and its next neighbour:

awk -v attr=attribute1 ' BEGIN {FS=","} NR==1{for (i=1;i<=NF;i++) if($i == attr) col=i;} {print $1","$col","$(col+1)} ' data.txt

result:

name,attribute1,value1
joe,height,5.2
james,,
jesse,weight,165
jerome,hair,black

another column (attribute 3):

awk -v attr=attribute3 ' BEGIN {FS=","} NR==1{for (i=1;i<=NF;i++) if($i == attr) col=i;} {print $1","$col","$(col+1)} ' awkNames.txt

result:

name,attribute3,value3
joe,hair,
james,,
jesse,hair,brown
jerome,height,6.8

Just change the value of the -v attr= argument for the required column.
