Examine multiple locations in a line for data using awk

I am trying to extract some information that was originally sourced from a mainframe. The rows are all plain ASCII character data, but each row is multi-segment, so line lengths differ. Fields are length delimited. One field in each row dictates how many segments to expect in the variable portion of that row. What I want to do is look for the presence of an indicator in those variable segments and extract some data from the segment that contains it.

A simplified example is shown below:

UUID12345 1   ABC 1 345  
UUID23456 2   XYZ 4 763 ABC 4 678  
UUID34567 3   XYZ 4 763 ABC 2 456 QRS 2 456  
UUID45678 2   DEF 1 345 TUV 8 111 
UUID56789 0

The second column dictates how many segments to expect. There can be up to 99 segments, but in practice there are fewer than 10. In the example above, each segment is 10 bytes long, with the first segment starting at the position of ABC on the first line. What I want to extract is the first column of each line and the value held in the last 3 characters of any segment containing ABC.
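The layout described above can be sketched as a small helper that walks each row by position and lists its segments. It assumes, from the sample only, that the ID occupies columns 1-9, the segment count sits in columns 11-12 (only one digit appears in the sample), and the first segment starts at column 15. The name dump_segments is hypothetical, and the real mainframe layout may differ.

```shell
# Hypothetical helper: list every segment of each fixed-width record.
# Assumed layout (inferred from the sample, not confirmed for the real file):
#   cols 1-9    record ID
#   cols 11-12  segment count
#   col  15     start of the first 10-byte segment
dump_segments() {
  awk '{
    id   = substr($0, 1, 9)
    segs = substr($0, 11, 2) + 0        # "+ 0" turns "2 " or "12" into a number
    for (k = 1; k <= segs; k++) {
      start = 15 + 10 * (k - 1)         # first column of segment k
      label = substr($0, start, 3)      # bytes 1-3 of the segment, e.g. ABC
      value = substr($0, start + 6, 3)  # bytes 7-9; byte 10 is a space in the sample
      print id, k, start, label, value
    }
  }'
}

# Usage: print ID, segment number, start column, label and value.
printf '%s\n' 'UUID23456 2   XYZ 4 763 ABC 4 678' | dump_segments
```

Printing the start column next to each label makes it easy to check the assumed offsets against a few real rows before relying on them.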

So the output for the rows above would be:

UUID12345 345  
UUID23456 678  
UUID34567 456
UUID45678 
UUID56789 

I know some very basic awk and can look at specific sections of a line, but I don't know how to achieve what I need here. For example, the following lets me extract the value from the first line, but only by looking at fixed locations; it doesn't take into account the second column, which indicates the variable number of segments.

awk '{ if (substr($0, 1, 4)=="UUID" && substr($0, 15, 3)=="ABC") {print substr($0, 1, 9) " " substr($0, 21, 3)}}' <<< "UUID12345 1   ABC 1 345"

Edit

As per my comment to Ed Morton below, this is what I ended up with, and it works for me (where test.txt is the example shown above):

awk '{segs=substr($0, 11, 1); acc=substr($0, 1, 10); startCol=15; val=""; for(i=startCol; i<startCol+(10 * segs); i+= 10) if (substr($0, i, 3)=="ABC") val=substr($0, i + 6, 3); print acc " " segs " " val}' test.txt

$ awk '{val=""; for (i=3; i<NF; i+=3) if ($i=="ABC") val=$(i+2); print $1, val}' file
UUID12345 345
UUID23456 678
UUID34567 456
UUID45678
UUID56789

If that's not all you need then edit your question to provide more truly representative sample input/output that better captures all your requirements.
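The command above relies on whitespace field splitting, which fits the sample as posted. If the real segments can contain blank fields, the field numbers shift, so a position-based variant may be safer. The following is only a sketch under assumptions taken from the sample: ID in columns 1-9, a segment count in columns 11-12, and 10-byte segments starting at column 15. The function name extract_abc and its optional label argument are hypothetical.

```shell
# Hypothetical position-based variant; the column offsets are assumptions
# inferred from the sample rows, not a confirmed record layout.
extract_abc() {
  awk -v want="${1:-ABC}" '{
    id   = substr($0, 1, 9)
    segs = substr($0, 11, 2) + 0              # segment count, one or two digits
    val  = ""
    for (k = 0; k < segs; k++) {
      start = 15 + 10 * k                     # first column of this segment
      if (substr($0, start, 3) == want)
        val = substr($0, start + 6, 3)        # bytes 7-9 of the matching segment
    }
    print (val == "" ? id : id " " val)       # no trailing space when nothing matched
  }'
}

# Usage: reads rows on stdin; pass a different 3-character label if needed.
printf '%s\n' \
  'UUID12345 1   ABC 1 345' \
  'UUID45678 2   DEF 1 345 TUV 8 111' | extract_abc
```

Because only the declared number of segments is scanned, text beyond them is ignored, and if several segments carry the label the last one wins, as in the field-based command.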

Another way with awk:

awk -F'ABC' '{split($1,a," |\t");split($2,b," |\t");print a[1],b[3]}' infile

Using Perl

$ perl -lane ' ($x)=$_=~/\bABC\s+\S+\s+(\S+)/; print $F[0], " ", $x ' moose.txt
UUID12345 345
UUID23456 678
UUID34567 456
UUID45678
UUID56789

$ cat moose.txt
UUID12345 1   ABC 1 345
UUID23456 2   XYZ 4 763 ABC 4 678
UUID34567 3   XYZ 4 763 ABC 2 456 QRS 2 456
UUID45678 2   DEF 1 345 TUV 8 111
UUID56789 0

$


 