
split field into array in awk, then search each term in another file

I'm trying to decompose a field from one file into an array, and then check whether each term appears in a second file (which has already been stored in another array). The goal is to merge information from both files.

The first file, file1 (the one with the field I want to split), looks like this:

data1=data2=data3 some more stuff
data4=data1 this are things
data2=data5 more text here
...

While file2 has this structure:

data1 10
data2 20
data3 35
data4 15
data5 60

I want to split the first field of file1 on =, then look up each of the resulting terms in the second file, and print everything in the following format:

output:

data1=data2=data3 some more stuff 10
data1=data2=data3 some more stuff 20
data1=data2=data3 some more stuff 35
data4=data1 this are things 15
data4=data1 this are things 10
data2=data5 more text here 20
data2=data5 more text here 60

So far, I've got this:

awk 'NR==FNR {
l[$1] = $2; next
} {
la=split($1,a,"=")
for(x=1;x<=la;x++)
  print $0,l[a[$x]]
}' file2 file1 > output

First (when NR==FNR), I store the file2 data in the array l, using the first field as the key.

Then I parse the next file in the following manner: for each record, I split the field $1 into an array a, using = as the separator. The variable la stores the number of elements in a.

For each element in array a (the for loop), I look up the corresponding key in array l and output the current record followed by the value from l.
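As a minimal illustration of the split step described above, the sketch below runs split on one sample record from file1 and prints the return value followed by each piece. The variable names n, i, and split_demo are illustrative and not part of the original script.

```shell
# Feed one sample record to awk and inspect what split() produces.
split_demo=$(printf 'data1=data2=data3 some more stuff\n' |
awk '{
    n = split($1, a, "=")      # n is the number of pieces
    print n
    for (i = 1; i <= n; i++)   # pieces are stored in a[1] .. a[n]
        print i, a[i]
}')
printf '%s\n' "$split_demo"
```

This should print the count 3 and then the pairs 1 data1, 2 data2, 3 data3, one per line.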

But, for some reason, I only get the content from file1 (current, unwanted output):

data1=data2=data3 some more stuff 
data1=data2=data3 some more stuff 
data1=data2=data3 some more stuff 
data4=data1 this are things 
data4=data1 this are things 
data2=data5 more text here 
data2=data5 more text here 

Any ideas on what might be wrong with my code?

Thanks a lot!

awk to the rescue!

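If your tokens are fixed length, you can pattern-match without splitting the field. Note that $1~k is a regex match anywhere in the field, and that for(k in a) visits the keys in an unspecified order, so the lines for one record may not follow the order of its terms: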

$ awk 'NR==FNR{a[$1]=$2;next}
              {for(k in a) if($1~k) print $0, a[k]}' file2 file1

data1=data2=data3 some more stuff 10
data1=data2=data3 some more stuff 20
data1=data2=data3 some more stuff 35
data4=data1 this are things 10
data4=data1 this are things 15
data2=data5 more text here 20
data2=data5 more text here 60
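The fixed-length condition matters because the regex test matches substrings. The sketch below uses hypothetical sample data (a key data10 that is not in the original files) to show the difference, and one possible exact-token variant that wraps both sides in = and compares them literally with index(). The names loose, strict, and tmp are illustrative, and the output is sorted only to make the order predictable.

```shell
# Hypothetical sample data: data1 is a prefix of data10.
tmp=$(mktemp -d)
printf 'data1 10\ndata10 99\ndata2 20\n' > "$tmp/file2"
printf 'data10=data2 some text\n' > "$tmp/file1"

# Regex match: the key data1 also matches inside data10.
loose=$(awk 'NR==FNR{a[$1]=$2;next}
             {for(k in a) if($1~k) print $0, a[k]}' "$tmp/file2" "$tmp/file1" | LC_ALL=C sort)

# Exact-token match: "=data1=" does not occur in "=data10=data2=".
strict=$(awk 'NR==FNR{a[$1]=$2;next}
              {for(k in a) if(index("=" $1 "=", "=" k "=")) print $0, a[k]}' "$tmp/file2" "$tmp/file1" | LC_ALL=C sort)

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

With this data the regex version reports three values (10, 20, 99) while the exact-token version reports only 20 and 99.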

I found the answer myself. It was an issue with variable naming.

This is the correct code:

awk 'NR==FNR {
l[$1] = $2; next
} {
la=split($1,a,"=")
for(x=1;x<=la;x++)
  print $0,l[a[x]]
}' file2 file1 > output

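The key is in the print statement. It now reads print $0,l[a[x]] instead of print $0,l[a[$x]]. The loop uses x as its counter, whereas $x is the content of field number x of the current record. So a[$x] indexed a with field text such as data1=data2=data3, which is not an index of a, and the lookup in l came back empty. With a[x], the lookup uses the correct key in array l (from file2).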
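To make the difference visible, here is a small sketch that prints x, $x, a[x], and a[$x] side by side for one record from file1. The variable names n and trace are illustrative only.

```shell
# Show what each expression evaluates to inside the loop.
trace=$(printf 'data4=data1 this are things\n' |
awk '{
    n = split($1, a, "=")
    for (x = 1; x <= n; x++)
        printf "x=%d  $x=[%s]  a[x]=[%s]  a[$x]=[%s]\n", x, $x, a[x], a[$x]
}')
printf '%s\n' "$trace"
```

For x=1, $x is the whole first field (data4=data1) and a[$x] is empty, while a[x] is data4. For x=2, $x is the second field (this) and a[$x] is again empty, while a[x] is data1.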

I'm leaving the post because it looks like this question hasn't been posed before. Please tell me if you think it's not useful.

Thanks!
