
Adding numbers in text file using awk and Bash

I need to take all numbers that appear within a book index and add 22 to them. The index data looks like this (for example):

Ubuntu, 120, 143, 154
Yggdrasil, 144, 170-171
Yood, Charles, 6
Young, Bob, 178-179
Zawinski, Jamie, 204

I am trying to do this with awk using this script:

#!/bin/bash

filename="index"
while read -r line
do
    echo $line | awk -v n=22 '{printf($1)}{printf(" " )}{for(i=2;i<=NF;i++)printf(i%2?$i+n:$i+n)", "};{print FS}'
done < "$filename"

It comes close to working but has the following problems:

  1. It doesn't work for page numbers that are part of a range (e.g., "170-171") rather than individual numbers.
  2. For entries where the index term is more than one word (e.g., "X Windows" and "Young, Bob"), the output displays only the first word of the term. The second word ends up being output as the number 22. I know why this happens: my awk command treats $2 as a number, and a non-numeric string is evaluated as 0. But I can't figure out how to solve it.

Disclosure: I'm by no means an awk expert. I'm just looking for a quick way to modify the page numbers in my index (which is due in a few days) because my publisher decided to change the pagination in the manuscript after I had already prepared the index. awk seems like the best tool for the job to me, but I'm open to other suggestions if someone has a better idea. Basically, I just need a way to say "take all numbers in this file and add 22 to them; don't change anything else."

For example, with Perl:

perl -plE 's/\b(\d+)\b/$1+22/ge' index

Output:

Ubuntu, 142, 165, 176
Yggdrasil, 166, 192-193
Yood, Charles, 28
Young, Bob, 200-201
Zawinski, Jamie, 226

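Note that this uses Perl rather than awk. The \b anchors restrict the match to standalone numbers, so digits embedded in a word (for example "X11") are left alone.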

With GNU awk for multi-char RS and RT:

$ awk -v RS='[0-9]+' '{ORS=(RT=="" ? "" : RT+22)}1' file
Ubuntu, 142, 165, 176
Yggdrasil, 166, 192-193
Yood, Charles, 28
Young, Bob, 200-201
Zawinski, Jamie, 226
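
RT and a regex-valued RS are gawk extensions, so the one-liner above will not work in other awks. As a rough sketch of the same idea using only POSIX awk features, the loop below walks each line with match() and rebuilds it piece by piece. The function name add_offset is a hypothetical helper introduced here for illustration. Like the RS-based version, it shifts every run of digits, including digits embedded in a word.

```shell
#!/bin/sh
# Sketch: add an offset to every run of digits, using only POSIX awk.
# add_offset is a hypothetical helper name; it reads stdin and writes stdout.
add_offset() {
    awk -v n="${1:-22}" '
    {
        out  = ""   # the part of the line already rebuilt
        rest = $0   # the part of the line still to be scanned
        # match() sets RSTART/RLENGTH to the leftmost run of digits in rest
        while (match(rest, /[0-9]+/)) {
            out  = out substr(rest, 1, RSTART - 1) (substr(rest, RSTART, RLENGTH) + n)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

printf '%s\n' 'Yggdrasil, 144, 170-171' 'Young, Bob, 178-179' | add_offset 22
# Yggdrasil, 166, 192-193
# Young, Bob, 200-201
```

To process the whole index, redirect the file into the function: add_offset 22 < index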

You can use this GNU awk command:

awk 'BEGIN {FS="\f";RS="(, |-|\n)";} /^[0-9]+$/ {$1 = $1 +22} { printf("%s%s", $1, RT);}' yourfile
  • FS and RS are deliberately misused so that awk treats each token on a line as a record of its own; this avoids looping over the fields and testing whether each one is numeric
  • RS="(, |-|\n)" makes ", ", a dash, and a newline the record separators
  • FS="\f" (a form feed, which should not occur in the data) keeps each record in a single field, so $1 is the whole token
  • "records" consisting only of digits have 22 added to them
  • the printf prints each token followed by its RT (the separator text that actually matched), which reconstructs the original line

Consider using the following GNU awk script (add_number.awk); the three-argument form of match() it relies on is a gawk extension:

BEGIN { FS = OFS = ", "; if (!n) n = 22 }  # default to 22 if `n` was not passed with -v
{
    for (i = 1; i <= NF; i++) {        # traverse the ", "-separated fields
        if ($i ~ /^[0-9]+$/) {         # the field is a single page number
            $i += n
        }
        else if (match($i, /^([0-9]+)-([0-9]+)$/, arr)) {  # the field is a page range
            $i = (arr[1] + n) "-" (arr[2] + n)   # shift both ends of the range
        }
    }
    print
}

Usage:

awk -v n=22 -f add_number.awk testfile

The output:

Ubuntu, 142, 165, 176
Yggdrasil, 166, 192-193
Yood, Charles, 28
Young, Bob, 200-201
Zawinski, Jamie, 226
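
The three-argument match() in add_number.awk is specific to gawk. As a hedged sketch of the same field-by-field approach for other awks, the range can be taken apart with the standard split() instead. The function name add_offset_fields is a hypothetical helper introduced here, and the sketch assumes the same ", "-separated layout as the sample index.

```shell
#!/bin/sh
# Sketch: field-based variant that avoids gawk's three-argument match().
# add_offset_fields is a hypothetical helper name; it reads stdin.
add_offset_fields() {
    awk -v n="${1:-22}" '
    BEGIN { FS = OFS = ", " }
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+$/) {
                $i += n                                # single page number
            } else if ($i ~ /^[0-9]+-[0-9]+$/) {
                split($i, part, "-")                   # part[1] = start, part[2] = end
                $i = (part[1] + n) "-" (part[2] + n)   # shift both ends of the range
            }
        }
        print
    }'
}

printf '%s\n' 'Ubuntu, 120, 143, 154' 'Yggdrasil, 144, 170-171' | add_offset_fields 22
# Ubuntu, 142, 165, 176
# Yggdrasil, 166, 192-193
```

Because only fields that consist entirely of a number or a number range are touched, index terms that happen to contain digits are left unchanged.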
