
Bash – conditional regex find/replace combined with converting a custom %h%m%s format to seconds, in a single pass

My input file will have the following format:

input.txt :

News A 1 B 2h 0m 1s C text1
100 A 2 B 120m 1s C text2
Show A 3 B 450s C text3
Tom A 4 B 0:30 C text4
Laura A 5 B 20 C text5
Something A 6 B 1h 100m 70s C text6
50 A 7 B 1h 10s C text7

(The unusual time format on the 6th line is intentional: it is only for the demo, and it simplifies the logic by not requiring minutes and seconds to stay in the 0-59 range.)

I want to apply the following regex to each line:

^(.*?)A(.*?)B(.*?)C(.*?)$  
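Note that Bash's `=~` operator uses POSIX extended regular expressions, which have no lazy quantifiers such as `*?`. A common workaround is to use negated character classes instead. Here is a minimal sketch, assuming single spaces around the A, B and C markers, a single-word ID, and no capital C inside the time field. `split_line` is a hypothetical helper name. Unlike the regex above, its groups exclude the surrounding spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split one input line into its four parts.
# POSIX ERE (used by [[ =~ ]]) has no lazy *? quantifier, so [^ ]+ and [^C]*
# stand in for it. Assumes single spaces around the A, B and C markers.
split_line() {
    local re='^(.*) A ([^ ]+) B ([^C]*) C (.*)$'
    if [[ $1 =~ $re ]]; then
        # Print BASH_REMATCH[1]..BASH_REMATCH[4], one per line.
        printf '%s\n' "${BASH_REMATCH[@]:1}"
    else
        return 1
    fi
}
```

For example, `split_line 'News A 1 B 2h 0m 1s C text1'` prints `News`, `1`, `2h 0m 1s` and `text1` on separate lines.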

Note the required syntax for ${BASH_REMATCH[3]}. Valid variants:

  • \d{1,}h \d{1,}m \d{1,}s
  • \d{1,}m \d{1,}s
  • \d{1,}s
  • \d{1,}, which is equivalent to \d{1,}s

I need to convert this to seconds; if it fails this validation, leave it as is. In either case, let's call the result $sec.
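The four valid variants can each be checked with one anchored ERE. Here is a minimal sketch, assuming the time field has already been trimmed of surrounding spaces. `to_seconds` is a hypothetical name. It reports failure through its exit status, so the caller can decide whether to append "seconds".

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the time field converted to seconds and return 0
# if it matches one of the four valid variants; otherwise print it unchanged
# and return 1. The regexes live in variables, the usual way to avoid quoting
# problems with [[ =~ ]].
to_seconds() {
    local t=$1
    local re_hms='^([0-9]+)h ([0-9]+)m ([0-9]+)s$'
    local re_ms='^([0-9]+)m ([0-9]+)s$'
    local re_s='^([0-9]+)s?$'
    if [[ $t =~ $re_hms ]]; then
        # 10# forces base 10, so a value like "08" is not read as octal
        echo $(( 10#${BASH_REMATCH[1]} * 3600 + 10#${BASH_REMATCH[2]} * 60 + 10#${BASH_REMATCH[3]} ))
    elif [[ $t =~ $re_ms ]]; then
        echo $(( 10#${BASH_REMATCH[1]} * 60 + 10#${BASH_REMATCH[2]} ))
    elif [[ $t =~ $re_s ]]; then
        echo $(( 10#${BASH_REMATCH[1]} ))
    else
        printf '%s\n' "$t"
        return 1
    fi
}
```

For example, `if sec=$(to_seconds "$t"); then sec+=" seconds"; fi` turns `2h 0m 1s` into `7201 seconds` but leaves `0:30` and `1h 10s` untouched. This matches the expected output.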

I'll need to define the following regex variables:

$price == '(\d{1,}) ', $names == '(Bob|Tom|Laura|Sandra) ', $tags == '(News|Show) ' (or the (?:regex) non-capturing syntax; I don't know which is better here)
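`\d` and the `(?:...)` non-capturing syntax are not part of POSIX ERE, which is what Bash's `=~` uses. `[0-9]` and plain `(...)` groups are the portable substitutes, and an extra capture group is harmless here. This sketch assumes the first field has already been trimmed, so each pattern is anchored with `^...$` instead of ending in a space. `classify` is a hypothetical helper that also covers the "no match, or more than one match" rule by counting hits.

```bash
#!/usr/bin/env bash
# The three regex variables rewritten for POSIX ERE: [0-9] instead of \d,
# plain (...) groups instead of (?:...), anchored to the whole field.
price='^[0-9]+$'
names='^(Bob|Tom|Laura|Sandra)$'
tags='^(News|Show)$'

# Hypothetical helper: print which variable the field matches ("price",
# "names" or "tags"), or "other" if it matches none or more than one.
classify() {
    local field=$1 kind=other count=0 var
    for var in price names tags; do
        # ${!var} expands to the value of the variable named by $var
        if [[ $field =~ ${!var} ]]; then
            kind=$var
            count=$((count + 1))
        fi
    done
    if (( count == 1 )); then echo "$kind"; else echo other; fi
}
```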

Then, replace the line with the following:

  • if ${BASH_REMATCH[1]} =~ $price :

    ID: ${BASH_REMATCH[2]}; time: $sec seconds; description: ${BASH_REMATCH[4]} – buy for $${BASH_REMATCH[1]}! (that is, a literal $ followed by the value of ${BASH_REMATCH[1]})

  • if ${BASH_REMATCH[1]} =~ $names :

    description: ${BASH_REMATCH[4]} from @${BASH_REMATCH[1]}; time: $sec seconds

  • if ${BASH_REMATCH[1]} =~ $tags :

    ID: ${BASH_REMATCH[2]}; #${BASH_REMATCH[1]}; time: $sec seconds; description: ${BASH_REMATCH[4]}

  • if ${BASH_REMATCH[1]} doesn't match any predefined regex variable (or matches more than one variable):

    ID: ${BASH_REMATCH[2]}; time: $sec seconds; ${BASH_REMATCH[1]}; description: ${BASH_REMATCH[4]}

So the output file should be:

output.txt :

ID: 1; #News; time: 7201 seconds; description: text1
ID: 2; time: 7201 seconds; description: text2 – buy for $100!
ID: 3; #Show; time: 450 seconds; description: text3
description: text4 from @Tom; time: 0:30
description: text5 from @Laura; time: 20 seconds
ID: 6; time: 9670 seconds; Something; description: text6
ID: 7; time: 1h 10s; description: text7 – buy for $50!

I want to use pure Bash if possible. By the way, I like the syntax used in these answers: a/5659672/1736903 and a/21507572, but I don't know how to apply it to my situation.
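A pure-Bash version of the whole transformation could look like the sketch below. It is not taken from the linked answers. It assumes single spaces around the A, B and C markers and no capital C inside the time field. The function names `to_seconds` and `convert` are hypothetical. It appends "seconds" only when the time field was converted, as the expected output does.

```bash
#!/usr/bin/env bash
# Pure-Bash sketch: convert reads lines on stdin, writes reformatted lines.
price='^[0-9]+$'
names='^(Bob|Tom|Laura|Sandra)$'
tags='^(News|Show)$'

# Time field -> seconds; prints it unchanged and returns 1 if invalid.
to_seconds() {
    local t=$1
    local re_hms='^([0-9]+)h ([0-9]+)m ([0-9]+)s$'
    local re_ms='^([0-9]+)m ([0-9]+)s$'
    local re_s='^([0-9]+)s?$'
    if [[ $t =~ $re_hms ]]; then
        echo $(( 10#${BASH_REMATCH[1]} * 3600 + 10#${BASH_REMATCH[2]} * 60 + 10#${BASH_REMATCH[3]} ))
    elif [[ $t =~ $re_ms ]]; then
        echo $(( 10#${BASH_REMATCH[1]} * 60 + 10#${BASH_REMATCH[2]} ))
    elif [[ $t =~ $re_s ]]; then
        echo $(( 10#${BASH_REMATCH[1]} ))
    else
        printf '%s\n' "$t"
        return 1
    fi
}

convert() {
    local line re='^(.*) A ([^ ]+) B ([^C]*) C (.*)$'
    local first id tfield desc sec kind hits var
    while IFS= read -r line || [[ -n $line ]]; do
        if ! [[ $line =~ $re ]]; then
            printf '%s\n' "$line"   # not in the expected shape: pass through
            continue
        fi
        first=${BASH_REMATCH[1]} id=${BASH_REMATCH[2]}
        tfield=${BASH_REMATCH[3]} desc=${BASH_REMATCH[4]}
        # Append " seconds" only when the conversion succeeded
        if sec=$(to_seconds "$tfield"); then sec+=" seconds"; fi
        # Count matches so "none" and "more than one" both fall to the default
        kind=other hits=0
        for var in price names tags; do
            if [[ $first =~ ${!var} ]]; then kind=$var; hits=$((hits + 1)); fi
        done
        (( hits == 1 )) || kind=other
        case $kind in
            price) printf 'ID: %s; time: %s; description: %s – buy for $%s!\n' "$id" "$sec" "$desc" "$first" ;;
            names) printf 'description: %s from @%s; time: %s\n' "$desc" "$first" "$sec" ;;
            tags)  printf 'ID: %s; #%s; time: %s; description: %s\n' "$id" "$first" "$sec" "$desc" ;;
            *)     printf 'ID: %s; time: %s; %s; description: %s\n' "$id" "$sec" "$first" "$desc" ;;
        esac
    done
}
```

After defining the functions, run `convert < input.txt > output.txt`. Everything happens in one pass over the file, with no external commands.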

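Your example output mixes seconds with other time formats (0:30, 1h 10s). Assuming you really do want the time in seconds, always, try this Awk attempt. It splits each line on the standalone A, B and C words. Time tokens it does not recognize, such as 0:30, are ignored and add nothing to the total.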

awk '{
    # Collect the words between the standalone A/B/C markers into x[1]..x[4]
    j=1; h=m=s=0; delete x;
    for (i=1; i<=NF; ++i) {
        if ($i ~ /^(A|B|C)$/) ++j;
        else x[j] = x[j] (x[j] ? FS : "") $i;
    }
    # Parse the time field x[3]; awk reads a token like "2h" as the number 2
    n = split(x[3], t)
    for (i=1; i<=n; ++i)
        if (t[i] ~ /^[0-9]+h$/)
            h += t[i]
        else if (t[i] ~ /^[0-9]+m$/)
            m += t[i]
        else if (t[i] ~ /^[0-9]+s?$/)
            s += t[i]
        # else error? (tokens such as "0:30" are silently skipped)
    s += (m*60) + (h*3600)

    # Dispatch on the first field; any run of digits (even one) is a price
    if (x[1] ~ /^[0-9]+$/)
        print "ID: " x[2] "; time: " s " seconds; description: " x[4] \
            " - buy for $" x[1] "!"
    else if (x[1] ~ /^(Bob|Tom|Laura|Sandra)$/)
        print "description: " x[4] " from @" x[1] "; time: " s " seconds"
    else if (x[1] ~ /^(News|Show)$/)
        print "ID: " x[2] "; #" x[1] "; time: " s " seconds; description: " \
            x[4]
    else
        print "ID: " x[2] "; time: " s " seconds; " x[1] "; description: " x[4]
    }' input.txt
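The Awk above converts every time field. The question also asks for invalid fields to be left as is, and the expected output prints them without "seconds". One possible sketch is a validating Awk function that could stand in for the parsing loop. Both `to_sec` and the wrapper `time_to_seconds` are hypothetical names. The wrapper reads one time field per line on stdin so the function can be tried on its own.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: reads one time field per line on stdin and prints it
# in seconds, or unchanged if it is not one of the four valid variants.
time_to_seconds() {
    awk '
    # Return the field in seconds, or "" if its format is not valid.
    function to_sec(t,    p) {
        if (t ~ /^[0-9]+h [0-9]+m [0-9]+s$/) {
            split(t, p, "[hms ]+")        # "1h 100m 70s" -> 1, 100, 70
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        if (t ~ /^[0-9]+m [0-9]+s$/) {
            split(t, p, "[ms ]+")         # "120m 1s" -> 120, 1
            return p[1] * 60 + p[2]
        }
        if (t ~ /^[0-9]+s?$/)
            return t + 0                  # "450s" -> 450, "20" -> 20
        return ""
    }
    {
        r = to_sec($0)
        print (r == "" ? $0 : r " seconds")
    }'
}
```

For example, `printf '%s\n' '1h 100m 70s' '0:30' | time_to_seconds` prints `9670 seconds` followed by `0:30`.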
