
How to substitute spaces with %20 in a substring of a line across multiple files using sed, awk, grep etc

In a recent update, neomutt changed how it handles regexp matching, and that is breaking the notmuch URIs in my config. The solution seems to be replacing the spaces in the URI with %20. This wouldn't be a huge deal except that I have a lot of virtual mailboxes defined across multiple config files. Here is a sample of one config:

"Inbox"                 "notmuch://?query=folder:gmail/INBOX and tag:inbox" \
"Drafts"                "notmuch://?query=folder:gmail/Drafts" \
"Sent Mail"             "notmuch://?query=folder:gmail/Sent%20Mail" \
"Trash"                 "notmuch://?query=folder:gmail/Trash" \
"Today"                 "notmuch://?query=to:rsstinnett@gmail.com and date:today" \
"Yesterday"             "notmuch://?query=to:rsstinnett@gmail.com and date:yesterday" \
"This Week"             "notmuch://?query=to:rsstinnett@gmail.com and date:this_week" \
"Todo"                  "notmuch://?query=to:rsstinnett@gmail.com and tag:todo" \
"Starred"               "notmuch://?query=to:rsstinnett@gmail.com and tag:star" \
"Burning Man"           'notmuch://?query=folder:"gmail/Burning Man"' \
"  Work List"           'notmuch://?query=folder:"gmail/Burning Man/Work List"' \
"ATXHS"                 'notmuch://?query=folder:"gmail/ATX Hackerspace" and not tag:archive' \
"  ATXHS Members"       'notmuch://?query=folder:"gmail/ATX Hackerspace/Members" and not tag:archive' \
"  ATXHS Discuss"       'notmuch://?query=folder:"gmail/ATX Hackerspace/Discuss" and not tag:archive' \
"  ATXHS Announce"      'notmuch://?query=folder:"gmail/ATX Hackerspace/Announce" and not tag:archive'

Using sed, awk, grep, or whatever, how do I change "gmail/ATX Hackerspace" to "gmail/ATX%20Hackerspace" without affecting " and not tag:archive"?

I know that other changes need to be made, but this is the only one that I'm stuck on. Basically, I need to change the spaces between folder:" and the next instance of a double quote. I don't know if this can even be done reasonably.

Using any awk in any shell on every UNIX box:

$ awk 'match($0,/folder:"[^"]+"/) {
    tgt = substr($0,RSTART,RLENGTH)
    gsub(/ /,"%20",tgt)
    $0 = substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH)
 } 1' file
"Inbox"                 "notmuch://?query=folder:gmail/INBOX and tag:inbox" \
"Drafts"                "notmuch://?query=folder:gmail/Drafts" \
"Sent Mail"             "notmuch://?query=folder:gmail/Sent%20Mail" \
"Trash"                 "notmuch://?query=folder:gmail/Trash" \
"Today"                 "notmuch://?query=to:rsstinnett@gmail.com and date:today" \
"Yesterday"             "notmuch://?query=to:rsstinnett@gmail.com and date:yesterday" \
"This Week"             "notmuch://?query=to:rsstinnett@gmail.com and date:this_week" \
"Todo"                  "notmuch://?query=to:rsstinnett@gmail.com and tag:todo" \
"Starred"               "notmuch://?query=to:rsstinnett@gmail.com and tag:star" \
"Burning Man"           'notmuch://?query=folder:"gmail/Burning%20Man"' \
"  Work List"           'notmuch://?query=folder:"gmail/Burning%20Man/Work%20List"' \
"ATXHS"                 'notmuch://?query=folder:"gmail/ATX%20Hackerspace" and not tag:archive' \
"  ATXHS Members"       'notmuch://?query=folder:"gmail/ATX%20Hackerspace/Members" and not tag:archive' \
"  ATXHS Discuss"       'notmuch://?query=folder:"gmail/ATX%20Hackerspace/Discuss" and not tag:archive' \
"  ATXHS Announce"      'notmuch://?query=folder:"gmail/ATX%20Hackerspace/Announce" and not tag:archive'

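Since the question spans several config files, the same awk script can be wrapped in a loop. The following is only a sketch: the function name encode_folder_spaces is hypothetical, and it assumes mktemp is available (POSIX does not require it). Writing to a temporary file and then copying back avoids truncating a file while awk is still reading it.

```shell
# Hypothetical helper: rewrite each file named on the command line in place.
encode_folder_spaces() {
    for f in "$@"; do
        # Assumption: mktemp exists on this system.
        tmp=$(mktemp) || return 1
        awk 'match($0, /folder:"[^"]+"/) {
            tgt = substr($0, RSTART, RLENGTH)      # the folder:"..." part only
            gsub(/ /, "%20", tgt)                  # encode the spaces inside it
            $0 = substr($0, 1, RSTART-1) tgt substr($0, RSTART+RLENGTH)
        } 1' "$f" > "$tmp" && cat "$tmp" > "$f"    # copy back, keeping permissions
        rm -f "$tmp"
    done
}
```

It would be called with the config files as arguments. Like the script above, it only touches the first folder:"..." on each line.
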
Based on the requirement to change the spaces between folder:" and the next instance of a double quote, the following seems to be a very easy and fairly readable solution:

sed -E ':a;s/(folder:"[^ "]*) /\1%20/;ta' yourinput

It is basically a while loop where

  • the body s/(folder:"[^ "]*) /\1%20/ tries to pick the first space, if any, that follows folder:" and precedes the closing ",
  • the condition to repeat the loop is that the attempt was successful (i.e. the substitution was actually made); ta tests whether any s command has succeeded since the current line was read (or since the last t was executed) and, if so, transfers control to the line labelled :a.
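
To watch the loop at work, the sketch below runs the substitution once without the loop and then with it. The variable names and the input line are made up for illustration, and GNU sed is assumed.

```shell
# Illustrative input with two spaces inside the folder quotes.
line='folder:"gmail/Burning Man/Work List" and not tag:archive'

# Loop body alone: only the first space after folder:" is replaced.
once=$(printf '%s\n' "$line" | sed -E 's/(folder:"[^ "]*) /\1%20/')

# With the :a label and ta, the body repeats until it no longer matches.
all=$(printf '%s\n' "$line" | sed -E ':a;s/(folder:"[^ "]*) /\1%20/;ta')

printf '%s\n%s\n' "$once" "$all"
```

The first line printed still has "Work List" with a space; the second has both spaces encoded, and " and not tag:archive" is untouched in both.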

Update

Concerning the -E option, I have tested the answer above only on GNU sed. Ed Morton has tested it on OSX/BSD, where the command I provided leaves the input unchanged.

I thought the reason could be -E , or maybe a missing ; after ta , but this does not seem to be the case, based on Ed Morton's attempts.
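
A likely explanation, offered as an assumption rather than something verified in this thread: POSIX does not let a semicolon terminate a :label or a t command, and BSD sed reads the label up to the end of the line, so the whole one-liner becomes a single label definition that does nothing. Splitting the script into separate -e fragments sidesteps this. The sketch below uses a made-up input line.

```shell
# Each -e fragment ends at a boundary that sed treats like a newline,
# so the label is just "a" and the branch target is just "a".
out=$(printf '%s\n' 'folder:"gmail/ATX Hackerspace/Members" and not tag:archive' |
      sed -E -e ':a' -e 's/(folder:"[^ "]*) /\1%20/' -e 'ta')
printf '%s\n' "$out"
```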

I initially thought the command was POSIX-compliant, based on the following excerpt from GNU sed's man page:

 -E, -r, --regexp-extended use extended regular expressions in the script (for portability use POSIX -E).

Furthermore, on this GNU page, I read:

Historically this was a GNU extension, but the -E extension has since been added to the POSIX standard ( http://austingroupbugs.net/view.php?id=528 ), so use -E for portability.

So far, however, this is only what GNU says about POSIX.

If you go to that link, the last line in the Issue history section is dated 2020-03-18 15:37 and reads Resolved => Applied, but I don't know how that site relates to POSIX.

The bottom line is: I don't know if -E is POSIX-compliant.
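
Whatever the status of -E, the loop does not really need extended regular expressions. Here is a sketch using basic regular expressions, where the group is written \( \) and no -E is passed (the input line is made up):

```shell
# Same loop in BRE syntax: \( \) delimit the group, no -E option required.
out=$(printf '%s\n' 'folder:"gmail/Burning Man/Work List" and not tag:archive' |
      sed -e ':a' -e 's/\(folder:"[^ "]*\) /\1%20/' -e 'ta')
printf '%s\n' "$out"
```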

Just for fun, here is another solution using only sed . (There is no good reason to use sed alone in production, when better tools are available; it's still a good training exercise though.)

Compare to the simple and short solution posted by Enrico De Angelis. There are two differences between his approach and what I propose below.

First, the approach in Enrico's answer would not work if the "replacement" text included spaces (if, for example, each space had to be replaced with % 20 with a space after the percent sign). Of course, in the OP's problem this is not the case; but in a more general problem, the looping approach in Enrico's solution may lead to infinite loops.
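
For comparison, a single global substitution does not rescan the text it has just inserted, so a replacement containing a space cannot loop. Here is a small sketch using the awk script from the earlier answer, with only the replacement string changed to the hypothetical "% 20" (the input line is made up):

```shell
out=$(printf '%s\n' 'folder:"gmail/Burning Man" and not tag:archive' |
      awk 'match($0, /folder:"[^"]+"/) {
          tgt = substr($0, RSTART, RLENGTH)
          gsub(/ /, "% 20", tgt)   # the replacement itself contains a space
          $0 = substr($0, 1, RSTART-1) tgt substr($0, RSTART+RLENGTH)
      } 1')
printf '%s\n' "$out"
```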

Second, the looping approach requires one run through the regexp matching for each space that must be replaced. By contrast, while the solution below also runs the s command several times, it's a fixed number of runs per input line, regardless of the number of spaces to be replaced. Again, in the OP's problem this is a non-issue because there are very few spaces to replace on each line. The approach below may be helpful in more general situations, where there are a large number of replacements needed on each line.

The idea is relatively simple, but the solution is complicated by the fact that sed only has two buffers we can work with. Switching back and forth between the two, we can "save" a portion of the string we don't need to touch, and make the changes in the remaining string. Since we only have two buffers and three relevant substrings, we are forced to make "too many changes" in the first half of the solution, and then undo the unneeded changes in the second half. This solution has a glaring weakness too: if the last part of the string (past the closing double quote relevant to folder) already had %20 in it, those occurrences will be changed to spaces, even though they were not spaces in the original.

I wonder if there are better approaches along these lines (meaning, specifically, not involving a looping process).

$ sed -E '/folder:"/{h;s/(^.*?folder:").*/\1/;x;s/^.*?folder:"//;s/ /%20/g;x;G;
> /folder:"/s/\n//;h;s/(^.*?folder:"[^"]*").*/\1/;x;s/.*?folder:"[^"]*"//;
> s/%20/ /g;x;G;/folder:"/s/\n//}' inputfile

As usual, the leading $ and > are shell prompts (not part of the sed command).

EDIT: As Ed Morton points out in a comment, lazy quantifiers are a Perl feature, not supported in sed. That wasn't an essential part of my solution; here is the POSIX ERE-compliant version:

$ sed -E '/folder:"/{h;s/(^.*folder:").*/\1/;x;s/^.*folder:"//;s/ /%20/g;x;G;
> /folder:"/s/\n//;h;s/(^.*folder:"[^"]*").*/\1/;x;s/.*folder:"[^"]*"//;
> s/%20/ /g;x;G;/folder:"/s/\n//}' inputfile

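The same script, laid out one command per line with comments, may be easier to follow. This is a sketch assuming GNU sed. The input line is made up and deliberately carries a %20 after the closing quote, to make the weakness mentioned earlier visible.

```shell
out=$(printf '%s\n' 'folder:"gmail/ATX Hackerspace" and tag:a%20b' | sed -E '
/folder:"/{
  # Save the whole line in hold space, keep only the prefix up to folder:"
  h
  s/(^.*folder:").*/\1/
  # Swap: pattern space is the whole line again, hold space is the prefix.
  x
  # Drop the prefix and encode ALL remaining spaces (too many, for now).
  s/^.*folder:"//
  s/ /%20/g
  # Rejoin prefix and encoded remainder.
  x
  G
  /folder:"/s/\n//
  # Second half: save the line, keep everything up to the closing quote.
  h
  s/(^.*folder:"[^"]*").*/\1/
  x
  # Keep only the tail after the closing quote and undo the encoding there.
  s/.*folder:"[^"]*"//
  s/%20/ /g
  # Rejoin the encoded head and the restored tail.
  x
  G
  /folder:"/s/\n//
}')
printf '%s\n' "$out"
```

The folder part comes out as "gmail/ATX%20Hackerspace", but the tail is printed as tag:a b: the %20 that was already present after the closing quote has been turned into a space.
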