How to substitute spaces with %20 in a substring of a line across multiple files using sed, awk, grep etc

In a recent update, neomutt changed how it handles regexp matching, and it's breaking the notmuch URIs in my config. The solution seems to be replacing the spaces in the URI with %20. This wouldn't be a huge deal except that I have a lot of virtual mailboxes defined across multiple config files. So here is a sample of one config:

"Inbox"                 "notmuch://?query=folder:gmail/INBOX and tag:inbox" \
"Drafts"                "notmuch://?query=folder:gmail/Drafts" \
"Sent Mail"             "notmuch://?query=folder:gmail/Sent%20Mail" \
"Trash"                 "notmuch://?query=folder:gmail/Trash" \
"Today"                 "notmuch://?query=to:rsstinnett@gmail.com and date:today" \
"Yesterday"             "notmuch://?query=to:rsstinnett@gmail.com and date:yesterday" \
"This Week"             "notmuch://?query=to:rsstinnett@gmail.com and date:this_week" \
"Todo"                  "notmuch://?query=to:rsstinnett@gmail.com and tag:todo" \
"Starred"               "notmuch://?query=to:rsstinnett@gmail.com and tag:star" \
"Burning Man"           'notmuch://?query=folder:"gmail/Burning Man"' \
"  Work List"           'notmuch://?query=folder:"gmail/Burning Man/Work List"' \
"ATXHS"                 'notmuch://?query=folder:"gmail/ATX Hackerspace" and not tag:archive' \
"  ATXHS Members"       'notmuch://?query=folder:"gmail/ATX Hackerspace/Members" and not tag:archive' \
"  ATXHS Discuss"       'notmuch://?query=folder:"gmail/ATX Hackerspace/Discuss" and not tag:archive' \
"  ATXHS Announce"      'notmuch://?query=folder:"gmail/ATX Hackerspace/Announce" and not tag:archive'

Using sed, awk, grep, or whatever, how do I change "gmail/ATX Hackerspace" to "gmail/ATX%20Hackerspace" without affecting " and not tag:archive"?

I know that other changes need to be made, but this is the only one that I'm stuck on. Basically, I need to change the spaces between folder:" and the next instance of a double quote. I don't know if this can even be done reasonably.

Using any awk in any shell on every UNIX box:

$ awk 'match($0,/folder:"[^"]+"/) {
    tgt = substr($0,RSTART,RLENGTH)
    gsub(/ /,"%20",tgt)
    $0 = substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH)
 } 1' file
"Inbox"                 "notmuch://?query=folder:gmail/INBOX and tag:inbox" \
"Drafts"                "notmuch://?query=folder:gmail/Drafts" \
"Sent Mail"             "notmuch://?query=folder:gmail/Sent%20Mail" \
"Trash"                 "notmuch://?query=folder:gmail/Trash" \
"Today"                 "notmuch://?query=to:rsstinnett@gmail.com and date:today" \
"Yesterday"             "notmuch://?query=to:rsstinnett@gmail.com and date:yesterday" \
"This Week"             "notmuch://?query=to:rsstinnett@gmail.com and date:this_week" \
"Todo"                  "notmuch://?query=to:rsstinnett@gmail.com and tag:todo" \
"Starred"               "notmuch://?query=to:rsstinnett@gmail.com and tag:star" \
"Burning Man"           'notmuch://?query=folder:"gmail/Burning%20Man"' \
"  Work List"           'notmuch://?query=folder:"gmail/Burning%20Man/Work%20List"' \
"ATXHS"                 'notmuch://?query=folder:"gmail/ATX%20Hackerspace" and not tag:archive' \
"  ATXHS Members"       'notmuch://?query=folder:"gmail/ATX%20Hackerspace/Members" and not tag:archive' \
"  ATXHS Discuss"       'notmuch://?query=folder:"gmail/ATX%20Hackerspace/Discuss" and not tag:archive' \
"  ATXHS Announce"      'notmuch://?query=folder:"gmail/ATX%20Hackerspace/Announce" and not tag:archive'
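The question asks about several config files, while the command above prints the result for a single file. A minimal sketch of one way to apply it in place to many files follows; the helper name encode_folder_spaces, the temporary demo directory and the mailboxes.rc file name are made up for illustration, and you would pass your real config paths instead. It writes through a temporary file because not every awk can edit in place.

```shell
# Hypothetical helper: rewrite each file given as an argument in place,
# using the same awk program as in the answer above.
encode_folder_spaces() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # Write the converted text to a temp file, then copy it back so the
        # original file keeps its permissions (and symlinks stay symlinks).
        awk 'match($0, /folder:"[^"]+"/) {
            tgt = substr($0, RSTART, RLENGTH)
            gsub(/ /, "%20", tgt)
            $0 = substr($0, 1, RSTART-1) tgt substr($0, RSTART+RLENGTH)
        } 1' "$f" > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}

# Demo on a throwaway file (the file name is an assumption for the example).
demo_dir=$(mktemp -d)
printf '%s\n' "\"ATXHS\" 'notmuch://?query=folder:\"gmail/ATX Hackerspace\" and not tag:archive'" > "$demo_dir/mailboxes.rc"
encode_folder_spaces "$demo_dir"/*.rc
cat "$demo_dir/mailboxes.rc"
```

Back up the configs (or keep them under version control) before running something like this on real files.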

Going by the question's statement that the spaces to change are those between folder:" and the next double quote, the following seems to be a very easy and fairly readable solution:

sed -E ':a;s/(folder:"[^ "]*) /\1%20/;ta' yourinput

It is basically a while loop where

  • the body, s/(folder:"[^ "]*) /\1%20/, tries to pick the first space, if any, that follows folder:" and precedes the closing ";
  • the condition to repeat the loop is that the attempt was successful (i.e. the substitution was actually made): ta tests whether any s command has succeeded on the current line since it was read (or since the last t) and, if so, transfers control to the label :a.
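To see the loop at work, here is a small sketch that compares a single pass of the s command with the full :a ... ta loop on one of the sample lines. The variable names are just for the example, and like the answer itself it assumes GNU sed.

```shell
line='folder:"gmail/Burning Man/Work List" and not tag:archive'

# A single pass replaces only the first space inside the quotes.
one_pass=$(printf '%s\n' "$line" | sed -E 's/(folder:"[^ "]*) /\1%20/')

# The loop repeats the substitution until it no longer matches, i.e. until
# no space is left between folder:" and the closing double quote.
looped=$(printf '%s\n' "$line" | sed -E ':a;s/(folder:"[^ "]*) /\1%20/;ta')

printf '%s\n' "$one_pass" "$looped"
```

The spaces after the closing double quote are never touched, because [^ "]* cannot step over that quote.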

Update

Concerning the -E option, I have tested the answer above only on GNU sed. Ed Morton has tested it on OSX/BSD, where the command I provided leaves the input unchanged.

I thought the reason could be -E, or maybe a missing ; after ta, but based on Ed Morton's attempts this does not seem to be the case.

I initially thought the command was POSIX-compliant, based on the following excerpt from GNU sed's man page:

 -E, -r, --regexp-extended use extended regular expressions in the script (for portability use POSIX -E).

Furthermore, on this GNU page, I read

Historically this was a GNU extension, but the -E extension has since been added to the POSIX standard ( http://austingroupbugs.net/view.php?id=528 ), so use -E for portability.

Up to this point, however, this is only what GNU says about POSIX.

If you go to that link, the last line in the Issue History section is dated 2020-03-18 15:37 and reads Resolved => Applied, but I don't know how that site relates to POSIX.

The bottom line is: I don't know if -E is POSIX-compliant.
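If portability is the worry, a variant that sidesteps both open questions might look like the sketch below. It uses a basic regular expression, so -E is not needed, and it puts the label, the substitution and the branch in separate -e expressions. That second part is a guess at the BSD behaviour rather than something verified here: POSIX ends a label at the newline, so a sed that reads ":a;s/.../;ta" as one long label would run no commands and leave the input unchanged.

```shell
line='folder:"gmail/Burning Man/Work List" and not tag:archive'

# BRE version: \( \) for the group, no -E; each -e acts like a separate
# script line, so the label :a is terminated cleanly.
portable=$(printf '%s\n' "$line" |
    sed -e ':a' -e 's/\(folder:"[^ "]*\) /\1%20/' -e 'ta')

printf '%s\n' "$portable"
```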

Just for fun, here is another solution using only sed. (There is no good reason to use sed alone in production when better tools are available; it's still a good training exercise though.)

Compare it to the simple and short solution posted by Enrico De Angelis. There are two differences between his approach and what I propose below.

First, the approach in Enrico's answer would not work if the "replacement" text included spaces (if, for example, each space had to be replaced with % 20, with a space after the percent sign). Of course, in the OP's problem this is not the case; but in a more general problem, the looping approach in Enrico's solution may lead to infinite loops.

Second, the looping approach requires one run through the regexp matching for each space that must be replaced. By contrast, while the solution below also runs the s command several times, it's a fixed number of runs per input line, regardless of the number of spaces to be replaced. Again, in the OP's problem this is a non-issue because there are very few spaces to replace on each line. The approach below may be helpful in more general situations, where a large number of replacements are needed on each line.

The idea is relatively simple, but the solution is complicated by the fact that sed only has two buffers we can work with. Switching back and forth between the two, we can "save" a portion of the string we don't need to touch, and make the changes in the remaining string. Since we only have two buffers and three relevant substrings, we are forced to make "too many changes" in the first half of the solution, and then undo the unneeded changes in the second half. This solution has a glaring weakness too: if the last part of the string already had %20 in it (past the closing double quote relevant to folder), those will be changed to spaces, even though they were not spaces in the original.

I wonder if there are better approaches along these lines (meaning, specifically, not involving a looping process).

$ sed -E '/folder:"/{h;s/(^.*?folder:").*/\1/;x;s/^.*?folder:"//;s/ /%20/g;x;G;
> /folder:"/s/\n//;h;s/(^.*?folder:"[^"]*").*/\1/;x;s/.*?folder:"[^"]*"//;
> s/%20/ /g;x;G;/folder:"/s/\n//}' inputfile

As usual, the leading $ and > are shell prompts (not part of the sed command).

EDIT: As Ed Morton points out in a comment below, lazy quantifiers are a Perl feature, not supported in sed. They weren't an essential part of my solution; here is the POSIX ERE-compliant version:

$ sed -E '/folder:"/{h;s/(^.*folder:").*/\1/;x;s/^.*folder:"//;s/ /%20/g;x;G;
> /folder:"/s/\n//;h;s/(^.*folder:"[^"]*").*/\1/;x;s/.*folder:"[^"]*"//;
> s/%20/ /g;x;G;/folder:"/s/\n//}' inputfile
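To make the weakness mentioned above concrete, here is a small sketch that runs the same command on a made-up line which already contains %20 after the closing double quote. The input line and variable names are invented for the example, and it was reasoned through for GNU sed only.

```shell
# Made-up input: spaces inside the folder name, and a literal %20 later on.
line='query=folder:"gmail/ATX Hackerspace" and subject:50%20off'

result=$(printf '%s\n' "$line" | sed -E '/folder:"/{h;s/(^.*folder:").*/\1/;x;s/^.*folder:"//;s/ /%20/g;x;G;
/folder:"/s/\n//;h;s/(^.*folder:"[^"]*").*/\1/;x;s/.*folder:"[^"]*"//;
s/%20/ /g;x;G;/folder:"/s/\n//}')

# The folder name is encoded as intended, but the %20 that was already
# present after the closing quote has been turned into a space.
printf '%s\n' "$result"
```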
