
Regular Expression in Linux command sed

I have a shell variable:

all_apk_file="a 1 2.apk x.apk y m.apk"

I want to replace a 1 2.apk with TEST, using the command:

echo $all_apk_file | sed 's/(.*apk ){1}/TEST/g'

The .*apk means "ending with apk" and {1} means "match only one time", but it doesn't work; I only get the original variable as output: a 1 2.apk x.apk y m.apk

Can anyone tell me why?

First, to enable the extended regular expressions you're familiar with, you need to give GNU sed the -r switch (sed -r ...; -E also works, and is the spelling BSD sed understands):

echo $all_apk_file | sed -r 's/(.*apk ){1}/TEST/g'
# returns TESTy m.apk

Look at what it returns: TESTy m.apk. This is because .* is greedy, so it matches as much as it possibly can. Here .* matches a 1 2.apk x. (everything up to the second apk), so the text replaced with 'TEST' is a 1 2.apk x.apk plus the following space, resulting in TESTy m.apk. Note the space after apk in your regular expression: that is why the match doesn't extend all the way to the last .apk, which has no space after it.
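One way to see the greediness for yourself is to print what the capture group actually matched. The sketch below assumes GNU sed (for -r) and wraps the captured text in brackets so the trailing space is visible; the variable name matched is only for illustration.

```shell
#!/bin/sh
all_apk_file="a 1 2.apk x.apk y m.apk"

# -n suppresses normal output; the p flag prints the line only if the
# substitution happened. \1 is whatever the greedy group (.*apk ) captured.
matched=$(echo "$all_apk_file" | sed -r -n 's/^(.*apk ).*/[\1]/p')
echo "$matched"    # [a 1 2.apk x.apk ]
```

The group reaches the second "apk ", not the first, which is exactly the text that gets replaced by TEST.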

Usually one could change .* to .*? to make it non-greedy, but sed does not support non-greedy matching.
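This can be checked directly. A minimal sketch, assuming GNU sed with -r: the .*? form is accepted, but the ? is treated as an ordinary "zero or one" operator applied to .*, not as a laziness modifier, so the result is the same greedy one as before.

```shell
#!/bin/sh
all_apk_file="a 1 2.apk x.apk y m.apk"

# ".*?" is NOT lazy in sed: it still consumes up to the last "apk ".
result=$(echo "$all_apk_file" | sed -r 's/.*?apk /TEST/')
echo "$result"    # TESTy m.apk
```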

So, to fix it, you have to make your regex more restrictive.

It is hard to tell exactly what you want to do. Remove the first three words, where the third ends in '.apk', and replace them with 'TEST'? In that case, one could use the regular expression:

[a-z0-9]+ +[a-z0-9]+ +[a-z0-9]+\.apk

in combination with the 'i' flag (case-insensitive matching, a GNU sed extension).
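A quick check of what the 'i' flag adds, assuming GNU sed and using a hypothetical upper-case variant of the sample data. LC_ALL=C is set so that [a-z] means exactly the lower-case ASCII letters; without the flag, the upper-case words are not matched at all.

```shell
#!/bin/sh
upper="A 1 2.APK x.apk"   # hypothetical input, for illustration only
re='[a-z0-9]+ +[a-z0-9]+ +[a-z0-9]+\.apk'

# With the i flag, [a-z] and the literal "apk" also match upper case.
with_flag=$(echo "$upper" | LC_ALL=C sed -r "s/$re/TEST/i")
# Without it, nothing matches and the line is printed unchanged.
without_flag=$(echo "$upper" | LC_ALL=C sed -r "s/$re/TEST/")

echo "$with_flag"
echo "$without_flag"
```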

You will have to explain your logic for deciding what to remove (the first three words, any number of words up to the first '.apk' word, etc.) for us to help you further with the regex.

Secondly, you've put the 'g' flag on your s command. This means that every match will be replaced, whereas you seem to want only the first one replaced. So remove the 'g' flag.
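The effect of the 'g' flag is easiest to see on an input with two matches. The sketch below uses a made-up two-word string; without g, only the first apk is replaced.

```shell
#!/bin/sh
line="a.apk b.apk"   # hypothetical input with two matches

first=$(echo "$line" | sed 's/apk/X/')    # replace the first match only
every=$(echo "$line" | sed 's/apk/X/g')   # replace every match

echo "$first"    # a.X b.apk
echo "$every"    # a.X b.X
```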

Finally, all of these in combination:

echo $all_apk_file | sed -r 's/[a-z0-9]+ +[a-z0-9]+ +[a-z0-9]+\.apk/TEST/i'
# TEST x.apk y m.apk

This might work for you:

echo "$all_apk_file" | sed 's/apk/\n/;s/.*\n/TEST/'
TEST x.apk y m.apk

As to why your regexp did not work, see the excellent explanations by @mathematical.coffee and @Jonathan Leffler.

s/apk/\n/ is synonymous with s/apk/\n/1, which means replace the first occurrence of apk with \n (a newline; using \n in the replacement is a GNU sed feature). As sed uses \n as a record separator, we know that it cannot occur in any input line passed to the sed commands. With these two facts under our belts, we can split strings.
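The two steps of the command above can be run separately to see the split. A sketch, assuming GNU sed, where \n in the replacement inserts a newline and . can match a newline inside the pattern space; step1 and step2 are illustrative names.

```shell
#!/bin/sh
all_apk_file="a 1 2.apk x.apk y m.apk"

# Step 1: the first "apk" becomes a newline, so the line is split in two.
step1=$(echo "$all_apk_file" | sed 's/apk/\n/')
printf '%s\n' "$step1"

# Step 2: there is only one newline, so even the greedy ".*\n"
# stops right where the first "apk" used to be.
step2=$(echo "$all_apk_file" | sed 's/apk/\n/;s/.*\n/TEST/')
echo "$step2"    # TEST x.apk y m.apk
```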

NB: If you wanted to replace up to the second apk, then s/apk/\n/2 would fit the bill. Of course, for the last occurrence of apk, plain .*apk comes into play.
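Both variants, sketched against the sample value (GNU sed assumed): an occurrence number on the first s command moves the split point, and the plain greedy form handles the last occurrence.

```shell
#!/bin/sh
all_apk_file="a 1 2.apk x.apk y m.apk"

# Split at the 2nd "apk", then drop everything up to the split.
second=$(echo "$all_apk_file" | sed 's/apk/\n/2;s/.*\n/TEST/')
# Greedy ".*apk" already reaches the last "apk".
last=$(echo "$all_apk_file" | sed 's/.*apk/TEST/')

echo "$second"   # TEST y m.apk
echo "$last"     # TEST
```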

One part of the problem is that in sed's default (basic) regular expressions, ( ) and { } are ordinary characters until escaped with backslashes. Since there are no parentheses in the variable's value, the regex never matches. With GNU sed, you can also enable extended regular expressions with the -r flag. If you fix that problem, you will then run into the problem that .* is greedy, and the g modifier actually doesn't change anything:

$ echo $all_apk_file | sed 's/\(.*apk \)\{1\}/TEST/g'
TESTy m.apk
$ echo $all_apk_file | sed -r 's/(.*apk ){1}/TEST/g'
TESTy m.apk
$ echo $all_apk_file | sed -r 's/(.*apk ){1}/TEST/'
TESTy m.apk
$

It only stops there because there isn't a space after m.apk in the echoed value of the variable.
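The first point, that unescaped ( ) and { } are ordinary characters in a basic regular expression, can be verified with a made-up input that literally contains them. A sketch assuming GNU sed in its default (basic) mode:

```shell
#!/bin/sh
# Hypothetical input that literally contains "(", ")" and "{1}".
literal='a (x.apk ){1} b'

# The question's pattern, unchanged: it matches these literal characters...
bre=$(echo "$literal" | sed 's/(.*apk ){1}/TEST/')
# ...and therefore never matches the real data, which has none of them.
unchanged=$(echo "a 1 2.apk x.apk y m.apk" | sed 's/(.*apk ){1}/TEST/')

echo "$bre"        # a TEST b
echo "$unchanged"  # a 1 2.apk x.apk y m.apk
```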

The issue now is: what exactly do you want replaced? It sounds like 'everything up to and including the first occurrence of apk at the end of a word'. This is probably most easily done with trailing context or non-greedy matching, as found in Perl regular expressions. If switching to Perl is an option, do so. If not, it is not trivial with normal sed regular expressions.

$ echo $all_apk_file | sed 's/^[^.]* [^.][^.]*\.apk /TEST /'
TEST x.apk y m.apk
$

This looks for a run of characters without dots at the start of the line, followed by a blank, followed by one or more non-dot characters, then .apk and a blank; this means that the first dot allowed is the one in 2.apk. It works for the sample data; it would not work if the variable contained:

all_apk_file="a 1.2 2.apk m.apk y.apk 37"
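Running the same command against that value shows the failure mode: the 1.2 word puts a dot where the pattern forbids one, nothing matches, and the input comes back unchanged. A sketch using the example value above:

```shell
#!/bin/sh
all_apk_file="a 1.2 2.apk m.apk y.apk 37"

# Same dot-free-prefix pattern as before; here it finds no match.
result=$(echo "$all_apk_file" | sed 's/^[^.]* [^.][^.]*\.apk /TEST /')
echo "$result"    # a 1.2 2.apk m.apk y.apk 37
```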

You'll need to tune this to meet your requirements.
