
Regular Expression in Linux command sed

I have a shell variable:

all_apk_file="a 1 2.apk x.apk y m.apk"

I want to replace a 1 2.apk with TEST, using the command:

echo $all_apk_file | sed 's/(.*apk ){1}/TEST/g'

The .*apk is meant to match text ending with apk, and {1} to match only one time, but it doesn't work; I only get the original variable as output: a 1 2.apk x.apk y m.apk

Can anyone tell me why?

First, to enable the regular expressions you're familiar with in sed, you need to use the -r switch (sed -r ...), which turns on extended regular expressions in GNU sed:

echo $all_apk_file | sed -r 's/(.*apk ){1}/TEST/g'
# returns TESTy m.apk

Look at what it returns: TESTy m.apk. This is because .* is greedy, so it matches as much as it possibly can. That is, the .* matches a 1 2.apk x. (the longest prefix that can still be followed by apk and a space), so the whole pattern matches a 1 2.apk x.apk plus the trailing space, and that is what gets replaced with 'TEST', resulting in TESTy m.apk. Note the space after 'apk' in your regular expression: it is why the match doesn't extend all the way to the last '.apk', which has no space following it.
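To see exactly what the greedy group consumed, one option is to wrap the match in brackets instead of replacing it. This is only an illustrative sketch, assuming GNU sed (-E is an equivalent spelling of -r); the variable name marked is not from the question:

```shell
all_apk_file="a 1 2.apk x.apk y m.apk"

# \1 re-emits the captured text, so the brackets show the extent of the match.
marked=$(printf '%s\n' "$all_apk_file" | sed -E 's/(.*apk )/[\1]/')
echo "$marked"
# [a 1 2.apk x.apk ]y m.apk
```

The closing bracket lands after the space that follows x.apk, which is the last place in the string where "apk " can still be matched.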

Usually one could change the .* to .*? to make it non-greedy, but sed does not support non-greedy quantifiers.

So, to fix it you just have to make your regex more restrictive.

It is hard to tell what you want to do - remove the first three words where the third ends in '.apk' and replace with 'TEST'? In that case, one could use the regular expression:

[a-z0-9]+ +[a-z0-9]+ +[a-z0-9]+\.apk

in combination with the 'i' switch (case insensitive).

You will have to give your logic for deciding what to remove (first three words, any number of words up to the first '.apk' word, etc) in order for us to help you further with the regex.
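If the rule turns out to be "any number of words up to and including the first word ending in .apk", one alternative that sidesteps greedy matching altogether is to walk the words in the shell. This is only a sketch under that assumed rule; the names result and replaced are illustrative, and the loop relies on the unquoted expansion splitting on whitespace (so it would also expand glob characters if the value contained any):

```shell
all_apk_file="a 1 2.apk x.apk y m.apk"

result=""
replaced=no
for word in $all_apk_file; do          # intentionally unquoted: split into words
  if [ "$replaced" = no ]; then
    case $word in
      *.apk) result="TEST"; replaced=yes ;;   # first .apk word: drop everything so far
    esac
  else
    result="$result $word"             # keep every word after the replacement
  fi
done

# If no word ended in .apk, leave the value unchanged.
[ "$replaced" = yes ] || result=$all_apk_file

echo "$result"
# TEST x.apk y m.apk
```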

Secondly, you've put the 'g' switch in your regex. This means that all matching patterns will be replaced, and you seem to only want the first to be replaced. So remove the 'g' switch.

Finally, all of these in combination:

echo $all_apk_file | sed -r 's/[a-z0-9]+ +[a-z0-9]+ +[a-z0-9]+\.apk/TEST/i'
# TEST x.apk y m.apk

This might work for you:

echo "$all_apk_file" | sed 's/apk/\n/;s/.*\n/TEST/'
TEST x.apk y m.apk

As to why your regexp did not work see @mathematical.coffee and @Jonathan Leffler's excellent explanations.

s/apk/\n/ is synonymous with s/apk/\n/1, which means replace the first occurrence of apk with \n. Because sed reads its input line by line and uses the newline as the record separator, we know a newline cannot occur in the string initially passed to the sed commands, which makes it a safe marker. With these two facts under our belts we can split strings.

NB If you wanted to replace up to the second apk then s/apk/\n/2 would fit the bill. Of course, for the last occurrence of apk, .*apk comes into play.
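As a quick check of the numbered-occurrence variant, here is the same command with the 2 flag. It assumes GNU sed, since \n in the replacement producing a newline is a GNU extension; the variable name result is illustrative:

```shell
all_apk_file="a 1 2.apk x.apk y m.apk"

# Mark the second "apk" with a newline, then replace everything up to the marker.
result=$(printf '%s\n' "$all_apk_file" | sed 's/apk/\n/2;s/.*\n/TEST/')
echo "$result"
# TEST y m.apk
```

The space that followed x.apk is kept, which is why the output has a space between TEST and y.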

One part of the problem is that in regular sed, ( ) and { } are ordinary characters in patterns until escaped with backslashes. Since there are no parentheses in the variable's value, the regex never matches. With GNU sed, you can also enable extended regular expressions with the -r flag. If you fix that problem, you will then run into the problem that .* is greedy, and the g modifier actually doesn't change anything:

$ echo $all_apk_file | sed 's/\(.*apk \)\{1\}/TEST/g'
TESTy m.apk
$ echo $all_apk_file | sed -r 's/(.*apk ){1}/TEST/g'
TESTy m.apk
$ echo $all_apk_file | sed -r 's/(.*apk ){1}/TEST/'
TESTy m.apk
$

It only stops there because there isn't a space after m.apk in the echoed value of the variable.

The issue now is: what is it that you want replaced? It sounds like 'everything up to and including the first occurrence of apk at the end of a word'. This is probably most easily done with trailing context or non-greedy matching as found in Perl regular expressions. If switching to Perl is an option, do so. If not, it is not trivial in normal sed regular expressions.

$ echo $all_apk_file | sed 's/^[^.]* [^.][^.]*\.apk /TEST /'
TEST x.apk y m.apk
$

This looks for a run of characters without dots from the start of the line, followed by a blank, followed by one or more non-dot characters, and then .apk and a blank; this means that the first dot allowed is the one in 2.apk. It works for the sample data; it would not work if the variable contained:

all_apk_file="a 1.2 2.apk m.apk y.apk 37"

You'll need to tune this to meet your requirements.
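One possible tuning, if the requirement really is "everything up to and including the first word that ends in .apk", is to combine this with the newline-marker trick from the other answer. The sketch below assumes GNU sed (for -E and for \n in the replacement); the function name replace_first_apk is made up for illustration:

```shell
# Replace everything up to and including the first word ending in ".apk".
replace_first_apk() {
  # Step 1: swap the first ".apk" that ends a word (followed by a space or
  #         end of line) for a newline marker, keeping the space if present.
  # Step 2: replace everything up to the marker with TEST.
  printf '%s\n' "$1" | sed -E 's/\.apk( |$)/\n\1/; s/.*\n/TEST/'
}

replace_first_apk "a 1 2.apk x.apk y m.apk"
# TEST x.apk y m.apk
replace_first_apk "a 1.2 2.apk m.apk y.apk 37"
# TEST m.apk y.apk 37
```

If no word ends in .apk, neither substitution matches and the input is printed unchanged.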


 