
Execute command on the same line multiple times with sed

I need to highlight every duplicate word in the text with the * symbol.
For example,

lol foo lol bar foo bar

should be

lol foo *lol* bar *foo* *bar*

I tried with the following command:

echo "lol foo lol bar foo bar" | sed -r -e 's/(\b[a-zA-Z]+\b)([^*]+)(\1)/\1\2*\3*/'

It gives me:

lol foo *lol* bar foo bar

Then I added the g flag, which gives:

lol foo *lol* bar foo *bar*

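But foo is not highlighted.
I know this happens because sed does not look back once a match has been found: with the g flag it resumes searching after the end of the previous match, so the first foo, which lies inside the text consumed by the lol match, is never paired with the second foo.

Can I handle this with sed alone?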
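
The behaviour described above can be reproduced with a small sketch. It assumes GNU sed (for \b and for back-references with -E); the helper names first_only and with_g are made up for illustration.

```shell
# Assumes GNU sed; the function names are hypothetical helpers.
input="lol foo lol bar foo bar"
pattern='s/(\b[a-zA-Z]+\b)([^*]+)(\1)/\1\2*\3*/'

# Without g: only the first (leftmost) match on the line is replaced.
first_only() { printf '%s\n' "$1" | sed -E "$pattern"; }

# With g: matching resumes after the end of the previous match,
# so text already consumed by one match is never examined again.
with_g() { printf '%s\n' "$1" | sed -E "${pattern}g"; }

first_only "$input"   # lol foo *lol* bar foo bar
with_g "$input"       # lol foo *lol* bar foo *bar*
```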

Sed is not the best tool for this task. It does not support look-ahead, look-behind, or non-greedy quantifiers, but give the following command a try:

sed -r -e ':a ; s/\b([a-zA-Z]+)\b(.*) (\1)( |$)/\1\2 *\3*\4/ ; ta'

It uses conditional branching to repeat the substitution command until it fails. You also cannot use ([^*]+), because from the second pass on the match has to cross some of the * characters inserted by earlier substitutions; the alternative is a greedy .*. Finally, you cannot match (\1) on its own, because it would match the same lol again and again, even after it has been highlighted. It needs some context, such as being surrounded by spaces or followed by the end of the line.
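
The loop described above can be written out as a commented multi-line script. This is only a sketch, assuming GNU sed (needed for \b and for comment lines inside the script); mark_dups is a hypothetical function name. The replacement ends with \4 so that whatever followed the word (a space, or nothing at the end of the line) is put back unchanged.

```shell
# Sketch only: assumes GNU sed; mark_dups is a hypothetical helper name.
mark_dups() {
  sed -E '
    # Label marking the start of the loop.
    :a
    # Find a word, then the last later copy of it that is preceded by a space
    # and followed by a space or end of line; wrap that copy in asterisks.
    # \4 puts back whatever followed the copy (a space or nothing).
    s/\b([a-zA-Z]+)\b(.*) (\1)( |$)/\1\2 *\3*\4/
    # If the substitution changed something, jump back to :a and try again.
    ta
  '
}

echo "lol foo lol bar foo bar" | mark_dups   # lol foo *lol* bar *foo* *bar*
```

Because each pass marks one more copy, and a marked copy is no longer preceded by a space, the loop stops once every repeated word has been wrapped.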

The command yields:

lol foo *lol* bar *foo* *bar*

UPDATE: An improvement suggested by potong in the comments:

sed -r ':a;s/\b(([[:alpha:]]+)\s.*\s)\2\b/\1*\2*/;ta' file

Using awk:

awk '{for (i=1;i<=NF;i++) if (a[$i]++>=1) printf "*%s* ",$i; else printf "%s ",$i; print ""}' file

Output:

lol foo *lol* bar *foo* *bar*
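
Note that in the one-liner the array a is never cleared, so a word first seen on an earlier line is also marked on later lines, and every output line ends with a trailing space. If duplicates should be counted per line instead, a variant along these lines might do; it is a sketch, and the names mark_dups_awk, seen, out and word are made up for illustration.

```shell
# Sketch only: mark_dups_awk and the awk variable names are hypothetical.
mark_dups_awk() {
  awk '{
    split("", seen)                      # clear the table for each input line
    out = ""
    for (i = 1; i <= NF; i++) {
      # Wrap the field in asterisks if it was already seen on this line.
      word = (seen[$i]++ ? "*" $i "*" : $i)
      # Join fields with single spaces, without a trailing space.
      out = out (i > 1 ? " " : "") word
    }
    print out
  }'
}

echo "lol foo lol bar foo bar" | mark_dups_awk   # lol foo *lol* bar *foo* *bar*
```

Like the original, this relies on awk's default field splitting, so runs of whitespace in the input are collapsed to single spaces in the output.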
