
Bash sed - find hashtags in string

Based on this post, I have tried to come up with a command to find all hashtag words (words starting with #) in a fairly complicated string:

echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" | sed -e 's/^/ /g' -e 's/ [^#][^ ]*//g' -e 's/^ *//g'

Unfortunately the output is:

#cerveau #Mopses"

instead of:

#cerveau #Einstein #Ordre #Mopses

What should be the correct command?

grep is usually better at extracting substrings. With GNU grep's -o option (print only the matching parts), you can simply run:

echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" \
| grep -o '#[[:alpha:]]*'

If you really need sed, do something similar: replace every word that doesn't start with # with a space, then delete everything before the first #, and finally squeeze runs of spaces into one:

sed -e 's/[^[:alpha:]#][[:alpha:]]*/ /g' \
    -e 's/^[^#]*//' \
    -e 's/  */ /g'
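To make the three steps easier to follow, here is a commented sketch of the same expressions wrapped in a function. The function name hashtags_sed and the fourth expression (which trims the trailing space left when text follows the last hashtag) are additions for illustration, not part of the answer above. The sample input uses plain ASCII apostrophes so the result does not depend on the locale.

```shell
# Hypothetical wrapper around the sed expressions from the answer.
hashtags_sed() {
    # 1. A non-letter, non-# character plus any letters after it
    #    (i.e. a word not starting with #) becomes a single space.
    # 2. Everything before the first # is deleted; this also drops a
    #    leading word that step 1 could not reach.
    # 3. Runs of spaces are squeezed into one space.
    # 4. (Addition) A trailing space, if any, is removed.
    sed -e 's/[^[:alpha:]#][[:alpha:]]*/ /g' \
        -e 's/^[^#]*//' \
        -e 's/  */ /g' \
        -e 's/ $//'
}

echo "Le #cerveau d'#Einstein pas" | hashtags_sed
```

Because the pattern uses [[:alpha:]], a tag such as #tag1 would be cut at the digit; widening the class to [[:alnum:]_] in both places of step 1 is one way to allow digits and underscores.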

If you want to use sed, you can put every word that starts with # on its own line by wrapping it in \n, and then print only those lines:

echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" \
| sed -re 's/(#\w+)/\n\1\n/g' \
| sed -rn '/^(#\w+)$/p'

You need the -r option in sed to use extended regular expressions (GNU sed also accepts -E as an equivalent spelling).

You can do this:

echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" | grep -o '#[a-zA-Z0-9_]\+'

You get the expected output:

#cerveau
#Einstein
#Ordre
#Mopses

Explanation: The -o option in grep:

Prints only the matching part of the lines.

So the grep command above matches a # followed by one or more ASCII letters, digits, or underscores.
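The question asked for the tags on a single line, whereas grep -o prints one match per line. A minimal sketch of joining them, assuming a POSIX paste is available; the function name extract_hashtags is made up for illustration:

```shell
# Hypothetical helper: read text on stdin, print its hashtags on one line.
extract_hashtags() {
    # grep -o prints each match on its own line; -E enables the + quantifier.
    # paste -s joins all input lines, using a single space as the delimiter.
    grep -oE '#[A-Za-z0-9_]+' | paste -sd ' ' -
}

echo "Le #cerveau d'#Einstein n'est #Ordre des #Mopses\" pas" | extract_hashtags
```

Note that the bracket expression only covers ASCII letters, so a tag containing accented characters would be cut short; [[:alnum:]_] in a UTF-8 locale is a possible alternative.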
