Bash sed - find hashtags in string

Question

Based on this post, I have tried to come up with a command to find all hashtag words (words starting with #) in a fairly complicated string:

echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" | sed -e 's/^/ /g' -e 's/ [^#][^ ]*//g' -e 's/^ *//g'

Unfortunately the output is:

#cerveau #Mopses"

instead of:

#cerveau #Einstein #Ordre #Mopses

What should be the correct command?

Answer 1

grep is usually better at extracting substrings. With GNU grep's -o option (print only the matching parts of each line), you can just run:

echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" \
| grep -o '#[[:alpha:]]*'
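The question asked for all hashtags on one line, separated by spaces, while grep -o prints each match on its own line. A minimal sketch, assuming GNU grep (for -o) and a POSIX paste; the function name hashtags_inline is hypothetical. It also uses -E with + instead of *, so a lone # with no letters after it is not reported:

```shell
# Sketch, assuming GNU grep (for -o) and POSIX paste.
# hashtags_inline is a hypothetical helper name, not an existing command.
hashtags_inline() {
    printf '%s\n' "$*" |
        grep -oE '#[[:alpha:]]+' |  # one hashtag per line; + skips a bare "#"
        paste -sd ' ' -             # join those lines with single spaces
}

hashtags_inline "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas"
# prints: #cerveau #Einstein #Ordre #Mopses
```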

If you really need sed, do something similar: replace every word that doesn't start with # with a space, then remove the first word and compact the spaces:

echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" \
| sed -e 's/[^[:alpha:]#][[:alpha:]]*/ /g' \
      -e 's/^[^#]*//' \
      -e 's/  */ /g'
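Run on the question's string, the three sed expressions leave one trailing space, because the closing `" » pas` is turned into spaces that the last expression compacts but does not delete. A sketch that adds a fourth expression to strip it, assuming a sed that understands POSIX classes such as [[:alpha:]]; the variable names input and tags are only for illustration:

```shell
# Sketch, assuming a sed that understands POSIX classes such as [[:alpha:]].
input="Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas"

# The first three expressions are the ones from the answer; the last one,
# 's/ $//', deletes the single space left at the end of the line.
tags=$(printf '%s\n' "$input" |
    sed -e 's/[^[:alpha:]#][[:alpha:]]*/ /g' \
        -e 's/^[^#]*//' \
        -e 's/  */ /g' \
        -e 's/ $//')
printf '%s\n' "$tags"   # prints: #cerveau #Einstein #Ordre #Mopses
```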

Answer 2

If you want to use sed, you can put every word that starts with # on a line of its own (by surrounding it with \n) and then print only those lines:

echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" \
| sed -re 's/(#\w+)/\n\1\n/g' \
| sed -rn '/^(#\w+)$/p'

You need the -r option in sed to use extended regular expressions.
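The -r flag, \w, and \n in the replacement are GNU sed extensions, so the commands above may behave differently with BSD or macOS sed. A portable sketch of the same "one word per line, then keep the hashtags" idea, swapping sed for POSIX tr and grep (a different tool, not Answer 2's command): every character that is not a letter, digit, underscore, or # becomes a newline, and only lines starting with # plus at least one more character are kept:

```shell
# Portable sketch with POSIX tr and grep instead of GNU sed.
input="Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas"

# tr -c replaces every character NOT in the set (letters, digits, _ and #)
# with a newline, and -s squeezes runs of those newlines into one.
# grep '^#.' keeps lines that start with # and have at least one more character.
tags=$(printf '%s\n' "$input" | tr -cs '[:alnum:]_#' '\n' | grep '^#.')
printf '%s\n' "$tags"
```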

Answer 3

You can do this:

echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" | grep -o '#[a-zA-Z0-9_]\+'

You get the expected output:

#cerveau
#Einstein
#Ordre
#Mopses

Explanation: The -o option in grep:

Prints only the matching part of the lines.

So the grep command above matches a # followed by one or more letters, digits, or underscores, and prints each match on its own line.
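The bracket expression [a-zA-Z0-9_] is meant for ASCII letters, so depending on the locale a tag such as #café may be cut short, and \+ is not part of POSIX basic regular expressions. A sketch of the same match written as an extended expression with POSIX character classes, which GNU grep interprets according to the current locale; it assumes GNU grep for -o:

```shell
# Sketch, assuming GNU grep (-o is not in POSIX grep).
input="Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas"

# -E selects extended regular expressions, so "+" needs no backslash.
# [[:alnum:]_] stands in for [a-zA-Z0-9_]; the class follows the current
# locale, so in a UTF-8 locale it should also accept letters such as é.
tags=$(printf '%s\n' "$input" | grep -oE '#[[:alnum:]_]+')
printf '%s\n' "$tags"
```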

3 answers:

Answer 1: score 7, accepted, 2016-01-01 14:44:29
Answer 2: score 2, 2016-01-01 15:44:08
Answer 3: score 1, 2016-01-01 14:44:10