Based on this post, I tried to come up with a command that finds all hashtag words (words starting with #) in a fairly complicated string:
echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" | sed -e 's/^/ /g' -e 's/ [^#][^ ]*//g' -e 's/^ *//g'
Unfortunately the output is:
#cerveau #Mopses"
instead of:
#cerveau #Einstein #Ordre #Mopses
What should be the correct command?
grep is usually better at extracting substrings. With GNU grep's -o option (output only the matching parts), you can simply run:
echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" \
| grep -o '#[[:alpha:]]*'
If you really need sed, do something similar: replace every word that doesn't start with # by a space, then remove whatever precedes the first # and squeeze the runs of spaces:
sed -e 's/[^[:alpha:]#][[:alpha:]]*/ /g' \
    -e 's/^[^#]*//' \
    -e 's/  */ /g'
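Put together with the sample input from the question, the sed approach can be sketched as one complete pipeline. The variable names and the final expression that trims trailing spaces are additions for illustration. Note that the space-squeezing expression needs two spaces before the `*`: with a single space the pattern also matches the empty string, and sed would insert a space between every pair of characters.

```shell
# Sample input from the question (variable names are illustrative).
input='Le #cerveau d’#Einstein n’est « #Ordre des #Mopses" » pas'

hashtags=$(printf '%s\n' "$input" \
  | sed -e 's/[^[:alpha:]#][[:alpha:]]*/ /g' \
        -e 's/^[^#]*//' \
        -e 's/  */ /g' \
        -e 's/ *$//')   # extra step: drop the trailing space left behind

printf '%s\n' "$hashtags"
# prints: #cerveau #Einstein #Ordre #Mopses
```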
If you want to use sed, you can put every word that starts with # on its own line by surrounding it with \n, and then print only those lines:
echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" \
| sed -re 's/(#\w+)/\n\1\n/g' \
| sed -rn '/^(#\w+)$/p'
You need the -r option in sed to use extended regular expressions.
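As a side note, GNU sed also accepts -E as a synonym for -r, and -E is the spelling that BSD sed understands as well. The sketch below is the same two-step pipeline written with -E. It still assumes GNU sed, because \w and \n in the replacement are GNU extensions. The variable names are illustrative.

```shell
# Sample input from the question (variable names are illustrative).
input='Le #cerveau d’#Einstein n’est « #Ordre des #Mopses" » pas'

hashtags=$(printf '%s\n' "$input" \
  | sed -E 's/(#\w+)/\n\1\n/g' \
  | sed -En '/^#\w+$/p')   # keep only the lines that are a single hashtag

printf '%s\n' "$hashtags"
```

Each hashtag ends up on its own line, just as with grep -o.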
You can do this:
echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" | grep -o '#[a-zA-Z0-9_]\+'
You get the expected output:
#cerveau
#Einstein
#Ordre
#Mopses
Explanation: the -o option in grep prints only the matching part of each line. So the grep command above matches a # followed by one or more letters, digits, or underscores.
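To see why "one or more" matters, here is a small sketch comparing the two grep patterns used in this thread on a made-up sample string (the string and variable names are hypothetical). It assumes GNU grep, since \+ in a basic regular expression is a GNU extension.

```shell
# Hypothetical sample with a lone "#" and a hashtag containing "_" and a digit.
sample='issue # 42 and #tag_1'

# '*' allows zero letters, so a bare "#" also counts as a match,
# and [[:alpha:]] stops at the underscore.
with_star=$(printf '%s\n' "$sample" | grep -o '#[[:alpha:]]*')

# '\+' requires at least one character after the "#".
with_plus=$(printf '%s\n' "$sample" | grep -o '#[a-zA-Z0-9_]\+')

printf 'star: %s\n' $with_star   # unquoted on purpose: one line per match
printf 'plus: %s\n' $with_plus
```

With this input the first pattern reports `#` and `#tag`, while the second reports only `#tag_1`.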