I have a block of text like this:
Hello @Simon, I had a great day today. #StackOverflow
I want to find the most elegant solution to stripping it down to look like this:
Hello, I had a great day today.
i.e. I want to strip out all words that have a prefix of # or @. (And yes, I'm inspecting tweets.)
I am new to Python. I would be OK doing this on single words, but I'm not sure of the best way to achieve it on a string that contains multiple words.
My first thought was to use replace, but that would only strip out the @ and # symbols themselves. I'm looking for the best way to strip out any word that has a prefix of # or @.
EDIT: Not sure if this invalidates the answers given, but for acceptance I also need to strip out cases where multiple words have a prefix of # or $, e.g. hello #hiya #ello
You can use regular expressions:
>>> import re
>>> s = 'Hello @Simon, I had a great day today. #StackOverflow'
>>> re.sub(r'(?:^|\s)[@#].*?(?=[,;:.!?]|\s|$)', r'', s)
'Hello, I had a great day today.'
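A minimal sketch that wraps the same pattern in a reusable function; the names `TAG_RE` and `strip_tags` are hypothetical. Compiling the pattern once with `re.compile` avoids re-parsing it for every tweet, and the calls below exercise the multiple-tag case from the edit. Because a tag must begin the string or follow whitespace, an `@` inside a word, such as an e-mail address, is left alone.

```python
import re

# Pattern from the answer: start of string or a whitespace character, then
# @ or #, then the shortest run of characters up to (but not including)
# punctuation, whitespace, or the end of the string.
TAG_RE = re.compile(r'(?:^|\s)[@#].*?(?=[,;:.!?]|\s|$)')


def strip_tags(text):
    """Hypothetical helper: remove @mentions and #hashtags, keep punctuation."""
    return TAG_RE.sub('', text)


print(strip_tags('Hello @Simon, I had a great day today. #StackOverflow'))
# → Hello, I had a great day today.
print(strip_tags('hello #hiya #ello'))
# → hello
print(strip_tags('mail me@example.com'))
# → mail me@example.com
```

One side effect: a tag at the very start of the string is removed but the space after it is not, so `'#hi there'` becomes `' there'`; calling `.strip()` on the result tidies that up.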
It's as simple as writing an anonymous function and passing it to filter:
' '.join(filter(lambda x: x[0] not in ['@','#'], tweet.split()))
This will lose the comma after @users or #topics, but if you're just processing the tweets you probably won't miss it.
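The same split-and-filter idea, sketched as a function that uses a generator expression and `str.startswith` (which accepts a tuple of prefixes) in place of `filter` and `lambda`. The name `drop_tagged_words` is hypothetical; the behaviour matches the one-liner above, including the lost comma.

```python
def drop_tagged_words(tweet):
    """Hypothetical helper: keep only words that do not start with @ or #.

    str.split() with no argument splits on any whitespace and never yields
    empty strings, so every word has at least one character to test.
    """
    return ' '.join(w for w in tweet.split() if not w.startswith(('@', '#')))


tweet = 'Hello @Simon, I had a great day today. #StackOverflow'
print(drop_tagged_words(tweet))
# → Hello I had a great day today.   (the comma after @Simon is gone)
print(drop_tagged_words('hello #hiya #ello'))
# → hello
```

Because the words are re-joined with single spaces, runs of spaces, tabs or newlines in the original tweet are also collapsed to one space.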
' '.join([w for w in s.split() if not (len(w) > 1 and w[0] in '@#')])
where s is your tweet.
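A sketch of the list-comprehension version wrapped in a function (`remove_tags` is a hypothetical name), assuming the purpose of the length check is to remove only an @ or # that has at least one character after it. Under that reading the length check belongs inside the negated condition, so one-letter words such as `I` and `a`, and a stray standalone `#`, are kept.

```python
def remove_tags(s):
    """Hypothetical helper: drop words that are @ or # followed by at least
    one more character; every other word, including one-letter words,
    is kept."""
    return ' '.join([w for w in s.split()
                     if not (len(w) > 1 and w[0] in '@#')])


s = 'Hello @Simon, I had a great day today. #StackOverflow'
print(remove_tags(s))
# → Hello I had a great day today.
print(remove_tags('hello #hiya #ello # done'))
# → hello # done
```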