I tried the code below, but the expression finds only one word. Please suggest a way to get the expected output.
import re
tweet = 'RT @marcobonzanini: just an example! :D http://example.com #NLP'
re.findall(r'(^[a-zA-Z]+)\s', tweet)
Output: ['RT']
Expected output: ['RT', 'just', 'an', 'example']
So basically I want to remove URLs, @mentions, #hashtags, and emoticons.
I could match all the cases in your input data by using the following pattern:
((RT)|(@[a-z:]*)|(http:\/\/[a-z.]*)|(#[a-zA-Z]*)|( )|(!))|(:D)
Here's the live preview: https://regex101.com/r/xKeFOa/1
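For getting the expected list directly in Python, one possible approach is to strip the unwanted tokens first and then collect the remaining alphabetic runs. This is only a minimal sketch: the helper name extract_words is hypothetical, and the patterns for URLs, @mentions, #hashtags and emoticons are assumptions that cover this sample tweet, not every tweet.

```python
import re

# Assumed token shapes: http/https URLs, @mentions, #hashtags,
# and a few simple emoticons such as :D, :), ;P.
UNWANTED = re.compile(r'https?://\S+|[@#]\w+|[:;]-?[DPp()]')

def extract_words(text):
    """Return the plain alphabetic words left after removing unwanted tokens."""
    # Replace each unwanted token with a space so neighbouring words stay separate.
    cleaned = UNWANTED.sub(' ', text)
    # Collect runs of ASCII letters; punctuation such as '!' is skipped.
    return re.findall(r'[A-Za-z]+', cleaned)

tweet = 'RT @marcobonzanini: just an example! :D http://example.com #NLP'
words = extract_words(tweet)
print(words)
```

Removing first and matching afterwards keeps each pattern short. Other emoticons or URL schemes would need the UNWANTED pattern extended.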
Let me know if it helps you.