
How to extract all words from a tweet using a single regex in Python?

I tried the code below, but the expression finds only one word. Please suggest a way to get the expected output.

import re
tweet = 'RT @marcobonzanini: just an example! :D http://example.com #NLP'
re.findall(r'(^[a-zA-Z]+)\s', tweet)

Output: ['RT']
Expected output: ['RT', 'just', 'an', 'example']
So basically: remove websites, @-mentions, #hashtags, and emoticons.
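Here is why the attempt returns only one word. The `^` anchors the match to the start of the string, so `re.findall` can succeed at most once. Dropping the anchor alone is not enough either, as this small sketch shows. Fragments of the emoticon and the URL slip through, and `example` is missed because it is followed by `!` rather than whitespace.

```python
import re

tweet = 'RT @marcobonzanini: just an example! :D http://example.com #NLP'

# With ^ the pattern can only match at position 0, so findall returns one item.
anchored = re.findall(r'(^[a-zA-Z]+)\s', tweet)

# Without ^ it matches any run of letters that is directly followed by whitespace:
# it picks up 'D' (from ':D') and 'com' (from the URL), but misses 'example!'.
unanchored = re.findall(r'([a-zA-Z]+)\s', tweet)

print(anchored)    # ['RT']
print(unanchored)  # ['RT', 'just', 'an', 'D', 'com']
```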

I could match all the cases in your input data with this pattern:

((RT)|(@[a-z:]*)|(http:\/\/[a-z.]*)|(#[a-zA-Z]*)|( )|(!))|(:D)

Here's the live preview: https://regex101.com/r/xKeFOa/1
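The answer doesn't say how the pattern is meant to be used. One plausible reading (an assumption) is that it matches the unwanted pieces, so you replace each match with a space and split what is left. Because the pattern also lists `(RT)` and hard-codes `:D` and `!`, this removes `RT` as well and only covers this particular tweet.

```python
import re

tweet = 'RT @marcobonzanini: just an example! :D http://example.com #NLP'

# The pattern from the answer, unchanged (the \/ escapes are harmless in Python).
pattern = r'((RT)|(@[a-z:]*)|(http:\/\/[a-z.]*)|(#[a-zA-Z]*)|( )|(!))|(:D)'

# Replace every unwanted piece with a space, then split on whitespace.
words = re.sub(pattern, ' ', tweet).split()
print(words)  # ['just', 'an', 'example']
```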

Let me know if it helps you.
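If the goal is exactly the expected output from the question, including `RT`, a single `re.findall` with lookarounds is another option. This is a swapped-in technique, not the pattern above. It is a sketch that assumes a "word" is a whitespace-delimited run of ASCII letters, optionally followed by trailing `!?.,` punctuation. Letters attached to `@`, `#`, `:` or `://` are rejected because they are not preceded by whitespace, or not followed by whitespace or punctuation.

```python
import re

tweet = 'RT @marcobonzanini: just an example! :D http://example.com #NLP'

# (?<!\S)              word starts the string or follows whitespace
#                      (rejects '@marco...', '#NLP', the 'D' in ':D')
# [A-Za-z]+            the word itself
# (?=[!?.,]*(?:\s|$))  only trailing punctuation may follow, then whitespace
#                      or end of string (rejects 'http' in 'http://...')
WORD = re.compile(r'(?<!\S)[A-Za-z]+(?=[!?.,]*(?:\s|$))')

words = WORD.findall(tweet)
print(words)  # ['RT', 'just', 'an', 'example']
```

Because the rules are positional rather than a list of known tokens, the same pattern also handles other tweets, e.g. `WORD.findall('Nice!! #tag')` keeps only `Nice`.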
