
Remove words from string that have a prefix of # or @?

I have a block of text like this:

Hello @Simon, I had a great day today. #StackOverflow

I want to find the most elegant solution to stripping it down to look like this:

Hello, I had a great day today.

i.e. I want to strip out all words that have a prefix of # or @. (And yes, I'm inspecting tweets.)

I am new to Python, and I would be OK doing this on single words, but I'm not sure of the best way to achieve this on a string that contains multiple words.

My first thoughts would be to use replace, but that would just strip out the actual @ and # symbols. Looking for the best way to strip out any word that has a prefix of # or @.

-EDIT- Not sure if this invalidates the answers given, but for acceptance, I also need to handle strings where multiple words have a prefix of # or @, e.g. hello #hiya #ello

You can use regular expressions:

>>> import re
>>> s = 'Hello @Simon, I had a great day today. #StackOverflow'
>>> re.sub(r'(?:^|\s)[@#].*?(?=[,;:.!?]|\s|$)', r'', s)
'Hello, I had a great day today.'
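The same pattern can be wrapped in a small function with each piece spelled out. This is only a sketch, not part of the original answer: the names TAG_RE and strip_tags are made up for illustration, and the final strip() is an added assumption so that a tag at the very start of the text does not leave a leading space behind.

```python
import re

# Same pattern as above, written in verbose mode so each part can be commented.
# (A '#' inside a character class is not treated as a comment marker.)
TAG_RE = re.compile(
    r"""
    (?:^|\s)            # start of the string, or the whitespace before the tag
    [@#]                # the prefix character
    .*?                 # the tag body, as short as possible
    (?=[,;:.!?]|\s|$)   # stop before punctuation, whitespace, or end of string
    """,
    re.VERBOSE,
)


def strip_tags(text):
    """Remove every @mention and #hashtag from text (hypothetical helper)."""
    return TAG_RE.sub("", text).strip()
```

Because the pattern consumes the whitespace in front of each tag, consecutive tags such as the ones in hello #hiya #ello are removed one after another, leaving just hello.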

It's as simple as writing an anonymous function and passing it to filter:

' '.join(filter(lambda x: x[0] not in ['@','#'], tweet.split()))

This also drops any punctuation attached to a removed @user or #topic (such as the comma after @Simon), but if you're just processing the tweets you probably won't miss it.
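If the lost punctuation does matter, one possible variation on the split-based approach is to move the trailing punctuation of a removed word onto the word before it. The sketch below is an illustration under assumptions rather than the answerer's code: strip_tagged_words and TRAILING are hypothetical names, and the set of punctuation marks is simply borrowed from the regular-expression answer.

```python
# Punctuation that may trail a tag; an assumption borrowed from the regex answer.
TRAILING = ',;:.!?'


def strip_tagged_words(tweet):
    """Drop words starting with @ or #, keeping their trailing punctuation."""
    kept = []
    for word in tweet.split():
        if word.startswith(('@', '#')):
            # Whatever rstrip() would remove is the punctuation after the tag,
            # e.g. the comma in "@Simon,".
            trailing = word[len(word.rstrip(TRAILING)):]
            # Attach it to the previous kept word, unless that word already
            # ends in punctuation of its own.
            if trailing and kept and kept[-1][-1] not in TRAILING:
                kept[-1] += trailing
            continue
        kept.append(word)
    return ' '.join(kept)
```

With the tweet from the question this keeps the comma after Hello. Like any split-and-join approach, it also collapses runs of whitespace into single spaces.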

' '.join([w for w in s.split() if w[0] not in ['@','#']])

where s is your tweet.
