Remove words from string that have a prefix of # or @?

I have a block of text like this:

Hello @Simon, I had a great day today. #StackOverflow

I want to find the most elegant way to strip it down to look like this:

Hello, I had a great day today.

i.e. I want to strip out all words that have a prefix of # or @. (And yes, I'm inspecting tweets.)

I am new to Python. I would be OK doing this on single words, but I'm not sure of the best way to achieve it on a string that contains multiple words.

My first thought was to use replace, but that would only strip out the actual @ and # symbols. I'm looking for the best way to strip out any word that has a prefix of # or @.

-EDIT- Not sure if this invalidates the answers given, but for acceptance I also need to handle the case where multiple words have a prefix of # or $, e.g. hello #hiya #ello

You can use regular expressions:

>>> import re
>>> s = 'Hello @Simon, I had a great day today. #StackOverflow'
>>> re.sub(r'(?:^|\s)[@#].*?(?=[,;:.!?]|\s|$)', r'', s)
'Hello, I had a great day today.'
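The same pattern can be wrapped in a small helper so it is compiled once and reused. This is only a sketch: the names `TAGGED_WORD` and `remove_tagged_words` are made up for illustration, and the final `.strip()` is an added assumption to tidy the space left behind when the string starts with a tagged word. It also handles the consecutive-tag case from the edit (hello #hiya #ello).

```python
import re

# Same pattern as above: start of string or whitespace, then @ or #,
# then as few characters as possible up to punctuation, whitespace or the end.
TAGGED_WORD = re.compile(r'(?:^|\s)[@#].*?(?=[,;:.!?]|\s|$)')

def remove_tagged_words(text):
    """Drop every word that starts with @ or # (hypothetical helper)."""
    # strip() removes the leading space left when the first word was tagged.
    return TAGGED_WORD.sub('', text).strip()

print(remove_tagged_words('Hello @Simon, I had a great day today. #StackOverflow'))
print(remove_tagged_words('hello #hiya #ello'))
```

Because the pattern requires whitespace (or the start of the string) before the @ or #, something like an email address in the middle of a sentence is left alone.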

It's as simple as writing an anonymous function and putting it in a filter statement:

' '.join(filter(lambda x: x[0] not in ['@','#'], tweet.split()))

This will lose the comma attached to @users or #topics, but if you're just processing the tweets you probably won't miss it.
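If the lost comma does matter, one possible variation on the split-and-filter idea is to hand the trailing punctuation of a dropped word to the previous kept word. This is a rough sketch, not a definitive solution: the function name `strip_tagged_words`, the `PUNCT` set, and the rule of skipping the punctuation when the previous word already ends in punctuation are all assumptions made for illustration.

```python
PUNCT = ",;:.!?"  # assumed set of trailing punctuation worth keeping

def strip_tagged_words(text, prefixes=("@", "#")):
    """Drop words starting with a prefix, keeping their trailing punctuation."""
    kept = []
    for word in text.split():
        if word.startswith(prefixes):
            # Trailing punctuation of the dropped word, e.g. "," in "@Simon,"
            trailing = word[len(word.rstrip(PUNCT)):]
            # Attach it to the previous kept word unless that word
            # already ends with punctuation.
            if trailing and kept and kept[-1][-1] not in PUNCT:
                kept[-1] += trailing
            continue
        kept.append(word)
    return " ".join(kept)

tweet = "Hello @Simon, I had a great day today. #StackOverflow"
print(strip_tagged_words(tweet))
```

Note that, like the filter version, this collapses any run of whitespace into a single space because of `split()` and `join()`.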

' '.join(w for w in s.split() if not w.startswith(('@', '#')))

s is your tweet.
