Processing Twitter data with Python regular expressions

Question

I'm filtering tweets in my application and want to return all tweets that have a certain word in the text. For example, if I am filtering on BBC, I want every variant of it, e.g. BBC, bbc, BBC1, #BBC, @bbc. How could I write the regex?

So far I'm doing:

re.compile(r'#|@[0-9]'+term, re.IGNORECASE)

Term is a list containing words, and I want returned only the words in that list, either by themselves or with an extra @, #, or 0-9 prepended or appended.

Thanks

Answer 1

Use the '\b' word-boundary delimiter to find whole words:

re.compile(r'\b(?:#|@|)[0-9]*%s[0-9]*\b' % re.escape(term), re.IGNORECASE)
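
As a minimal sketch, the same idea could be applied to a whole list of terms as below. The helper names build_term_pattern and filter_tweets are hypothetical and not part of the original answer. The sketch also swaps the leading \b for a (?<!\w) lookbehind, an assumption about what is wanted: \b does not hold between whitespace and # or @, so with \b a tweet containing " #BBC" still matches, but the matched text is only "BBC".

```python
import re

def build_term_pattern(terms):
    """Compile one case-insensitive pattern covering every term in the list."""
    if not terms:
        raise ValueError("terms must contain at least one word")
    # Escape each term so characters such as "." or "+" are matched literally.
    alternatives = "|".join(re.escape(t) for t in terms)
    # (?<!\w)  the match must not be preceded by a word character
    # [#@]?    optional hashtag or mention prefix
    # [0-9]*   optional digits before and after the term
    # (?!\w)   the match must not be followed by a word character
    return re.compile(
        r'(?<!\w)[#@]?[0-9]*(?:%s)[0-9]*(?!\w)' % alternatives,
        re.IGNORECASE,
    )

def filter_tweets(tweets, terms):
    """Return the tweets whose text contains at least one of the terms."""
    pattern = build_term_pattern(terms)
    return [tweet for tweet in tweets if pattern.search(tweet)]

if __name__ == "__main__":
    sample = ["Watching BBC tonight", "#BBC news", "bbcnews is one word"]
    print(filter_tweets(sample, ["BBC"]))
```

Because the terms are joined with | inside a non-capturing group, pattern.findall(text) returns the full matched tokens, prefix and digits included.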

Answer 1
2 Accepted 2012-11-16 23:13:24
