Remove proper English words from tweets in R

Question

I'm working on Twitter data using R and am trying to remove all proper English words from the tweets. The idea is to look at the colloquial abbreviations, typos and slang used by a particular demographic whose tweets I have recorded.

Example:

    tweet <- c("Trying to find the solution frustrated af")

After the above-mentioned operation, I would like to be left with only 'af'.

I thought of filtering the tweets against a dictionary (which I will download), but there must be a simpler alternative. Any solution in Python would also help.

Answer 1

Another hunspell-based solution, using a rather new and interesting package:

    # install.packages("hunspell") # uncomment & run if needed
    library(hunspell)
    tweet <- c("Trying to find the solution frustrated af")
    # Split the tweet into tokens on single spaces
    ( tokens <- strsplit(tweet, " ")[[1]] )
    # [1] "Trying"     "to"         "find"       "the"        "solution"   "frustrated" "af"
    # Keep only the tokens that the en_US dictionary does not recognise
    tokens[!hunspell_check(tokens, dict = "en_US")]
    # [1] "af"
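
Real tweets usually contain punctuation, URLs, mentions and hashtags, so splitting on single spaces may leave tokens such as "frustrated," that the spell checker would flag. Below is a minimal sketch of how the same idea could be applied to a whole vector of tweets. The helper name `non_dictionary_tokens` and its `is_word` argument are hypothetical (they are not part of hunspell), and the cleaning rules are assumptions that you would likely need to adapt to your data.

```r
# Hypothetical helper, not part of the hunspell package: for each tweet,
# return the tokens that the supplied checker does not recognise as words.
# `is_word` defaults to hunspell's checker (requires the hunspell package);
# any function that maps a character vector to a logical vector of the
# same length can be passed instead.
non_dictionary_tokens <- function(
    tweets,
    is_word = function(w) hunspell::hunspell_check(w, dict = "en_US")) {
  lapply(tweets, function(tweet) {
    # Drop URLs, @mentions and #hashtags (assumed to be of no interest here)
    cleaned <- gsub("(https?://\\S+)|([@#]\\w+)", " ", tweet, perl = TRUE)
    # Replace everything except letters, apostrophes and whitespace with a space
    cleaned <- gsub("[^[:alpha:]'[:space:]]", " ", cleaned)
    # Split on runs of whitespace and discard empty tokens
    tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    if (length(tokens) == 0) return(character(0))
    # Keep only the tokens the checker rejects
    tokens[!is_word(tokens)]
  })
}

# Example call (needs hunspell installed):
# non_dictionary_tokens(c("Trying to find the solution, frustrated af!"))
```

The result is a list with one character vector per tweet, so it can be combined with `unlist()` and `table()` to count how often each non-dictionary token appears across the data set.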
