Unable to remove • and some other special characters such as '- using tm_map

Question

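I searched through the existing questions and was able to replace • in the first set of commands below. However, when I apply the same function to my corpus, it does not work and • still appears. The corpus has 6570 elements and is 2.3 MB in size, so it appears to be valid.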

> x <- "• R Tutorial"
> gsub("•","",x)
[1] " R Tutorial"

> removeSpecialChars <- function(x) gsub("•","",x)
> corpus2=tm_map(corpus2, removeSpecialChars)
> print(corpus2[[6299]][1])
[1] "• R tutorial • success– october"
> ##remove special characters

Answer 1

What about an alternative that works with corpus objects in a more direct way?

require(quanteda)
require(magrittr)

corpus3 <- corpus(c("• R Tutorial", "More of these • characters •", "Tricky •!"))

# remove the character from the tokenized corpus
tokens(corpus3)
## tokens from 3 documents.
## text1 :
## [1] "•"        "R"        "Tutorial"
## 
## text2 :
## [1] "More"       "of"         "these"      "•"          "characters" "•"         
## 
## text3 :
## [1] "Tricky" "•"      "!"     
tokens(corpus3) %>% tokens_remove("•")
## tokens from 3 documents.
## text1 :
## [1] "R"        "Tutorial"
## 
## text2 :
## [1] "More"       "of"         "these"      "characters"
## 
## text3 :
## [1] "Tricky" "!"  

# remove the character from the corpus itself
texts(corpus3) <- gsub("•", "", texts(corpus3), fixed = TRUE)
texts(corpus3)
##         text1                        text2                        text3 
## " R Tutorial" "More of these  characters "                   "Tricky !"
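
If staying with tm is preferred, one thing worth checking is whether the function passed to tm_map is wrapped in content_transformer(). This is an assumption about the asker's setup, since the tm version is not stated, but in tm 0.6 and later a plain character function such as a gsub wrapper generally needs that wrapper to act on document content. The sketch below also writes the characters as Unicode escapes so the pattern does not depend on the script file's encoding. The names special_chars and remove_special_chars are illustrative, and the choice of en dash and curly apostrophe is a guess at the "other special characters" mentioned in the question.

```r
# Characters to strip (assumed set): bullet U+2022, en dash U+2013,
# right single quotation mark U+2019.
special_chars <- "[\u2022\u2013\u2019]"

# Plain character-vector cleaner; works on its own in base R.
remove_special_chars <- function(x) gsub(special_chars, "", x)

x <- "\u2022 R tutorial \u2022 success\u2013 october"
cleaned <- remove_special_chars(x)
print(cleaned)

# Applying the same cleaner to a tm corpus (only runs if tm is installed).
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(x))
  # content_transformer() lets tm_map apply the function to each document's text
  corpus <- tm::tm_map(corpus, tm::content_transformer(remove_special_chars))
  print(as.character(corpus[[1]]))
}
```

The removal leaves the surrounding whitespace in place, so a follow-up pass with tm's stripWhitespace or trimws() may be wanted.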

Answer 1: score 0, posted 2017-03-22 16:44:35
