简体   繁体   English

if(!all(o)){错误:需要TRUE / FALSE的缺少值R

[英]Error in if (!all(o)) { : missing value where TRUE/FALSE needed R

I try to create a sentiment analysis system using word2vec and logistic multinomial regression. 我尝试使用word2vec和logistic多项式回归创建情感分析系统。

For this, I tried to do like the author did here : http://analyzecore.com/2017/02/08/twitter-sentiment-analysis-doc2vec/ 为此,我试图像作者在这里所做的那样: http : //analyzecore.com/2017/02/08/twitter-sentiment-analysis-doc2vec/

Here the R code : 这里的R代码:

library(tidyverse)
library(text2vec)
library(caret)
library(glmnet)
library(ggrepel)

Train_classifier <- read.csv('IRC.csv',header=T, sep=";")
Test_classifier <- read.csv('IRC2.csv',header=T, sep=";")

# select only 4 column of the dataframe

Train <- Train_classifier[, c("Note.Reco", "Raison.Reco", "DATE_SAISIE", "idpart")]
Test <- Test_classifier[, c("Note.Reco", "Raison.Reco", "DATE_SAISIE", "idpart")]

#delete rows with empty value columns
subTrain <- Train[rowSums(Train == '') == 0,]
subTrain$ID <- seq.int(nrow(subTrain))

 subTrain
      DATE_SAISIE    idpart    ID  Raison.Reco  Note.Reco
2      19/03/2014 102853645     1  Good         0
3      19/03/2014   1072309     2  Not good     2
4      19/03/2014    191391     3  very good    9
6      19/03/2014     14529     4  not comment  8
7      19/03/2014 100065501     5  very professional  9
8      19/03/2014 102261392     6  very good  1
9      19/03/2014 102734704     7  good  10
10     19/03/2014   1004397     8  not very good  10

# # replacing class values
subTrain$Note.Reco = ifelse(subTrain$Note.Reco >= 0 & subTrain$Note.Reco <= 4, 0, ifelse(subTrain$Note.Reco >= 5 &
subTrain$Note.Reco <= 6, 1, ifelse(subTrain$Note.Reco >= 7 & subTrain$Note.Reco <= 8, 2, 3)))


subTest <- Test[rowSums(Test == '') == 0,]
subTest$ID <- seq.int(nrow(subTest))

#Data pre processing
#Doc2Vec

prep_fun <- tolower
tok_fun <- word_tokenizer

subTrain[] <- lapply(subTrain, as.character)
it_train <- itoken(subTrain$Raison.Reco, 
                   preprocessor = prep_fun, 
                   tokenizer = tok_fun,
                   ids = subTrain$ID,
                   progressbar = TRUE)



subTest[] <- lapply(subTest, as.character)
it_test <- itoken(subTest$Raison.Reco, 
                   preprocessor = prep_fun, 
                   tokenizer = tok_fun,
                   ids = subTest$ID,
                   progressbar = TRUE)


#creation of vocabulary and term document matrix
  ### fichier  d'apprentissage
vocab_train <- create_vocabulary(it_train)
vectorizer_train <- vocab_vectorizer(vocab_train)
dtm_train <- create_dtm(it_train, vectorizer_train)


  ###  test data



vocab_test <- create_vocabulary(it_test)
vectorizer_test <- vocab_vectorizer(vocab_test)
dtm_test <- create_dtm(it_test, vectorizer_test)

##Define  tf-idf model 

tfidf <- TfIdf$new()
# fit the model to the train data and transform it with the fitted model
dtm_train_tfidf <- fit_transform(dtm_train, tfidf)
dtm_test_tfidf <- fit_transform(dtm_test, tfidf)

glmnet_classifier <- cv.glmnet(x = dtm_train_tfidf,
                               y = subTrain[['Note.Reco']], family = 'multinomial',type.multinomial = "grouped")

When I run this code I get this error : 当我运行此代码时,出现此错误:

 > glmnet_classifier <- cv.glmnet(x = dtm_train_tfidf, + y = subTrain[['Note.Reco']], family = 'multinomial',type.multinomial = "grouped") Error in if (!all(o)) { : missing value where TRUE/FALSE needed 

Any idea please? 有什么想法吗?

Thank you 谢谢

EDIT: 编辑:

The problem come from this line line @Edward Moseley said in his comment : 问题来自此行@Edward Moseley在他的评论中说:

subTrain[] <- lapply(subTrain, as.character)

But when I delete it and I move to this line : 但是,当我删除它并移至此行时:

it_train <- itoken(subTrain$Raison.Reco, 
                   preprocessor = prep_fun, 
                   tokenizer = tok_fun,
                   ids = subTrain$ID,
                   progressbar = TRUE)

I get this error : 我收到此错误:

Error in UseMethod("itoken") : 
  no applicable method for 'itoken' applied to an object of class "factor"




subTrain
> subTrain
      Note.Reco
1             3
2             3
3             2
4             3
5             3
6             1
7             3
8             1
9             2
10            3
11            3
12            3
13            3
14            2
15            2
16            3
17            3
18            2
19            3
20            2
21            2
22            2
23            0
24            0
25            2
26            3
27            3
28            0
29            0
30            2
31            3
32            3
33            3
34            3
35            0
36            1
37            2
38            1
39            3
40            3
41            3
42            1
43            3
44            2
45            3
46            3
47            2
48            3
49            3
50            2
51            1
52            1
53            2
54            3
55            3
56            2
57            2
58            3
59            2
60            1
61            3
62            0
63            2
64            2
65            3
66            0
67            1
68            3
69            2
70            2
71            3
72            2
73            2
74            2
75            3
76            2
77            2
78            3
79            3
80            3
81            3
82            2
83            2
84            1
85            0
86            2
87            0
88            3
89            3
90            3
91            2
92            1
93            2
94            1
95            3
96            3
97            2
98            2
99            3
100           3
101           0
102           2
103           2
104           0
105           2
106           3
107           2
108           2
109           2
110           3
111           3
112           2
113           2
114           2
115           3
116           3
117           2
118           3
119           3
120           3
121           3
122           2
123           3
124           2
125           2
126           0
127           3
128           3
129           0
130           3
131           0
132           1
133           3
134           2
135           0
136           1
137           3
138           1
139           3
140           3
141           3
142           2
143           2
144           3
145           2
146           2
147           3
148           1
149           1
150           3
151           2
152           2
153           3
154           2
155           3
156           2
157           3
158           3
159           3
160           0
161           2
162           1
163           3
164           3
165           1
166           2
167           2
168           3
169           2
170           3
171           3
172           3
173           2
174           2
175           3
176           3
177           0
178           3
179           2
180           3
181           0
182           3
183           3
184           2
185           3
186           3
187           1
188           3
189           1
190           2
191           2
192           3
193           3
194           3
195           2
196           2
197           3
198           2
199           0
200           3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        ... <truncated>
1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ... <truncated>
3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ... <truncated>
4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ... <truncated>
5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ... <truncated>
7  

I see this "truncated value" in the dataframe, is it normal? 我在数据框中看到了这个“截断的值”,这正常吗?

I think you're introducing problems when you lapply() . 我认为当您lapply()时,您正在引入问题。

Try checking: (dim(subTrain[['Note.Reco']]) > 0) 尝试检查: (dim(subTrain[['Note.Reco']]) > 0)

If that evaluates to TRUE then you may have a different problem than I describe below. 如果该结果为TRUE那么您可能会遇到与以下我描述的问题不同的问题。

From the glmnet github I am seeing: if (!all(o)) { as part of glmnet/R/lognet.R , (which I am assuming is called by cv.glmnet ). glmnet github我看到: if (!all(o)) {作为glmnet/R/lognet.R一部分(我假设它由cv.glmnet )。 in the lognet function we see: lognet函数中,我们看到:

27   weights=drop(y%*%rep(1,nc)) 
28   o=weights>0 
29   if(!all(o)){ #subset the data 

Earlier, on line 2 we see nc defined: 之前,在第2行中,我们看到了nc定义:

2   nc=dim(y)

The rest of the code depends on the values of nc , so I would suggest determining the results of dim(subTrain[['Note.Reco']]) for starters. 其余代码取决于nc的值,因此我建议为初学者确定dim(subTrain[['Note.Reco']])的结果。 You could try accessing those data differently: 您可以尝试以其他方式访问这些数据:

glmnet_classifier <- cv.glmnet(x = dtm_train_tfidf, y = subTrain$Note.Reco, family = 'multinomial',type.multinomial = "grouped")

Also, if you construct a data.frame we can copy/paste to work with it will be much easier to debug. 同样,如果您构造一个data.frame我们可以复制/粘贴它来进行调试。

I had this kind of problem too when using glmnet and text2vec. 使用glmnet和text2vec时,我也遇到这种问题。 In my case the problem was caused by missing cases. 就我而言,问题是由遗失案件引起的。 The reason is that the function all() called by cv.glment doesn't work with missing values. 原因是cv.glment调用的all()函数不适用于缺少值的情况。 To fix the problem I removed the NAs from my data. 为了解决该问题,我从数据中删除了NAs

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM