
R caret package (rpart)

I get the following error when using the rpart library:

dt <- rpart(formula, method="class", data=full.df.allAttr.train);

Error in model.frame.default(formula = formula, data = full.df.allAttr.train,  : 
  object is not a matrix

When I convert full.df.allAttr.train to a matrix:

dt <- rpart(formula, method="class", data= as.matrix( full.df.allAttr.train));

Error in model.frame.default(formula = formula, data = as.matrix(full.df.allAttr.train),  : 
  'data' must be a data.frame, not a matrix or an array

When I check the class, it is a data frame:

class(full.df.allAttr.train)

[1] "data.frame"

Thanks for your input. Once I built the formula with the correct column names, that error went away and I got a result.

measurevar <- "SpeakerName"
formula_str <- paste(measurevar, paste(rowNames, collapse=" + "), sep=" ~ ")
formula <- as.formula(formula_str) 

My data frame has row.names (the original post included a snapshot of it), and it now gives a different error:

Error in model.frame.default(formula = formula, data = full.df.train,  : 
  variable lengths differ (found for 'character(0)')
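
One possible cause of this message, offered as an assumption because the post does not confirm it: if a term literally named character(0) ended up as a column of the data frame, pasting the column names into a formula string turns that name into a function call that evaluates to a zero-length vector, and model.frame then complains about differing lengths. The sketch below reproduces that on a toy data frame (toy.df, its columns, and its values are made up for illustration) and lists the column names that are not syntactic R names.

```r
# Toy data frame; the column name "character(0)" is hypothetical and only
# mimics a term that could come out of a document-term matrix.
toy.df <- data.frame(
  y = factor(c("a", "b", "a", "b")),
  x = 1:4,
  check.names = FALSE
)
toy.df[["character(0)"]] <- c(0, 1, 0, 1)

# Paste the raw column names into a formula string, as in the post.
predictors <- setdiff(colnames(toy.df), "y")
bad.formula <- as.formula(
  paste("y", paste(predictors, collapse = " + "), sep = " ~ ")
)

# character(0) is parsed as a function call, so model.frame sees a
# zero-length variable and stops with an error; capture its message.
msg <- tryCatch(
  {
    model.frame(bad.formula, data = toy.df)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Column names that would not survive unchanged in a formula string.
non.syntactic <- colnames(toy.df)[colnames(toy.df) != make.names(colnames(toy.df))]
print(non.syntactic)
```

If non.syntactic is not empty for the real data, renaming those columns (for example with make.names) before building the formula string is one way to rule this cause out.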


Sorry, I will add the full source code and the data set:

library(tm)
library(rpart)
obamaCorpus <- Corpus(DirSource(directory = "D:/R/Chap 6/Speeches/obama" , encoding="UTF-8"))
romneyCorpus <- Corpus(DirSource(directory = "D:/R/Chap 6/Speeches/romney" , encoding="UTF-8"))
fullCorpus <- c(obamaCorpus,romneyCorpus)#1-22 (obama), 23-44(romney)
fullCorpus.cleansed <- tm_map(fullCorpus, removePunctuation)
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, stripWhitespace)
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, tolower)
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, removeWords, stopwords("english"))
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, PlainTextDocument)
#fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, stemDocument)

full.dtm <- DocumentTermMatrix(fullCorpus.cleansed)
full.dtm.spars <- removeSparseTerms(full.dtm , 0.6)

full.matix <- data.matrix(full.dtm.spars)
full.df <- as.data.frame(full.matix)

full.df[,"SpeakerName"] <- "obama"
full.df$SpeakerName[21:44] <- "romney"

train.idx <- sample(nrow(full.df) , ceiling(nrow(full.df)* 0.6))
test.idx <- (1:nrow(full.df))[-train.idx]
rowNames <- colnames(full.df)

measurevar <- "SpeakerName"
formula_str <- paste(measurevar, paste(rowNames, collapse=" + "), sep=" ~ ")
formula <- as.formula(formula_str)
dt <- rpart(formula, method="class", data=full.df.train);

It fails at the last step.

The data set is here: https://drive.google.com/folderview?id=0B1SogodTE-kJSHF6aFRmQURsV0U&usp=sharing

You forgot to define full.df.train, and your formula is incorrect.

This will work:

full.df.train <- full.df[train.idx, ]
dt <- rpart(SpeakerName ~ ., method = "class", data = full.df.train)

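The problem with your formula is that SpeakerName appears on both sides of the ~, because rowNames is taken from colnames(full.df) after the SpeakerName column has been added. If you want to use all the other variables as predictors, writing . on the right-hand side is easier and more compact.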
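
To make the point about the formula concrete, here is a minimal sketch on a toy data frame. The data frame toy.df, its term columns (america, jobs, tax), and the counts are made up; only SpeakerName and the rpart call mirror the post. It builds the formula explicitly with the response removed from the predictor list, and also fits the compact SpeakerName ~ . form.

```r
library(rpart)

# Toy stand-in for the document-term data frame (hypothetical counts).
set.seed(1)
toy.df <- data.frame(
  america = rpois(40, 3),
  jobs = rpois(40, 2),
  tax = rpois(40, 1),
  SpeakerName = factor(rep(c("obama", "romney"), each = 20))
)

# Drop the response from the predictor list before building the formula,
# so SpeakerName does not end up on both sides of the ~.
predictors <- setdiff(colnames(toy.df), "SpeakerName")
explicit.formula <- reformulate(predictors, response = "SpeakerName")
print(explicit.formula)

# Fit once with the explicit formula and once with the compact "." form,
# which stands for every column of data other than the response.
fit.explicit <- rpart(explicit.formula, method = "class", data = toy.df)
fit.dot <- rpart(SpeakerName ~ ., method = "class", data = toy.df)
```

The explicit form is mainly useful when only a subset of the columns should be used as predictors; otherwise the . form is less to type and cannot accidentally include the response.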

