
Error converting a factor to numeric in R

Update:

Actually, my problem is with sum(data[,"employee_count"], na.rm = T).

My raw data looks like this:

employee_count
1-49
0
150-249
1-49
1000+

The code I wrote is as follows:

# Convert the factor to character so the labels can be compared as strings
data$employee_count <- as.character(data$employee_count)
# Replace each range label with its category code (still stored as character)
data[data$employee_count == "1-49", "employee_count"] <- 1
data[data$employee_count == "50-149", "employee_count"] <- 2
data[data$employee_count == "150-249", "employee_count"] <- 3
data[data$employee_count == "250-499", "employee_count"] <- 4
data[data$employee_count == "500-749", "employee_count"] <- 5
data[data$employee_count == "750-999", "employee_count"] <- 6
data[data$employee_count == "1000+", "employee_count"] <- 7
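A more compact way to do the same recoding is a named lookup vector indexed by the labels, which yields a numeric column in one step. This is a minimal sketch, not the poster's code: the data frame df and the vector codes are hypothetical names, and the sample values mirror the raw data shown above. Any label missing from codes comes back as NA.

```r
# Hypothetical sample data mirroring the question's column (stored as a factor)
df <- data.frame(employee_count = c("1-49", "0", "150-249", "1-49", "1000+"),
                 stringsAsFactors = TRUE)

# Named lookup vector: each range label maps to its category code ("0" stays 0)
codes <- c("0" = 0, "1-49" = 1, "50-149" = 2, "150-249" = 3,
           "250-499" = 4, "500-749" = 5, "750-999" = 6, "1000+" = 7)

# Index the lookup by the labels; unname() drops the names carried over from codes
df$employee_count <- unname(codes[as.character(df$employee_count)])

df$employee_count                          # 1 0 3 1 7
sum(df$employee_count)                     # 12
df$employee_count[3] * df$employee_count[4]  # 3
```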

The data then looks like this:

employee_count
"1"
"0"
"3"
"1"
"7"

Then I tried to convert it to numeric:

data$employee_count <- as.numeric(as.character(data$employee_count))
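On why the as.character() step matters: calling as.numeric() directly on a factor returns the internal level codes, not the labels. The R FAQ (7.10) recommends as.numeric(as.character(f)) or the slightly more efficient as.numeric(levels(f))[f]. The factor f below is a hypothetical example.

```r
f <- factor(c("10", "5", "20"))

as.numeric(f)                # level codes, not values (levels sort as strings)
as.numeric(as.character(f))  # 10 5 20: the actual values
as.numeric(levels(f))[f]     # 10 5 20: converts each level once, then indexes by code
```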

After this code, the data becomes 1 0 3 1 7, but when I run sum(data$employee_count), the output is NA. I'd like to know what is wrong here.
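sum() returns NA as soon as any element is NA. A likely cause here (an assumption, since the full data isn't shown) is a value that none of the seven recoding lines matched, such as a blank string, a stray space or an existing missing value; as.numeric() then turns it into NA. The hypothetical vector raw below reproduces this and shows how to list the labels that failed to convert.

```r
# Hypothetical raw values; the last one ("") is not covered by any recode
raw <- c("1-49", "0", "150-249", "1-49", "1000+", "")

# Same style of recoding as in the question (only the labels present here)
recoded <- raw
recoded[recoded == "1-49"] <- 1
recoded[recoded == "150-249"] <- 3
recoded[recoded == "1000+"] <- 7

# "" cannot be parsed as a number, so it becomes NA
nums <- suppressWarnings(as.numeric(recoded))

sum(nums)                # NA: one missing value makes the whole total NA
sum(nums, na.rm = TRUE)  # 12: NAs are dropped before summing
raw[is.na(nums)]         # the original labels that failed to convert
```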

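The desired result is for the column to actually become numeric, so that it can take part in any kind of calculation.

For example, if I write data[1,"employee_count"]+data[2,"employee_count"]

the desired result would be 1 + 0 = 1.

If I write sum(data$employee_count)

the result should be 1 + 0 + 3 + 1 + 7 = 12.

If I write data[3,"employee_count"]*data[4,"employee_count"]

the result should be 3 * 1 = 3.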

Answer:

sum(as.numeric(factor(data[,1], levels=unique(data[,1]))))
#[1] 6

If you check the order:

 as.numeric(factor(data[,1], levels=unique(data[,1])))
 #[1] 1 2 3

This differs from the default, where the levels are sorted:

 as.numeric(factor(data[,1]))
 #[1] 1 3 2
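The difference comes from how factor() picks its levels: by default they are the sorted unique values, compared as strings, so "150-249" comes before "50-149" and gets code 2. Passing levels = unique(x) keeps the order of first appearance instead. Here x is a hypothetical vector matching the three-row data below; either way the result is a level index, not an employee count.

```r
x <- c("1-49", "50-149", "150-249")

levels(factor(x))                               # sorted: "1-49" "150-249" "50-149"
as.integer(factor(x))                           # 1 3 2
as.integer(factor(x, levels = unique(x)))       # 1 2 3
sum(as.integer(factor(x, levels = unique(x))))  # 6
```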

Data:

data <- structure(list(employee_count = c("1-49", "50-149", "150-249"
 )), .Names = "employee_count", class = "data.frame", row.names = c(NA, 
-3L))

Update:

 data <- structure(list(employee_count = c("1-49", "0", "150-249", "250-499", 
 "1-49", "500-749", "500-749", "750-999", "50-149", "1000+", "150-249"
 )), .Names = "employee_count", row.names = c(NA, -11L), class = "data.frame")


 data1 <- data

 data[,1] <- as.numeric(factor(data[,1], 
          levels=c('0', '1-49', '50-149', '150-249', '250-499', '500-749', '750-999', '1000+')))-1


 data[,1]
 #[1] 1 0 3 4 1 5 5 6 2 7 3

 data1[,1]
 #[1] "1-49"    "0"       "150-249" "250-499" "1-49"    "500-749" "500-749"
 #[8] "750-999" "50-149"  "1000+"   "150-249"

  sum(data[,1])
  #[1] 37
 data[3,"employee_count"]*data[4,"employee_count"]
 #[1] 12  # different value because I used different data
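Because factor() with an explicit levels vector turns any label outside that vector into NA, it is worth checking for NA after the conversion. The sketch below uses a hypothetical typo ("1-50") to show this, and also uses match(), a base-R alternative (not part of the answer above) that gives the same codes without building a factor. The names lv, x, codes_factor and codes_match are illustrative.

```r
lv <- c("0", "1-49", "50-149", "150-249", "250-499", "500-749", "750-999", "1000+")
x  <- c("1-49", "0", "1000+", "1-50")   # "1-50" is a hypothetical typo

# Position in lv minus 1, so "0" -> 0, "1-49" -> 1, ..., "1000+" -> 7
codes_factor <- as.integer(factor(x, levels = lv)) - 1L
codes_match  <- match(x, lv) - 1L

codes_factor                          # 1 0 7 NA
identical(codes_factor, codes_match)  # TRUE
x[is.na(codes_match)]                 # "1-50": labels that need fixing
```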


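Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.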

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM