Error converting a factor to numeric in R

Update:

Actually, my problem is with sum(data[,"employee_count"], na.rm = T).

My original data looks like this:

employee_count
1-49
0
150-249
1-49
1000+

I wrote the following code:

data$employee_count<- as.character.factor (data$employee_count)
data[data$employee_count=="1-49","employee_count"]<-1
data[data$employee_count=="50-149","employee_count"]<-2
data[data$employee_count=="150-249","employee_count"]<-3
data[data$employee_count=="250-499","employee_count"]<-4
data[data$employee_count=="500-749","employee_count"]<-5
data[data$employee_count=="750-999","employee_count"]<-6
data[data$employee_count=="1000+","employee_count"]<-7
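The column is character, so each assignment above stores its code as a string such as "1". That is why a separate numeric conversion is needed afterwards. Below is a minimal sketch of an alternative that does the mapping in one step with a named lookup vector. It assumes the same eight labels, and the names `employee_count`, `codes` and `recoded` are illustrative, not taken from the question:

```r
# Hypothetical example column, mirroring the values shown in the question
employee_count <- c("1-49", "0", "150-249", "1-49", "1000+")

# Named lookup vector: names are the original labels, values are the codes
codes <- c("0" = 0, "1-49" = 1, "50-149" = 2, "150-249" = 3,
           "250-499" = 4, "500-749" = 5, "750-999" = 6, "1000+" = 7)

# Indexing by name returns a numeric vector; a label not in `codes` gives NA
recoded <- unname(codes[employee_count])
recoded
#[1] 1 0 3 1 7
sum(recoded)
#[1] 12
```

The result is already numeric, so no `as.numeric(as.character(...))` step is needed. Any label missing from `codes` still shows up as NA.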

The data then looks like this:

employee_count
"1"
"0"
"3"
"1"
"7"

Then I tried to convert it to numeric:

data$employee_count<-as.numeric(as.character(data$employee_count))

After this the column shows 1 0 3 1 7, but when I run sum(data$employee_count), the output is NA. Something must be wrong.
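The full data is not shown, so the following is an assumption about one likely cause. Some entries may not have been matched by any of the recoding lines, for example a label with stray whitespace, an empty string, or a value outside the listed ranges. `as.numeric()` turns those entries into NA, and a single NA makes `sum()` return NA. Here is a small sketch with hypothetical values that shows the effect and lists the offending strings:

```r
# Hypothetical values: "1-49 " (trailing space) and "" were never recoded
x <- c("1", "0", "3", "1-49 ", "7", "")

# as.numeric() turns anything that is not a number into NA (normally with a warning)
num <- suppressWarnings(as.numeric(x))
num
#[1]  1  0  3 NA  7 NA

# A single NA makes sum() return NA ...
sum(num)
#[1] NA
# ... while na.rm = TRUE silently drops those entries
sum(num, na.rm = TRUE)
#[1] 11

# List the original strings that failed to convert
unique(x[is.na(num)])
#[1] "1-49 " ""
```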

What I want is to actually convert this column to numbers that can be used in any kind of calculation.

For example, if I write data[1,"employee_count"]+data[2,"employee_count"],

the result should be 1+0 = 1.

If I write sum(data$employee_count),

the result should be 1+0+3+1+7 = 12.

If I write data[3,"employee_count"]*data[4,"employee_count"],

the result should be 3*1 = 3.

Answer:

sum(as.numeric(factor(data[,1], levels=unique(data[,1]))))
#[1] 6

If you check the order of the resulting codes

 as.numeric(factor(data[,1], levels=unique(data[,1])))
 #[1] 1 2 3

this is not the same as the default, where the levels are the unique values sorted as strings:

 as.numeric(factor(data[,1]))
 #[1] 1 3 2
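The difference comes from how `factor()` assigns codes. `as.numeric()` on a factor returns the internal level codes, and by default the levels are the sorted unique values, compared as strings. The sketch below uses the three labels from this data. It also covers the related case of a factor whose labels are themselves numbers (`f_num` is a hypothetical example). There, `as.numeric(as.character(f))` is needed to get the values rather than the codes:

```r
x <- c("1-49", "50-149", "150-249")

# Default levels are the sorted unique values, compared as strings,
# so "150-249" comes before "50-149"
f_default <- factor(x)
levels(f_default)
#[1] "1-49"    "150-249" "50-149"
as.numeric(f_default)   # level codes, not the original order
#[1] 1 3 2

# Levels in order of first appearance: codes follow the data order
f_unique <- factor(x, levels = unique(x))
as.numeric(f_unique)
#[1] 1 2 3

# Hypothetical factor whose labels are numbers: convert the labels, not the codes
f_num <- factor(c("10", "5", "10"))
as.numeric(f_num)                 # codes, because "10" sorts before "5"
#[1] 1 2 1
as.numeric(as.character(f_num))   # the values themselves
#[1] 10  5 10
```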

Data used above:

data <- structure(list(employee_count = c("1-49", "50-149", "150-249"
 )), .Names = "employee_count", class = "data.frame", row.names = c(NA, 
-3L))

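Update: with the fuller data from the question, give the levels explicitly in the intended order and subtract 1 so that "0" maps to 0: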

 data <- structure(list(employee_count = c("1-49", "0", "150-249", "250-499", 
 "1-49", "500-749", "500-749", "750-999", "50-149", "1000+", "150-249"
 )), .Names = "employee_count", row.names = c(NA, -11L), class = "data.frame")


 data1 <- data

 data[,1] <- as.numeric(factor(data[,1], 
          levels=c('0', '1-49', '50-149', '150-249', '250-499', '500-749', '750-999', '1000+')))-1


 data[,1]
 #[1] 1 0 3 4 1 5 5 6 2 7 3

 data1[,1]
 #[1] "1-49"    "0"       "150-249" "250-499" "1-49"    "500-749" "500-749"
 #[8] "750-999" "50-149"  "1000+"   "150-249"

 sum(data[,1])
 #[1] 37
 data[3,"employee_count"]*data[4,"employee_count"]
 #[1] 12  # differs from the question's 3*1 = 3 because this is different data
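The explicit-levels factor approach above can also be written with base R's `match()`. It returns each value's position in a lookup vector, or NA when the value is absent. Here is a minimal sketch, assuming the same eight labels and the same 11 values; `lvls`, `x` and `codes` are illustrative names:

```r
# Labels in the intended order; position minus 1 gives the code (so "0" -> 0)
lvls <- c("0", "1-49", "50-149", "150-249", "250-499",
          "500-749", "750-999", "1000+")
x <- c("1-49", "0", "150-249", "250-499", "1-49", "500-749",
       "500-749", "750-999", "50-149", "1000+", "150-249")

# match() gives each value's position in lvls (NA if the label is not listed)
codes <- match(x, lvls) - 1
codes
#[1] 1 0 3 4 1 5 5 6 2 7 3
sum(codes)
#[1] 37

# Any label missing from lvls would appear here
x[is.na(codes)]
#character(0)
```

With either approach, checking for NA codes before summing shows any label the lookup did not cover.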
