
How to convert all factor variables into numeric variables in a large data frame without losing variable labels?

I am trying to convert all factor variables to numeric variables in a large data frame. During the conversion, the variable labels (descriptive names of the variables) are lost in the new data frame. Is there an easy way to convert factor variables into numeric variables in a data frame without losing the variable labels? The sample code is given below. Thank you.

v1 <- c('1','4','5')
v2 <- c('21000', '23400', '26800')
v3 <- c('2010','2008','2007')
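# stringsAsFactors = TRUE makes the columns factors (the default before R 4.0)
data <- data.frame(v1, v2, v3, stringsAsFactors = TRUE)
library(Hmisc)
label(data$v1) <- "Number"
label(data$v2) <- "Value"
label(data$v3) <- "Year"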

data[] <- as.numeric(factor(as.matrix(data)))
View(data)

You could save the attributes beforehand and restore them.

## save labels
attr.data <- lapply(dat, attr, "label")  

## convert to numeric and restore labels
dat[] <- Map(function(x, y) `attr<-`(as.numeric(levels(x))[x], "label", y), dat, attr.data)

In one step:

dat[] <- Map(function(x, y) 
  `attr<-`(as.numeric(levels(x))[x], "label", y), dat, Map(attr, dat, "label"))
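Both versions above assume that every column is a factor. If the data frame also contains columns that are not factors, one possible variant is to convert only the factor columns and pass the rest through unchanged. The following is a minimal sketch in base R under that assumption; the helper name factor_to_numeric and the example data frame df are made up for illustration, and the labels are set with attr directly rather than through Hmisc.

```r
# Hypothetical example data: two labelled factors and one labelled numeric column
df <- data.frame(
  v1 = factor(c("1", "4", "5")),
  v2 = factor(c("21000", "23400", "26800")),
  v3 = c(2010, 2008, 2007)
)
attr(df$v1, "label") <- "Number"
attr(df$v2, "label") <- "Value"
attr(df$v3, "label") <- "Year"

# factor_to_numeric() is an illustrative helper, not part of any package
factor_to_numeric <- function(x) {
  if (!is.factor(x)) return(x)            # leave non-factor columns untouched
  lab <- attr(x, "label", exact = TRUE)   # NULL if the column has no label
  out <- as.numeric(levels(x))[x]         # values spelled out by the levels, not the codes
  attr(out, "label") <- lab               # assigning NULL simply leaves out unlabelled
  out
}

df[] <- lapply(df, factor_to_numeric)
str(df)
```

Because the helper returns non-factor columns as they are, any label already attached to them is kept as well.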

Explanation

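The labels are stored as attributes of the individual columns (try attributes(dat$v1)) and can be read by name with attr. The label attribute is named "label", so we can capture it before the conversion. Map walks over the columns and their saved labels in parallel, which ensures that each converted column gets its own label back. Note that as.numeric(levels(x))[x] is used instead of as.numeric(x), because the latter returns the factor's internal integer codes rather than the numbers shown in the levels.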
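To see these pieces in isolation, here is a small base R sketch on a single factor. It assumes only that the label lives in an attribute named "label", as the Data section shows; attr(f, "label") <- ... is used here as a stand-in for setting the label through Hmisc.

```r
# A single labelled factor; the levels are sorted as "2007" "2008" "2010"
f <- factor(c("2010", "2008", "2007"))
attr(f, "label") <- "Year"

attributes(f)                  # lists levels, class and label
attr(f, "label")               # the label on its own

as.numeric(f)                  # internal integer codes: 3 2 1
as.numeric(levels(f))[f]       # the values the levels spell out: 2010 2008 2007

# A plain conversion returns a new vector that no longer carries the label
attr(as.numeric(f), "label")   # NULL
```

This is why the label has to be saved before the conversion and attached again afterwards.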

Result

dat
#   v1    v2   v3
# 1  1 21000 2010
# 2  4 23400 2008
# 3  5 26800 2007

str(dat)
# 'data.frame': 3 obs. of  3 variables:
#  $ v1: num  1 4 5
#   ..- attr(*, "label")= chr "Number"
#  $ v2: num  21000 23400 26800
#   ..- attr(*, "label")= chr "Value"
#  $ v3: num  2010 2008 2007
#   ..- attr(*, "label")= chr "Year"

Data

dat <- structure(list(v1 = structure(1:3, .Label = c("1", "4", "5"), class = c("labelled", 
"factor"), label = "Number"), v2 = structure(1:3, .Label = c("21000", 
"23400", "26800"), class = c("labelled", "factor"), label = "Value"), 
    v3 = structure(3:1, .Label = c("2007", "2008", "2010"), class = c("labelled", 
    "factor"), label = "Year")), row.names = c(NA, -3L), class = "data.frame")

Sidenote: I use dat rather than data here, because data is already the name of an R function that loads datasets.
