
How to convert all factor variables into numeric variables in a large data frame without losing variable labels?

I am trying to convert all factor variables to numeric variables in a large data frame. During the conversion, the variable labels (the descriptive names of the variables) are lost in the new data frame. Is there an easy way to convert factor variables into numeric variables in a data frame without losing the variable labels? Sample code is given below. Thank you.

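v1 <- c('1','4','5')
v2 <- c('21000', '23400', '26800')
v3 <- c('2010','2008','2007')
# stringsAsFactors = TRUE is needed in R >= 4.0.0 to get factor columns
data <- data.frame(v1, v2, v3, stringsAsFactors = TRUE)
library(Hmisc)
label(data$v1) <- "Number"
label(data$v2) <- "Value"
label(data$v3) <- "Year"

# attempted conversion: the labels are lost here
data[] <- as.numeric(factor(as.matrix(data)))
View(data)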

You could save the attributes beforehand and restore them after the conversion.

## save labels
attr.data <- lapply(dat, attr, "label")  

## convert to numeric and restore labels
dat[] <- Map(function(x, y) `attr<-`(as.numeric(levels(x))[x], "label", y), dat, attr.data)

In one step:

dat[] <- Map(function(x, y) 
  `attr<-`(as.numeric(levels(x))[x], "label", y), dat, Map(attr, dat, "label"))
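In a large data frame, some columns may not be factors at all. As a minimal sketch of the same save-and-restore idea, the hypothetical helper `factors_to_numeric()` below (a name introduced here, not part of the answer) converts only the factor columns and leaves the rest untouched. It uses base R only and sets the labels with `attr()` instead of `Hmisc::label()`; the `toy` data frame is made up for illustration.

```r
# Hypothetical helper (not from the original answer): convert only the factor
# columns of a data frame to numeric, keeping each column's "label" attribute.
factors_to_numeric <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], function(x) {
    lab <- attr(x, "label", exact = TRUE)  # save the label (NULL if absent)
    out <- as.numeric(levels(x))[x]        # map level strings to numbers
    attr(out, "label") <- lab              # restore the label
    out
  })
  df
}

# Made-up example data: two labelled factor columns and one character column.
toy <- data.frame(
  v1 = factor(c("1", "4", "5")),
  v2 = factor(c("21000", "23400", "26800")),
  id = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
attr(toy$v1, "label") <- "Number"
attr(toy$v2, "label") <- "Value"

toy_num <- factors_to_numeric(toy)
str(toy_num)
```

Factor levels that cannot be parsed as numbers become `NA` (with a warning) in this sketch, so it is worth checking the levels first if the columns are not known to be purely numeric.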

Explanation

The labels are stored as attributes of the individual columns (try attributes(dat$v1)) and can be accessed by name with attr. The label attribute is named "label", so we can capture it during the conversion. Map iterates over the columns and their saved labels in parallel, which ensures that each column gets its own label back.
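To make the mechanics concrete, here is a small base-R sketch on a single made-up factor `f` (the variable names are illustrative only). It shows why the conversion goes through `levels()` rather than calling `as.numeric()` on the factor directly, and that the label has to be copied over by hand.

```r
# A labelled factor, built with base R only (attr() instead of Hmisc::label()).
f <- factor(c("2010", "2008", "2007"))
attr(f, "label") <- "Year"

# as.numeric() on a factor returns the internal level codes, not the years.
codes <- as.numeric(f)

# Converting the levels and indexing by the factor recovers the real values.
values <- as.numeric(levels(f))[f]

# The converted vector carries no label, so it is restored explicitly.
label_after_conversion <- attr(values, "label")
labelled_values <- values
attr(labelled_values, "label") <- attr(f, "label")

print(codes)
print(values)
print(attributes(labelled_values))
```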

Result

dat
#   v1    v2   v3
# 1  1 21000 2010
# 2  4 23400 2008
# 3  5 26800 2007

str(dat)
# 'data.frame': 3 obs. of  3 variables:
#  $ v1: num  1 4 5
#   ..- attr(*, "label")= chr "Number"
#  $ v2: num  21000 23400 26800
#   ..- attr(*, "label")= chr "Value"
#  $ v3: num  2010 2008 2007
#   ..- attr(*, "label")= chr "Year"

Data

dat <- structure(list(v1 = structure(1:3, .Label = c("1", "4", "5"), class = c("labelled", 
"factor"), label = "Number"), v2 = structure(1:3, .Label = c("21000", 
"23400", "26800"), class = c("labelled", "factor"), label = "Value"), 
    v3 = structure(3:1, .Label = c("2007", "2008", "2010"), class = c("labelled", 
    "factor"), label = "Year")), row.names = c(NA, -3L), class = "data.frame")

Side note: I use dat rather than data here, because data is already the name of an R function that loads datasets.
