How to convert all factor variables into numeric variables in a large data frame without losing variable labels?
I am trying to convert all factor variables to numeric variables in a large data frame. While converting, the variable labels (the descriptive names of the variables) are lost in the new data frame. Is there an easy way to convert factor variables into numeric variables in a data frame without losing the variable labels? The sample code is given below. Thank you.
v1 <- c('1','4','5')
v2 <- c('21000', '23400', '26800')
v3 <- c('2010','2008','2007')
data <- data.frame(v1, v2, v3, stringsAsFactors = TRUE)  # factors are not the default since R 4.0.0
library(Hmisc)
label(data$v1) <- "Number"
label(data$v2) <- "Value"
label(data$v3) <- "Year"
data[] <- as.numeric(factor(as.matrix(data)))
View(data)
You could save the attributes beforehand and restore them.
## save labels
attr.data <- lapply(dat, attr, "label")
## convert to numeric and restore labels
dat[] <- Map(function(x, y) `attr<-`(as.numeric(levels(x))[x], "label", y), dat, attr.data)
In one step:
dat[] <- Map(function(x, y)
`attr<-`(as.numeric(levels(x))[x], "label", y), dat, Map(attr, dat, "label"))
The labels are stored as attributes of each column (try attributes(dat$v1)) and can be accessed with attr and their names. The name of the label attribute is "label", and we can catch it during conversion. Map processes the columns and their saved labels in parallel, which ensures that each column gets its own label back.
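To see why the label has to be captured before converting, here is a minimal base-R sketch. The vector f and its label "Amount" are made up for illustration, and attr is used directly rather than Hmisc's label.

```r
# Hypothetical example vector; not part of the original data.
f <- factor(c("10", "30", "20"))
attr(f, "label") <- "Amount"

lab <- attr(f, "label")        # capture the label before converting

# as.numeric(f) would return the internal level codes, not the values,
# so index the numeric levels by the factor instead.
x <- as.numeric(levels(f))[f]

attr(x, "label")               # NULL: the conversion dropped the label
attr(x, "label") <- lab        # restore it
```

The same capture, convert, restore sequence is what the Map call applies to every column at once.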
dat
# v1 v2 v3
# 1 1 21000 2010
# 2 4 23400 2008
# 3 5 26800 2007
str(dat)
# 'data.frame': 3 obs. of 3 variables:
# $ v1: num 1 4 5
# ..- attr(*, "label")= chr "Number"
# $ v2: num 21000 23400 26800
# ..- attr(*, "label")= chr "Value"
# $ v3: num 2010 2008 2007
# ..- attr(*, "label")= chr "Year"
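In a large data frame, not every column is necessarily a factor. The following is a sketch of one way to restrict the conversion to factor columns only. The helper name factors_to_numeric and the example data frame d are hypothetical and not part of the original answer.

```r
# Hypothetical helper: convert only the factor columns of a data frame
# to numeric, keeping each column's "label" attribute.
factors_to_numeric <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  if (!any(is_fac)) return(df)           # nothing to convert
  df[is_fac] <- lapply(df[is_fac], function(x) {
    lab <- attr(x, "label")              # NULL if no label was set
    out <- as.numeric(levels(x))[x]      # values, not internal codes
    attr(out, "label") <- lab            # assigning NULL changes nothing
    out
  })
  df
}

# Small made-up data frame with one factor and one character column.
d <- data.frame(a = factor(c("3", "1", "2")), b = c("x", "y", "z"),
                stringsAsFactors = FALSE)
attr(d$a, "label") <- "Count"

d2 <- factors_to_numeric(d)
str(d2)
```

Note that factor levels which are not valid numbers become NA (with a warning) under as.numeric, so such columns may need to be excluded or handled separately.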
Data:
dat <- structure(list(v1 = structure(1:3, .Label = c("1", "4", "5"), class = c("labelled",
"factor"), label = "Number"), v2 = structure(1:3, .Label = c("21000",
"23400", "26800"), class = c("labelled", "factor"), label = "Value"),
v3 = structure(3:1, .Label = c("2007", "2008", "2010"), class = c("labelled",
"factor"), label = "Year")), row.names = c(NA, -3L), class = "data.frame")
Sidenote: I use dat rather than data here, because data is already taken in R as the name of the function that loads specific datasets.