How to convert a data frame of factors to numeric?

Question

I have a data frame in which all values are factors:

V1 V2 V3
 a  b  c
 c  b  a
 c  b  c
 b  b  a

How can I convert every value in the data frame to a numeric value (a to 1, b to 2, c to 3, and so on)?

Answer 1

I would try:

> mydf[] <- as.numeric(factor(as.matrix(mydf)))
> mydf
  V1 V2 V3
1  1  2  3
2  3  2  1
3  3  2  3
4  2  2  1
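Here is a step-by-step sketch of what the one-liner above does. It assumes a data frame equivalent to the question's data, and the intermediate names `m`, `f`, and `codes` exist only for illustration. `factor()` sorts the unique strings, so the a → 1, b → 2, c → 3 mapping depends on alphabetical order in the current locale.

```r
# Data equivalent to the question's example: every column is a factor
mydf <- data.frame(
  V1 = factor(c("a", "c", "c", "b")),
  V2 = factor(c("b", "b", "b", "b")),
  V3 = factor(c("c", "a", "c", "a"))
)

m <- as.matrix(mydf)     # 4 x 3 character matrix of the labels
f <- factor(m)           # one factor over all 12 cells (column-major); levels sorted: a, b, c
codes <- as.numeric(f)   # integer code of each cell's level, as a plain numeric vector

mydf[] <- codes          # `[] <-` refills the columns in order, keeping names and row names
print(mydf)
```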

Answer 2

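Converting a factor to numeric gives the integer codes of its levels. However, if the levels of a factor column were specified as c('b', 'a', 'c', 'd') or c('c', 'b', 'a'), the integer values will follow that order. To avoid this, we can state the levels explicitly by calling factor again (safer):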

df1[] <- lapply(df1, function(x) 
                as.numeric(factor(x, levels=letters[1:3])))
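The pitfall described above can be demonstrated with `V3` from the question's data, which has only the levels `a` and `c`. A second, hypothetical factor declares its levels in reverse order:

```r
# Column with only two levels, like V3 in the question's data
v3 <- factor(c("c", "a", "c", "a"))    # levels: "a", "c"
as.integer(v3)                          # 2 1 2 1 : "c" gets code 2, not 3

# Levels declared in a non-alphabetical order (hypothetical example)
x <- factor(c("a", "b", "c"), levels = c("c", "b", "a"))
as.integer(x)                           # 3 2 1 : codes follow the declared order

# Re-declaring the levels fixes the mapping to a -> 1, b -> 2, c -> 3
as.integer(factor(v3, levels = letters[1:3]))   # 3 1 3 1
as.integer(factor(x, levels = letters[1:3]))    # 1 2 3

# Values outside `levels` become NA, so the level set must cover every value
as.integer(factor(c("a", "d"), levels = letters[1:3]))   # 1 NA
```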

If we are using data.table, one option is set(). It is more efficient for large datasets, whereas converting to a matrix may cause memory problems.

library(data.table)
setDT(df1)
for(j in seq_along(df1)){
 set(df1, i=NULL, j=j, 
     value= as.numeric(factor(df1[[j]], levels= letters[1:3])))
 }
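If hard-coding `letters[1:3]` is not an option, the following base-R variant is one possible sketch and is not part of the original answer. Like the `set()` loop, it works column by column, and it builds the shared level set from the columns' existing levels. It assumes every column is a factor. The helper name `factors_to_codes` is hypothetical, and `sort()` follows the current locale's collation.

```r
# Hypothetical helper: map every factor column to numeric codes using one
# level set shared by all columns (sorted union of the columns' levels).
factors_to_codes <- function(df) {
  stopifnot(all(vapply(df, is.factor, logical(1))))   # assumption: all columns are factors
  all_levels <- sort(unique(unlist(lapply(df, levels), use.names = FALSE)))
  df[] <- lapply(df, function(x) as.numeric(factor(x, levels = all_levels)))
  df
}

df1 <- data.frame(
  V1 = factor(c("a", "c", "c", "b")),
  V2 = factor(c("b", "b", "b", "b")),
  V3 = factor(c("c", "a", "c", "a"))
)
res <- factors_to_codes(df1)
print(res)
```

Only the level labels are collected to build the level set, not every cell, so no character matrix of the full table is created.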

Answer 3

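This approach is similar to Ananda's, but uses unlist() instead of factor(as.matrix()). Since all columns are already factors, unlist() combines them into a single factor whose levels are the union of the columns' levels.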

So let's see what happens when we unlist() the data frame.

unlist(df, use.names = FALSE)
#  [1] a c c b b b b b c a c a
# Levels: a b c
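`unlist()` takes the union of the level sets in the order they are first encountered and does not re-sort it. The question's data yields `a b c` only because `V1` already holds all three levels in alphabetical order. A hypothetical two-element counterexample:

```r
# First factor only has level "c"; second has "a" and "b"
u <- unlist(list(factor(c("c", "c")), factor(c("a", "b"))), use.names = FALSE)
levels(u)        # "c" "a" "b"
as.integer(u)    # 1 1 2 3 : here "c" maps to 1

# Re-sorting the levels restores the alphabetical mapping
as.integer(factor(u, levels = sort(levels(u))))   # 3 3 1 2
```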

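Now we can call as.integer() on the result above (or, in R versions before 4.1.0, c()), because the factor's integer codes match the mapping you want. The following therefore reassigns your entire data frame: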

df[] <- as.integer(unlist(df, use.names = FALSE))
## in R versions before 4.1.0 you could also drop the factor class with c();
## since R 4.1.0, c() on a factor returns a factor, so prefer as.integer()
## df[] <- c(unlist(df, use.names = FALSE))
df
#   V1 V2 V3
# 1  1  2  3
# 2  3  2  1
# 3  3  2  3
# 4  2  2  1

Note: use.names = FALSE is not required. However, dropping the names attribute makes this process more efficient.
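The following minimal illustration, on a hypothetical two-column data frame, shows the names that `use.names = FALSE` avoids. With the default `use.names = TRUE`, `unlist()` builds one name per cell from the column name plus a running index.

```r
df_small <- data.frame(V1 = factor(c("a", "c")), V2 = factor(c("b", "b")))

names(unlist(df_small))                              # "V11" "V12" "V21" "V22"
is.null(names(unlist(df_small, use.names = FALSE)))  # TRUE: no names are built
```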

Data:

df <- structure(list(V1 = structure(c(1L, 3L, 3L, 2L), .Label = c("a", 
"b", "c"), class = "factor"), V2 = structure(c(1L, 1L, 1L, 1L
), .Label = "b", class = "factor"), V3 = structure(c(2L, 1L, 
2L, 1L), .Label = c("a", "c"), class = "factor")), .Names = c("V1", 
"V2", "V3"), class = "data.frame", row.names = c(NA, -4L))
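This self-contained check confirms that the three answers agree on the data above. The data is rebuilt here with `factor()` for readability, with the same values and levels as the `structure()` call. A naive `lapply(df, as.integer)` is included for contrast: it uses each column's own levels, which is the problem Answer 2 warns about.

```r
# Same data as above: V1 levels a, b, c; V2 level b; V3 levels a, c
df <- data.frame(
  V1 = factor(c("a", "c", "c", "b")),
  V2 = factor(c("b", "b", "b", "b")),
  V3 = factor(c("c", "a", "c", "a"))
)
expected <- cbind(V1 = c(1, 3, 3, 2), V2 = c(2, 2, 2, 2), V3 = c(3, 1, 3, 1))

a1 <- df; a1[] <- as.numeric(factor(as.matrix(df)))                  # Answer 1
a2 <- df; a2[] <- lapply(df, function(x)
  as.numeric(factor(x, levels = letters[1:3])))                        # Answer 2 (base R)
a3 <- df; a3[] <- as.integer(unlist(df, use.names = FALSE))           # Answer 3

# Naive per-column conversion: each column keeps its own level numbering
naive <- df; naive[] <- lapply(df, as.integer)

for (res in list(a1, a2, a3)) print(all(as.matrix(res) == expected))  # TRUE each time
print(naive)   # V2 becomes all 1s and V3 becomes 2 1 2 1
```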


3 solutions

Solution 1: 11, 2016-01-01 15:30:40
Solution 2: 10, 2016-01-01 15:27:02
Solution 3: 5, 2016-01-01 16:15:43
