Turning columns in a data.frame (or vectors) into factors

I have a recurrent problem. I often have multiple vectors or columns in a data.frame representing conditions. For example:

 condition_1 condition_2 condition_3
 5.3         2.6         1.2
 25.5        2.2         1.4
 13.1        0.1         9.2
 ...

Often I want to compare these conditions using an ANOVA. However, most ANOVA functions need the data to be specified as factors, like this:

value condition
5.3   condition_1
25.5  condition_1
13.1  condition_1
2.6   condition_2
2.2   condition_2
0.1   condition_2
1.2   condition_3
1.4   condition_3
9.2   condition_3
...

Is there a fast and easy way in R to convert from the former format to the latter?

Sure. You can use stack. It's not necessarily "fast", but it sure is easy.

stack(df)
#   values         ind
# 1    5.3 condition_1
# 2   25.5 condition_1
# 3   13.1 condition_1
# 4    2.6 condition_2
# 5    2.2 condition_2
# 6    0.1 condition_2
# 7    1.2 condition_3
# 8    1.4 condition_3
# 9    9.2 condition_3
sapply(stack(df), class)
#    values       ind 
# "numeric"  "factor" 

where df is

df <- structure(list(condition_1 = c(5.3, 25.5, 13.1), condition_2 = c(2.6, 
2.2, 0.1), condition_3 = c(1.2, 1.4, 9.2)), .Names = c("condition_1", 
"condition_2", "condition_3"), class = "data.frame", row.names = c(NA, 
-3L))
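
Since the goal in the question is an ANOVA, here is a minimal sketch of feeding the stacked result to aov. The column names value and condition and the object names long and fit are my own choices for illustration, not something the answer above prescribes.

```r
# Same example data as in the question
df <- data.frame(
  condition_1 = c(5.3, 25.5, 13.1),
  condition_2 = c(2.6, 2.2, 0.1),
  condition_3 = c(1.2, 1.4, 9.2)
)

# stack() returns the columns `values` (numeric) and `ind` (factor);
# rename them to match the layout asked for in the question
long <- stack(df)
names(long) <- c("value", "condition")

# One-way ANOVA with the factor column as the grouping variable
fit <- aov(value ~ condition, data = long)
summary(fit)
```

Renaming is optional; aov(values ~ ind, data = stack(df)) should work just as well if you don't mind the default names.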

Alternate approach with melt from reshape2:

dat <- read.table(text="condition_1 condition_2 condition_3
5.3         2.6         1.2
25.5        2.2         1.4
13.1        0.1         9.2", stringsAsFactors = FALSE, header = TRUE)

library(reshape2)

dat_m <- melt(dat)
dat_m

##      variable value
## 1 condition_1   5.3
## 2 condition_1  25.5
## 3 condition_1  13.1
## 4 condition_2   2.6
## 5 condition_2   2.2
## 6 condition_2   0.1
## 7 condition_3   1.2
## 8 condition_3   1.4
## 9 condition_3   9.2

str(dat_m)

## 'data.frame': 9 obs. of  2 variables:
##  $ variable: Factor w/ 3 levels "condition_1",..: 1 1 1 2 2 2 3 3 3
##  $ value   : num  5.3 25.5 13.1 2.6 2.2 0.1 1.2 1.4 9.2
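
If you would rather get the column names from the question directly, melt accepts variable.name and value.name. The following is a small sketch assuming reshape2 is installed; listing every column in measure.vars is my own choice, to make explicit that there are no id columns.

```r
library(reshape2)

dat <- data.frame(
  condition_1 = c(5.3, 25.5, 13.1),
  condition_2 = c(2.6, 2.2, 0.1),
  condition_3 = c(1.2, 1.4, 9.2)
)

# Treat every column as a measured variable and name the output columns
dat_m <- melt(dat,
              measure.vars  = names(dat),
              variable.name = "condition",
              value.name    = "value")
str(dat_m)
```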

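Or using gather from the tidyr package (note that the condition column comes back as character unless you pass factor_key = TRUE):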

library(tidyr)
gather(dat, condition, value, condition_1:condition_3)
#     condition value
# 1 condition_1   5.3
# 2 condition_1  25.5
# 3 condition_1  13.1
# 4 condition_2   2.6
# 5 condition_2   2.2
# 6 condition_2   0.1
# 7 condition_3   1.2
# 8 condition_3   1.4
# 9 condition_3   9.2
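
In current tidyr releases gather is superseded by pivot_longer. Below is a rough equivalent, assuming tidyr 1.0.0 or later is installed; the name long is just for illustration. The names column is character by default, so it is converted to a factor explicitly here.

```r
library(tidyr)

dat <- data.frame(
  condition_1 = c(5.3, 25.5, 13.1),
  condition_2 = c(2.6, 2.2, 0.1),
  condition_3 = c(1.2, 1.4, 9.2)
)

# Reshape all columns into name/value pairs
long <- pivot_longer(dat, cols = everything(),
                     names_to = "condition", values_to = "value")

# Convert the condition labels to a factor for use in ANOVA functions
long$condition <- factor(long$condition)
str(long)
```

Unlike gather, pivot_longer walks the input row by row, so the conditions are interleaved in the output rather than grouped in blocks. That makes no difference to an ANOVA.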
