Create New Data Frame with Column Names from Unique Values in another Data Frame and Corresponding Values Assigned to Column
I'm new to R, and I'm pretty sure this is simple to accomplish, but I can't figure out how to do it. I've tried the split function together with a for loop, but couldn't get it right. As an example, this is what my original data frame looks like:
dat <- data.frame(col1 = c(rep("red", 4), rep("blue", 3)), col2 = c(1, 3, 2, 4, 7, 8, 9))
col1 col2
red 1
red 3
red 2
red 4
blue 7
blue 8
blue 9
I want to create a new column for each unique value in col1 and assign its corresponding values in col2 to the new data frame. This is how I want my new data frame to look:
red blue
1 7
3 8
2 9
4 NA
I've gotten close with a list structure that resembles what I want, but I need a data frame to boxplot and dotplot the results. Any help would be appreciated. Thanks!
I'm sure there's a more efficient solution, but here's one option:
dat <- data.frame(col1 = c(rep("red", 4), rep("blue", 3)), col2 = c(1, 3, 2, 4, 7, 8, 9))
dat
col1 col2
1 red 1
2 red 3
3 red 2
4 red 4
5 blue 7
6 blue 8
7 blue 9
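# unstack() splits col2 by col1; with unequal group sizes it returns a list
ust <- unstack(dat, form = col2 ~ col1)
# index each element up to the longest length; positions past the end become NA
res <- data.frame(sapply(ust, '[', 1:max(unlist(lapply(ust, length)))))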
res
blue red
1 7 1
2 8 3
3 9 2
4 NA 4
Edit: If you want the column order red then blue:
res[, c("red", "blue")]
red blue
1 1 7
2 3 8
3 2 9
4 4 NA
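A slightly shorter base R variant of the same idea, as a sketch only: split() gives one vector per colour, and `length<-` pads each vector with NA up to the longest length. The names cols, n and res are just illustrative, and this assumes col2 is a plain atomic vector.

```r
dat <- data.frame(col1 = c(rep("red", 4), rep("blue", 3)),
                  col2 = c(1, 3, 2, 4, 7, 8, 9))

# one vector of col2 values per unique value of col1 (names sorted: blue, red)
cols <- split(dat$col2, dat$col1)

# length of the longest group
n <- max(lengths(cols))

# `length<-` extends a vector to length n, filling the new positions with NA
res <- as.data.frame(lapply(cols, `length<-`, n))

# reorder the columns by first appearance in col1 (red, then blue)
res <- res[, as.character(unique(dat$col1))]
print(res)
```

The as.character() call is there so that the column selection is by name even if col1 happens to be a factor.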
Here's a possible Hadleyverse solution:
library(tidyr)
library(dplyr)
dat %>%
  group_by(col1) %>%
  mutate(n = row_number()) %>%
  spread(col1, col2)
# Source: local data frame [4 x 3]
#
# n blue red
# 1 1 7 1
# 2 2 8 3
# 3 3 9 2
# 4 4 NA 4
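Note that spread() has since been superseded in tidyr by pivot_wider() (tidyr 1.0.0 and later). A rough equivalent, assuming reasonably recent versions of tidyr and dplyr are installed, might look like the sketch below. The helper column n is only needed to line the rows up, so it is dropped at the end.

```r
library(dplyr)
library(tidyr)

dat <- data.frame(col1 = c(rep("red", 4), rep("blue", 3)),
                  col2 = c(1, 3, 2, 4, 7, 8, 9))

res <- dat %>%
  group_by(col1) %>%
  mutate(n = row_number()) %>%   # position of each value within its group
  ungroup() %>%
  pivot_wider(names_from = col1, values_from = col2) %>%  # missing cells become NA
  select(-n)                     # drop the helper index

print(res)
```

With the default names_sort = FALSE, pivot_wider() should keep the new columns in order of first appearance, so no reordering step ought to be needed here.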
Or using data.table:
library(data.table)
dcast(setDT(dat)[, indx := 1:.N, by = col1], indx ~ col1, value.var = "col2")
# indx blue red
# 1: 1 7 1
# 2: 2 8 3
# 3: 3 9 2
# 4: 4 NA 4
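One thing to be aware of: setDT() converts dat by reference, and := then adds the indx column to it in place. If the original data frame should stay untouched, a variant along these lines may be preferable. This is only a sketch; dt and res are illustrative names, and rowid() assumes a data.table version that provides it (1.9.8 or later).

```r
library(data.table)

dat <- data.frame(col1 = c(rep("red", 4), rep("blue", 3)),
                  col2 = c(1, 3, 2, 4, 7, 8, 9))

dt <- as.data.table(dat)       # a copy, so dat is not modified
dt[, indx := rowid(col1)]      # running index within each value of col1

res <- dcast(dt, indx ~ col1, value.var = "col2")
res[, indx := NULL]            # drop the helper index
setcolorder(res, c("red", "blue"))
print(res)
```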
Just to show another option using base R *apply and cbind:
# split the data into list using col1 column
tmp.list = lapply(split(dat, dat$col1), function(x) x$col2)
# identify the length of the biggest list
max.length = max(sapply(tmp.list, length))
# combine the list elements, while filling NA for the missing values
data.frame(do.call(cbind,
lapply(tmp.list, function(x) c(x, rep(NA, max.length - length(x))))
))
# blue red
#1 7 1
#2 8 3
#3 9 2
#4 NA 4
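As a side note on the plotting goal mentioned in the question: base R's boxplot() accepts either the original long format (through a formula) or the reshaped wide data frame, so the reshaping may not be strictly required. A minimal sketch follows. The name wide is illustrative, and stripchart() is used here as a base R stand-in for a dot plot, which is an assumption about what the asker meant by "dotplot".

```r
dat <- data.frame(col1 = c(rep("red", 4), rep("blue", 3)),
                  col2 = c(1, 3, 2, 4, 7, 8, 9))

# long format: one box per unique value of col1
boxplot(col2 ~ col1, data = dat)

# wide format, padded with NA as in the answers above
cols <- split(dat$col2, dat$col1)
wide <- as.data.frame(lapply(cols, `length<-`, max(lengths(cols))))

# one box per column; NA values are ignored
bp <- boxplot(wide)

# simple dot plot of the same groups, drawn from the long format
stripchart(col2 ~ col1, data = dat, vertical = TRUE, method = "jitter")
```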