R: collapsing data from multiple columns into one
I know there are many questions on this topic, so I apologize if this is a duplicate. I'm trying to collapse multiple columns in a data set into one column.
Assume this is the structure of the data set I am working with:
df <- data.frame(
  cbind(
    variable_1 = c('Var1', NA, NA, 'Var1'),
    variable_2 = c('Var2', 'No', NA, NA),
    variable_3 = c(NA, NA, 'Var3', NA),
    variable_4 = c(NA, 'Var4', NA, NA),
    variable_5 = c(NA, 'No', 'Var5', NA),
    variable_6 = c(NA, NA, 'Var6', NA)
  ))
variable_1 variable_2 variable_3 variable_4 variable_5 variable_6
Var1       Var2       NA         NA         NA         NA
NA         No         NA         Var4       No         NA
NA         NA         Var3       NA         Var5       Var6
Var1       NA         NA         NA         NA         NA
What I am expecting is a single new column, variable_7, like this:
variable_1 variable_2 variable_3 variable_4 variable_5 variable_6 variable_7
Var1       Var2       NA         NA         NA         NA         Var1, Var2
NA         No         NA         Var4       No         NA         Var4
NA         NA         Var3       NA         Var5       Var6       Var3, Var5, Var6
Var1       NA         NA         NA         NA         NA         Var1
Any help on accomplishing this is much appreciated.
# For each row, drop NA and "No" entries, then join what is left with ", "
df$variable_7 <- apply(df, 1, function(x) paste(x[!is.na(x) & x != "No"], collapse = ", "))
df
#  variable_1 variable_2 variable_3 variable_4 variable_5 variable_6
#1       Var1       Var2       <NA>       <NA>       <NA>       <NA>
#2       <NA>         No       <NA>       Var4         No       <NA>
#3       <NA>       <NA>       Var3       <NA>       Var5       Var6
#4       Var1       <NA>       <NA>       <NA>       <NA>       <NA>
#        variable_7
#1       Var1, Var2
#2             Var4
#3 Var3, Var5, Var6
#4             Var1
Explanation: Use apply and paste(..., collapse = ", ") to concatenate all entries in each row (except NAs and "No"s) and store the result in the new column variable_7.
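One thing to watch with the one-liner above: it is applied to every column of df, so running it a second time would fold the existing variable_7 into the result, and a row with nothing to keep comes back as an empty string. The following is only a sketch under two assumptions that are not in the original answer: that the source columns are exactly variable_1 to variable_6, and that NA is the preferred result for an empty row. The names src_cols and collapse_row are illustrative.

```r
# Example data, built as in the question but without cbind()
df <- data.frame(
  variable_1 = c("Var1", NA, NA, "Var1"),
  variable_2 = c("Var2", "No", NA, NA),
  variable_3 = c(NA, NA, "Var3", NA),
  variable_4 = c(NA, "Var4", NA, NA),
  variable_5 = c(NA, "No", "Var5", NA),
  variable_6 = c(NA, NA, "Var6", NA),
  stringsAsFactors = FALSE
)

# Assumption: only variable_1 .. variable_6 are source columns
src_cols <- paste0("variable_", 1:6)

# Hypothetical helper: join the kept values of one row,
# or return NA if nothing is left after dropping NA and "No"
collapse_row <- function(x) {
  keep <- x[!is.na(x) & x != "No"]
  if (length(keep) == 0) NA_character_ else paste(keep, collapse = ", ")
}

# Restricting to src_cols makes the assignment safe to re-run
df$variable_7 <- apply(df[src_cols], 1, collapse_row)
df$variable_7
```

Because the input is limited to src_cols, re-running the last assignment gives the same variable_7 each time.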
df <- data.frame(
  cbind(
    variable_1 = c('Var1', NA, NA, 'Var1'),
    variable_2 = c('Var2', 'No', NA, NA),
    variable_3 = c(NA, NA, 'Var3', NA),
    variable_4 = c(NA, 'Var4', NA, NA),
    variable_5 = c(NA, 'No', 'Var5', NA),
    variable_6 = c(NA, NA, 'Var6', NA)
  ))
I gather that if there are n rows then the objective is to create an n-vector of comma-separated character strings, each made up of the values in that row that contain the characters Var. (If you intended some other criterion for separating the desired and undesired values, change the grep accordingly.)
apply(df, 1, function(x) toString(grep("Var", x, value = TRUE)))
## [1] "Var1, Var2" "Var4" "Var3, Var5, Var6" "Var1"
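To illustrate what "change the grep accordingly" might look like, here is a small sketch. The stricter pattern is an assumption about the data (values of the form Var followed by digits), not something stated in the question, and the name pattern is illustrative. It also shows why no explicit is.na() check is needed: grep() does not match NA.

```r
df <- data.frame(
  variable_1 = c("Var1", NA, NA, "Var1"),
  variable_2 = c("Var2", "No", NA, NA),
  variable_3 = c(NA, NA, "Var3", NA),
  variable_4 = c(NA, "Var4", NA, NA),
  variable_5 = c(NA, "No", "Var5", NA),
  variable_6 = c(NA, NA, "Var6", NA),
  stringsAsFactors = FALSE
)

# grep() never matches NA, so missing values drop out on their own
kept <- grep("Var", c("Var1", NA, "No"), value = TRUE)

# Assumed stricter criterion: the whole value must be "Var" followed by digits
pattern <- "^Var[0-9]+$"
df$variable_7 <- apply(df, 1, function(x) toString(grep(pattern, x, value = TRUE)))
df$variable_7
```

With an anchored pattern like this, a value such as "Variable" or "NoVar" would be excluded, whereas the plain "Var" pattern would keep both.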
Using a data.table reshaping approach rather than a loop/apply:
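library(data.table)
setDT(df)  # convert df to a data.table by reference

# Add a row id, melt to long format, keep the values containing "Var",
# collapse them per id, then join the result back onto the wide table
df[, id := .I][
  melt(df, id.vars = "id")[grepl("Var", value), .(variable_7 = paste0(value, collapse = ",")), by = .(id)]
  , on = "id"
  , nomatch = 0
][order(id)]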
#    variable_1 variable_2 variable_3 variable_4 variable_5 variable_6 id     variable_7
# 1:       Var1       Var2         NA         NA         NA         NA  1      Var1,Var2
# 2:         NA         No         NA       Var4         No         NA  2           Var4
# 3:         NA         NA       Var3         NA       Var5       Var6  3 Var3,Var5,Var6
# 4:       Var1         NA         NA         NA         NA         NA  4           Var1
A solution using dplyr and tidyr. df4 is the final output. Please see the DATA section at the end for how I created the data frame df. The cbind is not required, and it is a good idea to add stringsAsFactors = FALSE to prevent the creation of factor columns (from R 4.0.0 onward this is already the default).
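library(dplyr)
library(tidyr)

# Add a row ID so the collapsed values can be joined back
df2 <- df %>% mutate(ID = 1:n())

# Reshape to long format, drop NA and "No", and collapse the rest per ID
df3 <- df2 %>%
  gather(Variable, Value, -ID, na.rm = TRUE) %>%
  filter(!Value %in% "No") %>%
  group_by(ID) %>%
  summarise(variable_7 = toString(Value))

# Join the collapsed column back onto the original rows
df4 <- df2 %>%
  left_join(df3, by = "ID") %>%
  select(-ID)
df4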
#   variable_1 variable_2 variable_3 variable_4 variable_5 variable_6       variable_7
# 1       Var1       Var2       <NA>       <NA>       <NA>       <NA>       Var1, Var2
# 2       <NA>         No       <NA>       Var4         No       <NA>             Var4
# 3       <NA>       <NA>       Var3       <NA>       Var5       Var6 Var3, Var5, Var6
# 4       Var1       <NA>       <NA>       <NA>       <NA>       <NA>             Var1
DATA
df <- data.frame(
  variable_1 = c('Var1', NA, NA, 'Var1'),
  variable_2 = c('Var2', 'No', NA, NA),
  variable_3 = c(NA, NA, 'Var3', NA),
  variable_4 = c(NA, 'Var4', NA, NA),
  variable_5 = c(NA, 'No', 'Var5', NA),
  variable_6 = c(NA, NA, 'Var6', NA),
  stringsAsFactors = FALSE
)
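A note on the reshaping step in the dplyr answer: gather() still works, but tidyr documents it as superseded by pivot_longer(). The following is a rough equivalent, offered as a sketch rather than as the answer's own method. It assumes tidyr 1.0.0 or later and that all source columns are character. The names df_id, collapsed and result are illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  variable_1 = c("Var1", NA, NA, "Var1"),
  variable_2 = c("Var2", "No", NA, NA),
  variable_3 = c(NA, NA, "Var3", NA),
  variable_4 = c(NA, "Var4", NA, NA),
  variable_5 = c(NA, "No", "Var5", NA),
  variable_6 = c(NA, NA, "Var6", NA),
  stringsAsFactors = FALSE
)

# Row ID used to join the collapsed values back
df_id <- df %>% mutate(ID = row_number())

# Long format without NAs, drop "No", then one comma-separated string per ID
collapsed <- df_id %>%
  pivot_longer(-ID, names_to = "Variable", values_to = "Value",
               values_drop_na = TRUE) %>%
  filter(Value != "No") %>%
  group_by(ID) %>%
  summarise(variable_7 = toString(Value))

# Attach variable_7 to the original rows and drop the helper ID
result <- df_id %>%
  left_join(collapsed, by = "ID") %>%
  select(-ID)
result
```

The left_join keeps every original row, so a row with no values to keep would get NA in variable_7 rather than being dropped.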