How can you gather() multiple columns at the same time in dplyr (R)?
I am trying to gather untidy data from wide to long format. I have 748 variables that need to be condensed to approximately 30.
In this post, I asked: how to tidy my wide data? The answer: use gather().
However, I am still struggling to gather multiple columns and was hoping you could pinpoint where I'm going wrong.
Reproducible example:
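library(tidyverse)  # provides tribble(), gather(), %>% and the dplyr verbs used below
tb1 <- tribble(~x1, ~x2, ~x3, ~y1, ~y2, ~y3,
               1,   NA,  NA,  NA,  1,   NA,
               NA,  1,   NA,  NA,  NA,  1,
               NA,  NA,  1,   NA,  NA,  1)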
# A tibble: 3 x 6
# x1 x2 x3 y1 y2 y3
# <dbl> <dbl> <dbl> <lgl> <dbl> <dbl>
#1 1 NA NA NA 1 NA
#2 NA 1 NA NA NA 1
#3 NA NA 1 NA NA 1
with x1-y3 having the following characteristics:
1 x1 Green
2 x2 Yellow
3 x3 Orange
4 y1 Yes
5 y2 No
6 y3 Maybe
I tried this:
tb1 %>%
rename("Green" =x1,
"Yellow"=x2,
"Orange"=x3,
"Yes"=y1,
"No"=y2,
"Maybe"=y3) %>%
gather(X,val,-Green,-Yellow,-Orange) %>%
gather(Y,val,-X) %>%
select(-val)
I did get the output I wanted for these variables, but I can't imagine how to do this for 700+ variables. Is there a more efficient way?
tb1 %>%
rename("Green" =x1,
"Yellow"=x2,
"Orange"=x3,
"Yes"=y1,
"No"=y2,
"Maybe"=y3) %>%
gather(X,val,-Green,-Yellow,-Orange) %>%
filter(!is.na(val)) %>%
select(-val) %>%
gather(Y,val,-X) %>%
filter(!is.na(val)) %>%
select(-val)
# A tibble: 3 x 2
X Y
<chr> <chr>
1 No Green
2 Maybe Yellow
3 Maybe Orange
I think I might just not be acquainted enough with gather(), so this is probably a stupid question. I would appreciate the help. Thanks!
I'm assuming the issue here is with manually specifying all the different variable names. Luckily, the tidyverse has the select helpers (see ?select_helpers), which make it easier to select columns based on different rules.
Instead of renaming the variables at the beginning, we can rename them at the end. This lets us use starts_with() to get all columns starting with x (or y) and gather each group in a single step. Then we can use ends_with() to select the value columns created by those gather steps, filter on them, and drop them.
Finally, we replace all values of x1, y1, etc. with their true values in one step using mutate_all() and a lookup table:
# Make lookup table to match X and Y variables with Values
# the initial values should be the `names` (first) and the values to change them to
# should be the `values` (after the =)
lookup <- c('x1' = 'Green',
'x2' = 'Yellow',
'x3' = 'Orange',
'y1' = 'Yes',
'y2' = 'No',
'y3' = 'Maybe')
tb1 %>%
  gather(X, Xval, starts_with('x')) %>%   # Gather all variables that start with 'x'
  gather(Y, Yval, starts_with('y')) %>%   # Gather all variables that start with 'y'
  filter_at(vars(ends_with('val')),       # Looking in columns ending with 'val'
            all_vars(!is.na(.))) %>%      # Drop rows if ANY of these cols are NA
  select(-ends_with('val')) %>%           # Drop columns ending in 'val'
  mutate_all(~lookup[.])                  # Replace value from lookup table in all cols
# A tibble: 3 x 2
X Y
<chr> <chr>
1 Green No
2 Yellow Maybe
3 Orange Maybe
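With 700+ variables, typing the lookup vector by hand is itself a chore. The sketch below is one possible variation, not part of the original answer: it assumes the variable descriptions are available as a two-column table (called codebook here, a hypothetical name) and builds the lookup from it with setNames(). It also assumes tidyr >= 1.0.0 and dplyr >= 1.0.0, where pivot_longer() and across() are available as the newer counterparts of gather() and mutate_all().

```r
library(dplyr)
library(tidyr)
library(tibble)

tb1 <- tribble(~x1, ~x2, ~x3, ~y1, ~y2, ~y3,
               1,   NA,  NA,  NA,  1,   NA,
               NA,  1,   NA,  NA,  NA,  1,
               NA,  NA,  1,   NA,  NA,  1)

# Hypothetical codebook: one row per variable. In practice this could be
# read from a file instead of being typed in.
codebook <- tribble(~variable, ~label,
                    "x1", "Green",
                    "x2", "Yellow",
                    "x3", "Orange",
                    "y1", "Yes",
                    "y2", "No",
                    "y3", "Maybe")

# Named character vector: names are the variable codes, values are the labels
lookup <- setNames(codebook$label, codebook$variable)

result <- tb1 %>%
  # Stack the x columns; values_drop_na drops the NA cells immediately
  pivot_longer(starts_with("x"), names_to = "X", values_to = "Xval",
               values_drop_na = TRUE) %>%
  # Same for the y columns
  pivot_longer(starts_with("y"), names_to = "Y", values_to = "Yval",
               values_drop_na = TRUE) %>%
  # The value columns have done their job
  select(-ends_with("val")) %>%
  # Replace the variable codes with their labels in every remaining column
  mutate(across(everything(), ~ unname(lookup[.x])))

result
# X: Green, Yellow, Orange
# Y: No, Maybe, Maybe
```

Dropping the NA cells inside each pivot_longer() call keeps the intermediate table small, which may matter when hundreds of columns are stacked.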
One tricky thing with the select helpers is knowing when you can use them alone and when you need to "register" them with vars(). In gather() and select(), you can use them as is. In the scoped variants of mutate(), filter(), summarize(), etc. (such as filter_at() above), you need to surround them with vars().
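To make the distinction concrete, here is a small sketch on a made-up data frame (df and its column names are illustrative assumptions, not from the question). It shows a helper used directly in select() and the same helper wrapped in vars() for filter_at() and mutate_at().

```r
library(dplyr)
library(tibble)

# Toy data, invented for illustration only
df <- tibble(id    = 1:3,
             x_val = c(1, NA, 3),
             y_val = c(NA, 2, 3))

# select(): the helper is used as is
selected <- df %>% select(ends_with("val"))

# filter_at(): the helper goes inside vars(); all_vars() requires the
# predicate to hold in every selected column
complete <- df %>% filter_at(vars(ends_with("val")), all_vars(!is.na(.)))

# mutate_at(): same pattern, the function is applied to each selected column
doubled <- df %>% mutate_at(vars(ends_with("val")), ~ . * 2)
```

In dplyr 1.0.0 and later, across() accepts the helpers directly inside mutate() and summarize(), so vars() is only needed with the older scoped verbs.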