How to gather() multiple columns at once in dplyr (R)?

Question

I am trying to gather untidy data from wide to long format. I have 748 variables that need to be condensed to approximately 30.

In this post, I asked how to tidy my wide data. The answer: use gather().

However, I am still struggling to gather multiple columns and was hoping you could pinpoint where I'm going wrong.

Reproducible example:

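library(tidyverse)  # provides tribble(), gather(), and the dplyr verbs used below

tb1 <- tribble(~x1, ~x2, ~x3, ~y1, ~y2, ~y3,
               1,   NA,  NA,  NA,  1,   NA,
               NA,  1,   NA,  NA,  NA,  1,
               NA,  NA,  1,   NA,  NA,  1)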

# A tibble: 3 x 6
#     x1    x2    x3 y1       y2    y3
#  <dbl> <dbl> <dbl> <lgl> <dbl> <dbl>
#1     1    NA    NA NA        1    NA
#2    NA     1    NA NA       NA     1
#3    NA    NA     1 NA       NA     1

with x1-y3 having the following characteristics:

1 x1    Green 
2 x2    Yellow
3 x3    Orange
4 y1    Yes   
5 y2    No    
6 y3    Maybe

I tried this:

tb1 %>%
  rename("Green" =x1,
         "Yellow"=x2,
         "Orange"=x3,
         "Yes"=y1,
         "No"=y2,
         "Maybe"=y3) %>%
  gather(X,val,-Green,-Yellow,-Orange) %>%
  gather(Y,val,-X) %>%
  select(-val)

I did get the output I wanted for these variables, but I can't imagine how to do this for 700+ variables. Is there a more efficient way?

tb1 %>%
  rename("Green" =x1,
         "Yellow"=x2,
         "Orange"=x3,
         "Yes"=y1,
         "No"=y2,
         "Maybe"=y3) %>%
  gather(X,val,-Green,-Yellow,-Orange) %>%
  filter(!is.na(val)) %>%
  select(-val) %>%
  gather(Y,val,-X) %>%
  filter(!is.na(val)) %>%
  select(-val)

# A tibble: 3 x 2
  X     Y     
  <chr> <chr> 
1 No    Green 
2 Maybe Yellow
3 Maybe Orange

I think I might just not be familiar enough with gather(), so this is probably a stupid question. I would appreciate the help. Thanks!

Answer 1

I'm assuming the issue here is manually specifying all the different variable names. Luckily, the tidyverse has the select helpers (see ?select_helpers), which make it easier to select columns based on different rules.

Instead of renaming the variables at the beginning, we can rename them at the end. This lets us use starts_with to get all columns starting with x or y and gather each group in one step. Then we can use ends_with to select the value columns created by those gather steps, filter on them, and drop them.

Finally, we replace all values of x1, y1, etc. with their true values in one step using mutate_all and a lookup table.
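
With 700+ variables, typing the lookup table by hand may not be practical. As a minimal sketch, assuming the variable-to-label mapping already exists as a two-column table (the data frame key and its column names variable and label are hypothetical, not from the question), base R's setNames() can build the named vector, and indexing by name does the replacement:

```r
# Hypothetical key table: one row per column name and its label
key <- data.frame(
  variable = c("x1", "x2", "x3", "y1", "y2", "y3"),
  label    = c("Green", "Yellow", "Orange", "Yes", "No", "Maybe"),
  stringsAsFactors = FALSE
)

# setNames() attaches the variable names to the labels, giving a named vector
lookup <- setNames(key$label, key$variable)

# Indexing by name returns the matching labels, in the order requested;
# unname() drops the names so only the labels remain
codes  <- c("x1", "y2", "x3")
labels <- unname(lookup[codes])
labels
```

The resulting lookup has the same shape as the hand-written one in the code that follows.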

# Make lookup table to match X and Y variables with Values
  # the initial values should be the `names` (first) and the values to change them to
  # should be the `values` (after the =)
lookup <- c('x1' = 'Green',
            'x2' = 'Yellow',
            'x3' = 'Orange',
            'y1' = 'Yes',
            'y2' = 'No',
            'y3' = 'Maybe')

tb1 %>%
    gather(X, Xval, starts_with('x')) %>%    # Gather all variables that start with 'x'
    gather(Y, Yval, starts_with('y')) %>%    # Gather all variables that start with 'y'
    filter_at(vars(ends_with('val')),        # Looking in columns ending with 'val'
              all_vars(!is.na(.))) %>%       # Drop rows if ANY of these cols are NA
    select(-ends_with('val')) %>%            # Drop columns ending in 'val'
    mutate_all(~lookup[.])                   # Replace value from lookup table in all cols

# A tibble: 3 x 2
  X      Y 
  <chr>  <chr>
1 Green  No   
2 Yellow Maybe
3 Orange Maybe
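
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A rough equivalent of the pipeline above might look like the sketch below. It assumes tidyr >= 1.0.0 and dplyr >= 1.0.0 (for across()); the name result is only for illustration, and this is a swapped-in technique, not the code from the answer.

```r
library(dplyr)
library(tidyr)

tb1 <- tribble(~x1, ~x2, ~x3, ~y1, ~y2, ~y3,
               1,   NA,  NA,  NA,  1,   NA,
               NA,  1,   NA,  NA,  NA,  1,
               NA,  NA,  1,   NA,  NA,  1)

lookup <- c('x1' = 'Green', 'x2' = 'Yellow', 'x3' = 'Orange',
            'y1' = 'Yes',   'y2' = 'No',     'y3' = 'Maybe')

result <- tb1 %>%
  # Stack the x columns; values_drop_na removes the NA rows immediately
  pivot_longer(starts_with("x"), names_to = "X", values_to = "Xval",
               values_drop_na = TRUE) %>%
  # Stack the y columns the same way
  pivot_longer(starts_with("y"), names_to = "Y", values_to = "Yval",
               values_drop_na = TRUE) %>%
  # Keep only the key columns, then translate codes to labels
  select(X, Y) %>%
  mutate(across(everything(), ~ unname(lookup[.x])))

result
```

Dropping the NA rows inside each pivot_longer() call keeps the intermediate table small, which may matter when there are hundreds of columns.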

One tricky thing with the select helpers is knowing when you can use them alone and when you need to "register" them with vars. In gather and select, you can use them as is. In the scoped verbs (mutate_at, filter_at, summarize_at, etc.), you need to wrap them in vars.
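
To make the distinction concrete, here is a small sketch on a made-up tibble (df and its columns are hypothetical). The same helper is used bare inside select and wrapped in vars inside filter_at:

```r
library(dplyr)

# Hypothetical data: an id column plus two value columns
df <- tibble(id   = 1:3,
             Xval = c(1, NA, 3),
             Yval = c(1, 2, NA))

# In select, the helper is used directly
kept <- df %>% select(ends_with("val"))

# In a scoped verb such as filter_at, the helper goes inside vars()
complete <- df %>%
  filter_at(vars(ends_with("val")), all_vars(!is.na(.)))

kept
complete
```

Here kept holds only the Xval and Yval columns, and complete holds only the rows where both value columns are non-missing.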

Solution 1
1 Accepted 2019-05-07 15:21:53
