
Combining several pairs of columns (containing numbers and NAs) simultaneously in R

I'm trying to work out how to combine columns efficiently. I've started with a data frame that looks roughly like the one below. The variable names do not follow any specific pattern, and the columns I want to combine are not necessarily next to each other. I've included the column numbers to make them easier to refer to.

Say I want to combine columns 2 and 3, columns 4 and 7, and columns 5 and 6. As you can see, whenever one column of a pair holds a number, the other column of that pair holds NA. If column 8 == a, column 2 is a number and column 3 is NA. If column 8 == b, column 2 is NA and column 3 is a number. The same pattern holds for column 9 (which maps onto 4 and 7) and column 10 (which maps onto 5 and 6).

1     2      3      4     5     6     7     8     9     10

id    ab_1   ab2_1  dc_3  de_4  ze37  uh44  fac1  fac2  fac3
1     2      NA     NA    4     NA    5     a     c     e
2     NA     4      NA    NA    1     3     b     c     f
3     NA     7      2     5     NA    NA    b     d     e
4     5      NA     3     NA    7     NA    a     d     f 
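The pattern described above (exactly one non-NA value per pair in every row) can be checked before combining anything. This is a minimal base R sketch, assuming the table is read into `df`; `pair_idx` and `exactly_one` are hypothetical names introduced here for illustration.

```r
# Sample data mirroring the table above
df <- read.table(text =
"id ab_1 ab2_1 dc_3 de_4 ze37 uh44 fac1 fac2 fac3
1  2    NA    NA   4    NA   5    a    c    e
2  NA   4     NA   NA   1    3    b    c    f
3  NA   7     2    5    NA   NA   b    d    e
4  5    NA    3    NA   7    NA   a    d    f", header = TRUE)

# Pairs to combine, given by column position (hypothetical name)
pair_idx <- list(c(2, 3), c(4, 7), c(5, 6))

# For each pair, count the non-missing values in every row;
# the pair passes if each row holds exactly one non-NA value
exactly_one <- sapply(pair_idx, function(p) {
  all(rowSums(!is.na(df[, p])) == 1)
})
exactly_one
```

If any element comes back `FALSE`, some row of that pair has either two values or none, and simply taking the first non-NA value may not be what is wanted.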

I am trying to generate 3 new columns: one with the combined values of columns 2 and 3, one with the combined values of 4 and 7, and one with the combined values of 5 and 6. I would like them added to the end of the data frame above, and I do not care whether the original columns remain in the data frame. This is what the 3 additional columns should look like:

col1  col2  col3
2     5     4
4     3     1
7     2     5
5     3     7

This is how I have been doing it so far:

library(tidyverse)

df <- df %>%    ## combining columns 2 and 3
      gather(., 'ab_1', 'ab2_1', key = "key", value = "col1") %>%
      filter(., fac1 == "a" & key == "ab_1" | fac1 == "b" & key == "ab2_1")

df <- df %>%    ## combining columns 4 and 7
      gather(., 'dc_3', 'uh44', key = "key2", value = "col2") %>%
      filter(., fac2 == "c" & key2 == "uh44" | 
                fac2 == "d" & key2 == "dc_3")

df <- df %>%    ## combining columns 5 and 6
      gather(., 'de_4', 'ze37', key = "key3", value = "col3") %>%
      filter(., fac3 == "e" & key3 == "de_4" | fac3 == "f" & key3 == "ze37")

Is there a way to combine these steps so that I don't have to repeat the same functions by hand for each additional column? There are several more pairs of columns I need to combine, so I'm hoping there is a more efficient way of doing this. Please let me know if I can clarify anything.

Perhaps something like this using dplyr::coalesce?

# Define the pairs
prs <- list(col1 = c(2, 3), col2 = c(4, 7), col3 = c(5, 6))

library(tidyverse)
imap_dfc(prs, ~df[, .x] %>% transmute(!!.y := coalesce(!!!syms(names(df)[.x]))))
#  col1 col2 col3
#1    2    5    4
#2    4    3    1
#3    7    2    5
#4    5    3    7

Sample data

df <- read.table(text =
    "id    ab_1   ab2_1  dc_3  de_4  ze37  uh44  fac1  fac2  fac3
1     2      NA     NA    4     NA    5     a     c     e
2     NA     4      NA    NA    1     3     b     c     f
3     NA     7      2     5     NA    NA    b     d     e
4     5      NA     3     NA    7     NA    a     d     f ", header = TRUE)
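Since the question notes that the column names follow no pattern and the columns are not adjacent, referring to the pairs by name rather than by position may be less fragile if columns are later reordered. The following is a base R sketch of the same "first non-NA value" idea, not the answer's own code; `prs_named`, `combined` and `df_out` are hypothetical names, and it assumes every pair holds values of the same type.

```r
# Sample data from above
df <- read.table(text =
"id ab_1 ab2_1 dc_3 de_4 ze37 uh44 fac1 fac2 fac3
1  2    NA    NA   4    NA   5    a    c    e
2  NA   4     NA   NA   1    3    b    c    f
3  NA   7     2    5    NA   NA   b    d    e
4  5    NA    3    NA   7    NA   a    d    f", header = TRUE)

# Pairs given by column name; the list names become the new column names
prs_named <- list(col1 = c("ab_1", "ab2_1"),
                  col2 = c("dc_3", "uh44"),
                  col3 = c("de_4", "ze37"))

# For each pair, take the first column's value unless it is NA,
# in which case fall back to the second column's value
combined <- lapply(prs_named, function(nm) {
  first  <- df[[nm[1]]]
  second <- df[[nm[2]]]
  ifelse(is.na(first), second, first)
})

# Append the combined columns to the end of the original data frame
df_out <- cbind(df, as.data.frame(combined))
df_out[, names(prs_named)]
```

Adding another pair only requires one more entry in `prs_named`, which addresses the repetition the question asks about.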

This is much more verbose than Maurits' solution, but it gets to the same place:

library(tidyverse)
col_grps <- tibble(col = colnames(df),
                   group = c(NA, 1, 1, 2, 3, 3, 2, NA, NA, NA))

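output <- df %>%
  gather(col, value, -id) %>%                # long format: one row per id/column
  left_join(col_grps, by = "col") %>%        # attach each column's group number
  mutate(value = value %>% as.numeric) %>%   # letters in fac1..fac3 become NA
  group_by(id, group) %>%
  summarise(sums = sum(value, na.rm = TRUE)) %>% ungroup() %>%  # one non-NA value per pair
  spread(group, sums) %>%
  select(-id, -`<NA>`)                       # drop id and the ungrouped columns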

output
# A tibble: 4 x 3
    `1`   `2`   `3`
  <dbl> <dbl> <dbl>
1     2     5     4
2     4     3     1
3     7     2     5
4     5     3     7
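One caveat with summing: `sum(..., na.rm = TRUE)` matches the "first non-NA value" result only when each row has exactly one non-NA value in the pair, as in the sample data. A small base R sketch with made-up vectors `a` and `b` (hypothetical, not from the question) shows where the two approaches differ:

```r
# Two toy columns; the third row is NA in both
a <- c(2, NA, NA)
b <- c(NA, 4, NA)

# Summing with NAs removed turns an all-NA row into 0
summed <- rowSums(cbind(a, b), na.rm = TRUE)

# Taking the first non-NA value keeps an all-NA row as NA
first_non_na <- ifelse(is.na(a), b, a)

summed        # 2 4 0
first_non_na  # 2 4 NA
```

Rows with two non-NA values would likewise be added together by the sum, whereas a coalesce-style approach keeps only the first.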
