Return all combinations of two column types that sum to >= 0, with summary metadata about which columns were combined [R]

I have data like this:

example_df <- data.frame(
  col1type1 =c(110:106),
  col2type2 = c(-108:-104),
  col3type1 = c(-109:-105), 
  col4type2 =c(110:106),
  col5type1 =c(107:103),
  col6type2 = c(-110:-106),
  col7type1 =c(109:113),
  col8type2 = c(-120:-116),
  col9type1 = c(-105:-101),
  col10type2 =c(105:101),
  col11type1 = c(-125:-121),
  col12type2 = c(-105:-101) 
)

I want to return only the combinations where type1 + type2 >= 0 on the same row, and return to a new data frame the combination for which this holds, the row, and the two numbers. (I know I could use for/foreach to calculate each cell individually and write the results to a data.frame, but there has to be a more efficient way.)

The desired output looks like this (incomplete):

#for all possible combinations, like the example rows below
example_first <- data.frame(column_combination="col1type1_col2type2", row=1, sum=2,col1number=110,col2number=-108)
example_mid<- data.frame(column_combination="col1type1_col12type2",row=3, sum=5,col1number=108,col2number=-103)
example_last <- data.frame(column_combination="col9type1_col10type2",row=5,sum=0,col1number=-101,col2number=101)

#would want like this for all possible combinations
desired_incomplete_output <- rbind(example_first,example_mid,example_last) 

What is an efficient way to calculate this en masse rather than with a brute-force for/foreach loop? Thanks!

Assuming the desired complete output has 79 rows for the given example, you could do something like this.

Explanation of steps:

  1. The first two steps, mutate() and split(), add a row index and split the data into a list with one single-row data frame per row.
  2. To work with this list I used purrr::imap_dfr(), which applies a function to each list element (together with its name, here the row number) and row-binds all the results into one data frame. Within each step I:
    • first deselect the row column
    • then pivot everything to long format
    • then split the name column, which holds the column names of the input data, into two separate columns using tidyr::separate()
    • then create the cross product of the type 1 and type 2 entries (each pasted as name@value) using purrr::cross2()
    • then use map_dfr() again to convert that cross product into a data frame
    • then use separate() to split the column names from the values. I used @ as the separator, assuming it does not appear in any column name
    • then filter the rows
    • and finally do some basic data wrangling with dplyr verbs
library(tidyverse)

example_df %>% 
  mutate(row = row_number()) %>% 
  split(.$row) %>% 
  imap_dfr(\(.a, .b) .a %>% 
        select(-row) %>% 
        pivot_longer(everything()) %>% 
        separate(name, into = c('col', 'type'), sep = '(?:type)') %>% 
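        # cross2() is deprecated as of purrr 1.0.0 (tidyr::expand_grid() is the suggested
        # replacement); it still works here but may emit a deprecation warning
        {cross2(paste(.$col[.$type == '1'], .$value[.$type == '1'], sep = "@"), 
                paste(.$col[.$type == '2'], .$value[.$type == '2'], sep = "@"))} %>% 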
        map_dfr(~ set_names(.x, c('x', 'y'))) %>% 
        separate(x, into = c('col1', 'type1'), convert = TRUE, sep = '@') %>% 
        separate(y, into = c('col2', 'type2'), convert = TRUE, sep = "@") %>% 
        filter(type1 + type2 >= 0) %>% 
        mutate(col_comb = paste0(col1, 'type1_', col2, "type2"),
               sum= type1 + type2) %>% 
        rename(col1number = type1,
               col2number = type2) %>% 
        select(-col1, -col2) %>% 
        mutate(row = .b))
#> # A tibble: 79 × 5
#>    col1number col2number col_comb               sum row  
#>         <int>      <int> <chr>                <int> <chr>
#>  1        110       -108 col1type1_col2type2      2 1    
#>  2        109       -108 col7type1_col2type2      1 1    
#>  3        110        110 col1type1_col4type2    220 1    
#>  4       -109        110 col3type1_col4type2      1 1    
#>  5        107        110 col5type1_col4type2    217 1    
#>  6        109        110 col7type1_col4type2    219 1    
#>  7       -105        110 col9type1_col4type2      5 1    
#>  8        110       -110 col1type1_col6type2      0 1    
#>  9        110        105 col1type1_col10type2   215 1    
#> 10        107        105 col5type1_col10type2   212 1    
#> # … with 69 more rows
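The per-row steps described above (pivot, separate, cross, filter) can be sketched in base R for a single row. This is a hedged illustration, not the answer's code: the named vector `row1` is a hypothetical stand-in for the pivoted first row using only four of the twelve columns, and `expand.grid()` plays the role of `cross2()`.

```r
# Hypothetical stand-in for one pivoted row: first four columns of row 1
row1 <- c(col1type1 = 110, col2type2 = -108, col3type1 = -109, col4type2 = 110)

# Group the entries by their type suffix (the role of separate())
t1 <- row1[grepl("type1$", names(row1))]
t2 <- row1[grepl("type2$", names(row1))]

# Every type1/type2 pairing (the role of cross2())
pairs <- expand.grid(col1 = names(t1), col2 = names(t2),
                     stringsAsFactors = FALSE)
pairs$col1number <- unname(t1[pairs$col1])
pairs$col2number <- unname(t2[pairs$col2])
pairs$sum <- pairs$col1number + pairs$col2number

# Keep only the pairings whose sum is non-negative
res <- pairs[pairs$sum >= 0, ]
res
```

Of the four pairings, only col3type1 + col2type2 (-217) is dropped.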

If your columns are instead named like anum1, anum2, bnum1, ..., only two things need to change (both marked with comments): the separator passed to separate(), and the labels used to build col_comb. The intermediate columns keep the names type1 and type2 given in `into`, so the rename() step stays the same.

example_df %>% 
  mutate(row = row_number()) %>% 
  split(.$row) %>% 
  imap_dfr(\(.a, .b) .a %>% 
             select(-row) %>% 
             pivot_longer(everything()) %>% 
             separate(name, into = c('col', 'type'), sep = '(?:num)') %>% # change sep
             {cross2(paste(.$col[.$type == '1'], .$value[.$type == '1'], sep = "@"), 
                     paste(.$col[.$type == '2'], .$value[.$type == '2'], sep = "@"))} %>% 
             map_dfr(~ set_names(.x, c('x', 'y'))) %>% 
             separate(x, into = c('col1', 'type1'), convert = TRUE, sep = '@') %>% 
             separate(y, into = c('col2', 'type2'), convert = TRUE, sep = "@") %>% 
             filter(type1 + type2 >= 0) %>% 
             mutate(col_comb = paste0(col1, 'num1_', col2, "num2"),  # change labels
                    sum = type1 + type2) %>% 
             rename(col1number = type1,
                    col2number = type2) %>% 
             select(-col1, -col2) %>% 
             mutate(row = .b))
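The only part that really depends on the naming scheme is how a column name is split into a prefix and a type digit. A small base-R sketch of that step with the marker text as a parameter; `parse_names()` is a hypothetical helper, and `marker` is treated as a regular expression.

```r
# Hypothetical helper: split names like "col1type1" or "anum1" into a prefix
# and a type digit; `marker` is the (regex) text between them
parse_names <- function(nms, marker) {
  pattern <- paste0("^(.*)", marker, "([0-9]+)$")
  data.frame(col  = sub(pattern, "\\1", nms),
             type = sub(pattern, "\\2", nms),
             stringsAsFactors = FALSE)
}

parse_names(c("col1type1", "col10type2"), marker = "type")
parse_names(c("anum1", "anum2", "bnum1"), marker = "num")
```

Names that do not match the pattern come back unchanged in both columns, because sub() leaves non-matching strings as they are, so it is worth checking the result.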

Alternatively, you can iterate over the columns and, for each one, pivot_longer() all the other columns against it. Then mutate() to create the sum, the column names and the row, and filter for non-negative sums.

library(dplyr)
library(tidyr)
library(purrr)

map_dfr(seq_along(example_df), function(i) {
  example_df %>% 
    # Add the row number as a (character) column
    tibble::rownames_to_column(var = "row") %>% 
    # Keep `row` and column i (at position i + 1 after adding `row`) fixed,
    # and pivot all the other columns to long format
    pivot_longer(-c(row, (i + 1)), names_to = "col2_name", values_to = "col2number") %>% 
    # Column 2 is now column i of example_df
    rename(col1number = 2) %>% 
    # Build the column combination name and the sum
    mutate(col1_name = colnames(example_df)[i],
           column_combination = paste(col1_name, col2_name, sep = "_"),
           sum = col1number + col2number) %>% 
    # Keep non-negative sums of one type1 and one type2 column
    filter(sum >= 0 & (stringr::str_sub(col1_name, -1) != stringr::str_sub(col2_name, -1))) %>% 
    select(column_combination, row, sum, col1number, col2number)
}) %>% 
  arrange(desc(sum))

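This gives the following. Each pair appears in both orders (e.g. col1type1_col4type2 and col4type2_col1type1), which is why there are 158 = 2 x 79 rows: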

# A tibble: 158 x 5
   column_combination  row     sum col1number col2number
   <chr>               <chr> <int>      <int>      <int>
 1 col1type1_col4type2 1       220        110        110
 2 col4type2_col1type1 1       220        110        110
 3 col4type2_col7type1 1       219        110        109
 4 col4type2_col7type1 2       219        109        110
 5 col4type2_col7type1 3       219        108        111
 6 col4type2_col7type1 4       219        107        112
 7 col4type2_col7type1 5       219        106        113
 8 col7type1_col4type2 1       219        109        110
 9 col7type1_col4type2 2       219        110        109
10 col7type1_col4type2 3       219        111        108
# ... with 148 more rows
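Because every pair is listed in both orders, one way to get back to the 79 unique rows is to keep only the combinations whose first column is a type1 column. A minimal base-R sketch on three rows copied from the output above; within the dplyr code, the filter() could presumably test `grepl("type1$", col1_name) & grepl("type2$", col2_name)` instead of comparing last characters.

```r
# A few rows copied from the output above (hypothetical subset)
res <- data.frame(
  column_combination = c("col1type1_col4type2", "col4type2_col1type1",
                         "col4type2_col7type1"),
  row = c("1", "1", "1"),
  sum = c(220L, 220L, 219L),
  stringsAsFactors = FALSE
)

# The part before "_" is the first column of the pair
first_col <- sub("_.*$", "", res$column_combination)

# Keep only pairs listed as type1 first, which drops the mirrored duplicates
dedup <- res[grepl("type1$", first_col), ]
dedup
```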

A matrix approach:

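m <- as.matrix(example_df)

# Column indices of the type1 and type2 columns
type1 <- grep("type1$", colnames(m))
type2 <- grep("type2$", colnames(m))

# Every (type1, type2) column pair
cbn <- expand.grid(type1 = type1, type2 = type2)

# One column of sums per pair, one row per data row
sums <- m[, cbn$type1, drop = FALSE] + m[, cbn$type2, drop = FALSE]

# Non-negative sums, as the question asks for (>= 0, not > 0)
res.selection <- as.vector(sums) >= 0

res.row <- rep(1:nrow(m), nrow(cbn))
res.type1number <- as.vector(m[, cbn$type1])
res.type2number <- as.vector(m[, cbn$type2])
res.sum <- as.vector(sums)
res.type1 <- rep(cbn$type1, each = nrow(m))
res.type2 <- rep(cbn$type2, each = nrow(m))

data.frame(combination = paste0(colnames(m)[res.type1[res.selection]], '-', colnames(m)[res.type2[res.selection]]),
           row = res.row[res.selection],
           type1number = res.type1number[res.selection],
           type2number = res.type2number[res.selection],
           sum = res.sum[res.selection])

With `>= 0` this returns all 79 rows. The output below was produced with a strict `> 0` comparison, so it omits the ten zero-sum rows (col1type1-col6type2 and col9type1-col10type2) and shows only 69.
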
#>             combination row type1number type2number sum
#> 1   col1type1-col2type2   1         110        -108   2
#> 2   col1type1-col2type2   2         109        -107   2
#> 3   col1type1-col2type2   3         108        -106   2
#> 4   col1type1-col2type2   4         107        -105   2
#> 5   col1type1-col2type2   5         106        -104   2
#> 6   col7type1-col2type2   1         109        -108   1
#> 7   col7type1-col2type2   2         110        -107   3
#> 8   col7type1-col2type2   3         111        -106   5
#> 9   col7type1-col2type2   4         112        -105   7
#> 10  col7type1-col2type2   5         113        -104   9
#> 11  col1type1-col4type2   1         110         110 220
#> 12  col1type1-col4type2   2         109         109 218
#> 13  col1type1-col4type2   3         108         108 216
#> 14  col1type1-col4type2   4         107         107 214
#> 15  col1type1-col4type2   5         106         106 212
#> 16  col3type1-col4type2   1        -109         110   1
#> 17  col3type1-col4type2   2        -108         109   1
#> 18  col3type1-col4type2   3        -107         108   1
#> 19  col3type1-col4type2   4        -106         107   1
#> 20  col3type1-col4type2   5        -105         106   1
#> 21  col5type1-col4type2   1         107         110 217
#> 22  col5type1-col4type2   2         106         109 215
#> 23  col5type1-col4type2   3         105         108 213
#> 24  col5type1-col4type2   4         104         107 211
#> 25  col5type1-col4type2   5         103         106 209
#> 26  col7type1-col4type2   1         109         110 219
#> 27  col7type1-col4type2   2         110         109 219
#> 28  col7type1-col4type2   3         111         108 219
#> 29  col7type1-col4type2   4         112         107 219
#> 30  col7type1-col4type2   5         113         106 219
#> 31  col9type1-col4type2   1        -105         110   5
#> 32  col9type1-col4type2   2        -104         109   5
#> 33  col9type1-col4type2   3        -103         108   5
#> 34  col9type1-col4type2   4        -102         107   5
#> 35  col9type1-col4type2   5        -101         106   5
#> 36  col7type1-col6type2   2         110        -109   1
#> 37  col7type1-col6type2   3         111        -108   3
#> 38  col7type1-col6type2   4         112        -107   5
#> 39  col7type1-col6type2   5         113        -106   7
#> 40 col1type1-col10type2   1         110         105 215
#> 41 col1type1-col10type2   2         109         104 213
#> 42 col1type1-col10type2   3         108         103 211
#> 43 col1type1-col10type2   4         107         102 209
#> 44 col1type1-col10type2   5         106         101 207
#> 45 col5type1-col10type2   1         107         105 212
#> 46 col5type1-col10type2   2         106         104 210
#> 47 col5type1-col10type2   3         105         103 208
#> 48 col5type1-col10type2   4         104         102 206
#> 49 col5type1-col10type2   5         103         101 204
#> 50 col7type1-col10type2   1         109         105 214
#> 51 col7type1-col10type2   2         110         104 214
#> 52 col7type1-col10type2   3         111         103 214
#> 53 col7type1-col10type2   4         112         102 214
#> 54 col7type1-col10type2   5         113         101 214
#> 55 col1type1-col12type2   1         110        -105   5
#> 56 col1type1-col12type2   2         109        -104   5
#> 57 col1type1-col12type2   3         108        -103   5
#> 58 col1type1-col12type2   4         107        -102   5
#> 59 col1type1-col12type2   5         106        -101   5
#> 60 col5type1-col12type2   1         107        -105   2
#> 61 col5type1-col12type2   2         106        -104   2
#> 62 col5type1-col12type2   3         105        -103   2
#> 63 col5type1-col12type2   4         104        -102   2
#> 64 col5type1-col12type2   5         103        -101   2
#> 65 col7type1-col12type2   1         109        -105   4
#> 66 col7type1-col12type2   2         110        -104   6
#> 67 col7type1-col12type2   3         111        -103   8
#> 68 col7type1-col12type2   4         112        -102  10
#> 69 col7type1-col12type2   5         113        -101  12
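The key step in the matrix approach is that indexing a matrix with repeated column indices lines every (type1, type2) pair up side by side, so a single addition computes all the sums, and `as.vector()` then flattens them column by column. A toy sketch of just that mechanism, on a hypothetical 2-row matrix, keeping `>= 0` as the question asks:

```r
# Toy data: 2 rows, columns 1 and 3 are type1, columns 2 and 4 are type2
m <- cbind(col1type1 = c(5, 1), col2type2 = c(-4, -2),
           col3type1 = c(-6, 3), col4type2 = c(2, -1))

# Every (type1, type2) column pair; type1 varies fastest
cbn <- expand.grid(type1 = c(1, 3), type2 = c(2, 4))

# Repeated column indices line the pairs up side by side,
# so one matrix addition computes every pair's row-wise sums at once
sums <- m[, cbn$type1, drop = FALSE] + m[, cbn$type2, drop = FALSE]

# as.vector() flattens column by column: rows cycle fastest, then pairs
s      <- as.vector(sums)
row_id <- rep(seq_len(nrow(m)), times = nrow(cbn))
pair   <- rep(seq_len(nrow(cbn)), each = nrow(m))
label  <- paste(colnames(m)[cbn$type1], colnames(m)[cbn$type2], sep = "_")

keep <- s >= 0
out <- data.frame(combination = label[pair][keep], row = row_id[keep],
                  sum = s[keep], stringsAsFactors = FALSE)
out
```

The zero-sum row (col1type1_col4type2, row 2) is kept here, which is exactly the case a strict `> 0` would drop.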

Here is a seemingly much simpler approach: split the original into a type1 frame and a type2 frame, then use a double lapply(), row-binding the results.

  1. Split:
t1 = example_df[,grepl("type1", names(example_df))]
t2 = example_df[,grepl("type2", names(example_df))]
  2. Compute:
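library(data.table)

# For each type1 column i1 and each type2 column i2: build a table of the two
# columns and their row-wise sum, add the row number, and keep sums >= 0
rbindlist(lapply(t1, \(i1) {
  rbindlist(lapply(t2, \(i2) data.table(t1val = i1, t2val = i2, sum = i1 + i2)[, row := .I][sum >= 0]),
            idcol = "t2_col")  # list names (type2 column names) become t2_col
}), idcol = "t1_col")          # list names (type1 column names) become t1_col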

Output:

       t1_col     t2_col t1val t2val sum row
 1: col1type1  col2type2   110  -108   2   1
 2: col1type1  col2type2   109  -107   2   2
 3: col1type1  col2type2   108  -106   2   3
 4: col1type1  col2type2   107  -105   2   4
 5: col1type1  col2type2   106  -104   2   5
 6: col1type1  col4type2   110   110 220   1
 7: col1type1  col4type2   109   109 218   2
 8: col1type1  col4type2   108   108 216   3
 9: col1type1  col4type2   107   107 214   4
10: col1type1  col4type2   106   106 212   5
11: col1type1  col6type2   110  -110   0   1
12: col1type1  col6type2   109  -109   0   2
13: col1type1  col6type2   108  -108   0   3
14: col1type1  col6type2   107  -107   0   4
15: col1type1  col6type2   106  -106   0   5
16: col1type1 col10type2   110   105 215   1
17: col1type1 col10type2   109   104 213   2
18: col1type1 col10type2   108   103 211   3
19: col1type1 col10type2   107   102 209   4
20: col1type1 col10type2   106   101 207   5
21: col1type1 col12type2   110  -105   5   1
22: col1type1 col12type2   109  -104   5   2
23: col1type1 col12type2   108  -103   5   3
24: col1type1 col12type2   107  -102   5   4
25: col1type1 col12type2   106  -101   5   5
26: col3type1  col4type2  -109   110   1   1
27: col3type1  col4type2  -108   109   1   2
28: col3type1  col4type2  -107   108   1   3
29: col3type1  col4type2  -106   107   1   4
30: col3type1  col4type2  -105   106   1   5
31: col5type1  col4type2   107   110 217   1
32: col5type1  col4type2   106   109 215   2
33: col5type1  col4type2   105   108 213   3
34: col5type1  col4type2   104   107 211   4
35: col5type1  col4type2   103   106 209   5
36: col5type1 col10type2   107   105 212   1
37: col5type1 col10type2   106   104 210   2
38: col5type1 col10type2   105   103 208   3
39: col5type1 col10type2   104   102 206   4
40: col5type1 col10type2   103   101 204   5
41: col5type1 col12type2   107  -105   2   1
42: col5type1 col12type2   106  -104   2   2
43: col5type1 col12type2   105  -103   2   3
44: col5type1 col12type2   104  -102   2   4
45: col5type1 col12type2   103  -101   2   5
46: col7type1  col2type2   109  -108   1   1
47: col7type1  col2type2   110  -107   3   2
48: col7type1  col2type2   111  -106   5   3
49: col7type1  col2type2   112  -105   7   4
50: col7type1  col2type2   113  -104   9   5
51: col7type1  col4type2   109   110 219   1
52: col7type1  col4type2   110   109 219   2
53: col7type1  col4type2   111   108 219   3
54: col7type1  col4type2   112   107 219   4
55: col7type1  col4type2   113   106 219   5
56: col7type1  col6type2   110  -109   1   2
57: col7type1  col6type2   111  -108   3   3
58: col7type1  col6type2   112  -107   5   4
59: col7type1  col6type2   113  -106   7   5
60: col7type1 col10type2   109   105 214   1
61: col7type1 col10type2   110   104 214   2
62: col7type1 col10type2   111   103 214   3
63: col7type1 col10type2   112   102 214   4
64: col7type1 col10type2   113   101 214   5
65: col7type1 col12type2   109  -105   4   1
66: col7type1 col12type2   110  -104   6   2
67: col7type1 col12type2   111  -103   8   3
68: col7type1 col12type2   112  -102  10   4
69: col7type1 col12type2   113  -101  12   5
70: col9type1  col4type2  -105   110   5   1
71: col9type1  col4type2  -104   109   5   2
72: col9type1  col4type2  -103   108   5   3
73: col9type1  col4type2  -102   107   5   4
74: col9type1  col4type2  -101   106   5   5
75: col9type1 col10type2  -105   105   0   1
76: col9type1 col10type2  -104   104   0   2
77: col9type1 col10type2  -103   103   0   3
78: col9type1 col10type2  -102   102   0   4
79: col9type1 col10type2  -101   101   0   5
       t1_col     t2_col t1val t2val sum row
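If data.table is not available, the same double-lapply idea can be sketched in base R by swapping rbindlist() for do.call(rbind, ...). This is an illustration under that substitution, run on hypothetical two-row type1/type2 frames rather than the full example.

```r
# Hypothetical small type1 and type2 frames (first two rows of four columns)
t1 <- data.frame(col1type1 = c(110, 109), col3type1 = c(-109, -108))
t2 <- data.frame(col2type2 = c(-108, -107), col4type2 = c(110, 109))

# Outer loop over type1 columns, inner loop over type2 columns;
# each inner step returns only the rows whose sum is non-negative
res <- do.call(rbind, lapply(names(t1), function(n1) {
  do.call(rbind, lapply(names(t2), function(n2) {
    s <- t1[[n1]] + t2[[n2]]
    out <- data.frame(t1_col = n1, t2_col = n2,
                      t1val = t1[[n1]], t2val = t2[[n2]],
                      sum = s, row = seq_along(s),
                      stringsAsFactors = FALSE)
    out[out$sum >= 0, ]
  }))
}))
rownames(res) <- NULL
res
```

The col3type1/col2type2 pairing contributes no rows (its sums are -217 and -215), and the empty data frame it returns is simply absorbed by rbind().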

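Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.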

 
© 2020-2024 STACKOOM.COM