Return all combinations of two column types that sum to >= 0, with summary metadata identifying the columns, in R [R]
I have data like this:
example_df <- data.frame(
col1type1 =c(110:106),
col2type2 = c(-108:-104),
col3type1 = c(-109:-105),
col4type2 =c(110:106),
col5type1 =c(107:103),
col6type2 = c(-110:-106),
col7type1 =c(109:113),
col8type2 = c(-120:-116),
col9type1 = c(-105:-101),
col10type2 =c(105:101),
col11type1 = c(-125:-121),
col12type2 = c(-105:-101)
)
I want to keep only the combinations where type1 + type2 >= 0 on the same row, and return a new data frame containing the column combination, the row, the sum, and the two numbers. (I know I could use for/foreach to calculate each cell individually and write the results to a data.frame, but there has to be a more efficient way.)
Desired output like this (incomplete):
#for all possible combinations, like the example rows below
example_first <- data.frame(column_combination="col1type1_col2type2", row=1, sum=2,col1number=110,col2number=-108)
example_mid<- data.frame(column_combination="col1type1_col12type2",row=3, sum=5,col1number=108,col2number=-103)
example_last <- data.frame(column_combination="col9type1_col10type2",row=5,sum=0,col1number=-101,col2number=101)
#would want like this for all possible combinations
desired_incomplete_output <- rbind(example_first,example_mid,example_last)
What is an efficient way to calculate this en masse rather than with a brute-force for/foreach loop? Thanks!
If the desired complete output consists of 79 results for the given example, you may do something like this.
Explanation of steps:
- Using mutate and split, we split the data into separate rows, each in its own data frame, i.e. into a list.
- purrr::imap_dfr basically takes a list as input and outputs a data.frame after row-binding all results. In each of its sub-steps, I have:
  - separated the name column, which holds all the column names of the input data, into two separate columns using tidyr::separate
  - created the cross product of the type 1 and type 2 values using purrr::cross2
  - pasted each column label to its value with the separator @, which I assumed is not used anywhere in the column names
  - done other basic data wrangling/transformation with dplyr verbs
library(tidyverse)
example_df %>%
  mutate(row = row_number()) %>%
  split(.$row) %>%
  imap_dfr(\(.a, .b) .a %>%
    select(-row) %>%
    pivot_longer(everything()) %>%
    separate(name, into = c('col', 'type'), sep = '(?:type)') %>%
    {cross2(paste(.$col[.$type == '1'], .$value[.$type == '1'], sep = "@"),
            paste(.$col[.$type == '2'], .$value[.$type == '2'], sep = "@"))} %>%
    map_dfr(~ set_names(.x, c('x', 'y'))) %>%
    separate(x, into = c('col1', 'type1'), convert = TRUE, sep = '@') %>%
    separate(y, into = c('col2', 'type2'), convert = TRUE, sep = "@") %>%
    filter(type1 + type2 >= 0) %>%
    mutate(col_comb = paste0(col1, 'type1_', col2, "type2"),
           sum = type1 + type2) %>%
    rename(col1number = type1,
           col2number = type2) %>%
    select(-col1, -col2) %>%
    mutate(row = .b))
#> # A tibble: 79 × 5
#> col1number col2number col_comb sum row
#> <int> <int> <chr> <int> <chr>
#> 1 110 -108 col1type1_col2type2 2 1
#> 2 109 -108 col7type1_col2type2 1 1
#> 3 110 110 col1type1_col4type2 220 1
#> 4 -109 110 col3type1_col4type2 1 1
#> 5 107 110 col5type1_col4type2 217 1
#> 6 109 110 col7type1_col4type2 219 1
#> 7 -105 110 col9type1_col4type2 5 1
#> 8 110 -110 col1type1_col6type2 0 1
#> 9 110 105 col1type1_col10type2 215 1
#> 10 107 105 col5type1_col10type2 212 1
#> # … with 69 more rows
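As a cross-check on the 79-row result, the per-row cross product can also be sketched in base R by reshaping to long format and merging the type1 cells with the type2 cells on the row index. This is only an illustrative alternative, not the answer's method: the helper names (`long`, `t1`, `t2`, `pairs`, `result`) are made up for this sketch, and it assumes every column name ends in `type1` or `type2`.

```r
example_df <- data.frame(
  col1type1 = 110:106,  col2type2 = -108:-104, col3type1 = -109:-105,
  col4type2 = 110:106,  col5type1 = 107:103,   col6type2 = -110:-106,
  col7type1 = 109:113,  col8type2 = -120:-116, col9type1 = -105:-101,
  col10type2 = 105:101, col11type1 = -125:-121, col12type2 = -105:-101
)

# Long format: one record per (row, column) cell.
# unlist() stacks the columns one after another, so the row index repeats
# once per column and each column name repeats once per row.
long <- data.frame(
  row   = rep(seq_len(nrow(example_df)), times = ncol(example_df)),
  name  = rep(names(example_df), each = nrow(example_df)),
  value = unlist(example_df, use.names = FALSE),
  stringsAsFactors = FALSE
)

t1 <- long[grepl("type1$", long$name), ]
t2 <- long[grepl("type2$", long$name), ]

# Merging on `row` pairs every type1 cell with every type2 cell of that row
pairs <- merge(t1, t2, by = "row", suffixes = c("1", "2"))
pairs$sum <- pairs$value1 + pairs$value2
pairs <- pairs[pairs$sum >= 0, ]

result <- data.frame(
  column_combination = paste(pairs$name1, pairs$name2, sep = "_"),
  row        = pairs$row,
  sum        = pairs$sum,
  col1number = pairs$value1,
  col2number = pairs$value2
)

nrow(result)
```

The merge builds all 180 candidate pairs (36 column pairs times 5 rows) before filtering, which is fine at this size but grows quickly with the number of columns.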
If your columns are named like anum1, anum2, bnum1, ..., we can modify this a bit (three changes, all marked with comments):
example_df %>%   # here example_df is assumed to have columns named anum1, anum2, bnum1, ...
  mutate(row = row_number()) %>%
  split(.$row) %>%
  imap_dfr(\(.a, .b) .a %>%
    select(-row) %>%
    pivot_longer(everything()) %>%
    separate(name, into = c('col', 'type'), sep = '(?:num)') %>% # change sep
    {cross2(paste(.$col[.$type == '1'], .$value[.$type == '1'], sep = "@"),
            paste(.$col[.$type == '2'], .$value[.$type == '2'], sep = "@"))} %>%
    map_dfr(~ set_names(.x, c('x', 'y'))) %>%
    separate(x, into = c('col1', 'type1'), convert = TRUE, sep = '@') %>%
    separate(y, into = c('col2', 'type2'), convert = TRUE, sep = "@") %>%
    filter(type1 + type2 >= 0) %>%
    mutate(col_comb = paste0(col1, 'num1_', # change prefix
                             col2, "num2"), # change prefix
           sum = type1 + type2) %>%
    # type1/type2 are the value columns created by separate() above
    rename(col1number = type1,
           col2number = type2) %>%
    select(-col1, -col2) %>%
    mutate(row = .b))
You can use pivot_longer, holding out one column per iteration. Then mutate to create the sum and the column-name combination, and filter for non-negative sums.
library(dplyr)
library(tidyr)
library(purrr)
map_dfr(seq_along(example_df), function(i){
  example_df %>%
    # Get row number
    tibble::rownames_to_column(., var = "row") %>%
    # Excepting the rowname column and iterating through each column (except row)
    pivot_longer(-c(row, (i + 1)), names_to = "col2_name", values_to = "col2number") %>%
    rename(col1number = 2) %>%
    rowwise() %>%
    # Get the column names and paste together for combination
    mutate(col1_name = colnames(example_df)[i],
           column_combination = paste(col1_name, col2_name, sep = "_"),
           # These are the value columns
           sum = sum(across(c(2, 4)))) %>%
    filter(sum >= 0 & (stringr::str_sub(col1_name, -1) != stringr::str_sub(col2_name, -1))) %>%
    select(column_combination, row, sum, col1number, col2number)
}) %>%
  bind_rows %>%
  ungroup() %>%
  arrange(desc(sum))
Gives (each qualifying pair appears twice, once per column order, which is why there are 158 = 2 × 79 rows):
# A tibble: 158 x 5
column_combination row sum col1number col2number
<chr> <chr> <int> <int> <int>
1 col1type1_col4type2 1 220 110 110
2 col4type2_col1type1 1 220 110 110
3 col4type2_col7type1 1 219 110 109
4 col4type2_col7type1 2 219 109 110
5 col4type2_col7type1 3 219 108 111
6 col4type2_col7type1 4 219 107 112
7 col4type2_col7type1 5 219 106 113
8 col7type1_col4type2 1 219 109 110
9 col7type1_col4type2 2 219 110 109
10 col7type1_col4type2 3 219 111 108
# ... with 148 more rows
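Because the loop visits every column as the first member of a pair, each combination shows up in both orders. If only the type1-first ordering is wanted, one way to drop the mirrored rows is to filter on the combination label. The sketch below is an assumption-laden illustration: `res` is a hand-copied four-row excerpt of the output above, `type1_first` is a made-up name, and the regular expression assumes labels of the form `col<N>type1_...`.

```r
# Four rows copied from the output above (two pairs, each in both orders)
res <- data.frame(
  column_combination = c("col1type1_col4type2", "col4type2_col1type1",
                         "col4type2_col7type1", "col7type1_col4type2"),
  row        = c("1", "1", "1", "1"),
  sum        = c(220L, 220L, 219L, 219L),
  col1number = c(110L, 110L, 110L, 109L),
  col2number = c(110L, 110L, 109L, 110L),
  stringsAsFactors = FALSE
)

# Keep only rows whose first column in the label is a type1 column
type1_first <- res[grepl("^col[0-9]+type1_", res$column_combination), ]
type1_first
```

Applied to the full 158-row result, the same filter should leave one row per pair and row index.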
A matrix approach:
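m <- as.matrix(example_df)
# Assumes type1 columns sit in odd positions and type2 columns in even positions
type1 <- seq(1, ncol(m), by = 2)
type2 <- seq(2, ncol(m), by = 2)
# Every (type1, type2) pair of column indices
cbn <- expand.grid(type1 = type1, type2 = type2)
# Note: `> 0` drops pairs that sum to exactly 0, which is why the output below has 69 rows;
# use `>= 0` to keep them, as the question asks
res.selection <- as.vector((m[, cbn$type1] + m[, cbn$type2]) > 0)
# The vectors below are laid out column by column, matching as.vector() on the sum matrix
res.row <- rep(1:nrow(m), nrow(cbn))
res.type1number <- as.vector(m[, cbn$type1])
res.type2number <- as.vector(m[, cbn$type2])
res.sum <- as.vector(m[, cbn$type1] + m[, cbn$type2])
res.type1 <- rep(cbn$type1, each = nrow(m))
res.type2 <- rep(cbn$type2, each = nrow(m))
data.frame(combination = paste0('col', res.type1[res.selection], 'type1-col', res.type2[res.selection], 'type2'),
           row = res.row[res.selection],
           type1number = res.type1number[res.selection],
           type2number = res.type2number[res.selection],
           sum = res.sum[res.selection])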
#> combination row type1number type2number sum
#> 1 col1type1-col2type2 1 110 -108 2
#> 2 col1type1-col2type2 2 109 -107 2
#> 3 col1type1-col2type2 3 108 -106 2
#> 4 col1type1-col2type2 4 107 -105 2
#> 5 col1type1-col2type2 5 106 -104 2
#> 6 col7type1-col2type2 1 109 -108 1
#> 7 col7type1-col2type2 2 110 -107 3
#> 8 col7type1-col2type2 3 111 -106 5
#> 9 col7type1-col2type2 4 112 -105 7
#> 10 col7type1-col2type2 5 113 -104 9
#> 11 col1type1-col4type2 1 110 110 220
#> 12 col1type1-col4type2 2 109 109 218
#> 13 col1type1-col4type2 3 108 108 216
#> 14 col1type1-col4type2 4 107 107 214
#> 15 col1type1-col4type2 5 106 106 212
#> 16 col3type1-col4type2 1 -109 110 1
#> 17 col3type1-col4type2 2 -108 109 1
#> 18 col3type1-col4type2 3 -107 108 1
#> 19 col3type1-col4type2 4 -106 107 1
#> 20 col3type1-col4type2 5 -105 106 1
#> 21 col5type1-col4type2 1 107 110 217
#> 22 col5type1-col4type2 2 106 109 215
#> 23 col5type1-col4type2 3 105 108 213
#> 24 col5type1-col4type2 4 104 107 211
#> 25 col5type1-col4type2 5 103 106 209
#> 26 col7type1-col4type2 1 109 110 219
#> 27 col7type1-col4type2 2 110 109 219
#> 28 col7type1-col4type2 3 111 108 219
#> 29 col7type1-col4type2 4 112 107 219
#> 30 col7type1-col4type2 5 113 106 219
#> 31 col9type1-col4type2 1 -105 110 5
#> 32 col9type1-col4type2 2 -104 109 5
#> 33 col9type1-col4type2 3 -103 108 5
#> 34 col9type1-col4type2 4 -102 107 5
#> 35 col9type1-col4type2 5 -101 106 5
#> 36 col7type1-col6type2 2 110 -109 1
#> 37 col7type1-col6type2 3 111 -108 3
#> 38 col7type1-col6type2 4 112 -107 5
#> 39 col7type1-col6type2 5 113 -106 7
#> 40 col1type1-col10type2 1 110 105 215
#> 41 col1type1-col10type2 2 109 104 213
#> 42 col1type1-col10type2 3 108 103 211
#> 43 col1type1-col10type2 4 107 102 209
#> 44 col1type1-col10type2 5 106 101 207
#> 45 col5type1-col10type2 1 107 105 212
#> 46 col5type1-col10type2 2 106 104 210
#> 47 col5type1-col10type2 3 105 103 208
#> 48 col5type1-col10type2 4 104 102 206
#> 49 col5type1-col10type2 5 103 101 204
#> 50 col7type1-col10type2 1 109 105 214
#> 51 col7type1-col10type2 2 110 104 214
#> 52 col7type1-col10type2 3 111 103 214
#> 53 col7type1-col10type2 4 112 102 214
#> 54 col7type1-col10type2 5 113 101 214
#> 55 col1type1-col12type2 1 110 -105 5
#> 56 col1type1-col12type2 2 109 -104 5
#> 57 col1type1-col12type2 3 108 -103 5
#> 58 col1type1-col12type2 4 107 -102 5
#> 59 col1type1-col12type2 5 106 -101 5
#> 60 col5type1-col12type2 1 107 -105 2
#> 61 col5type1-col12type2 2 106 -104 2
#> 62 col5type1-col12type2 3 105 -103 2
#> 63 col5type1-col12type2 4 104 -102 2
#> 64 col5type1-col12type2 5 103 -101 2
#> 65 col7type1-col12type2 1 109 -105 4
#> 66 col7type1-col12type2 2 110 -104 6
#> 67 col7type1-col12type2 3 111 -103 8
#> 68 col7type1-col12type2 4 112 -102 10
#> 69 col7type1-col12type2 5 113 -101 12
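A variant of the matrix idea, sketched under two stated assumptions: the columns are picked by name (names ending in `type1` or `type2`) instead of by odd/even position, and the comparison is `>= 0` so that zero sums are kept. The names `sums`, `keep`, `pair` and `result` are introduced only for this sketch.

```r
example_df <- data.frame(
  col1type1 = 110:106,  col2type2 = -108:-104, col3type1 = -109:-105,
  col4type2 = 110:106,  col5type1 = 107:103,   col6type2 = -110:-106,
  col7type1 = 109:113,  col8type2 = -120:-116, col9type1 = -105:-101,
  col10type2 = 105:101, col11type1 = -125:-121, col12type2 = -105:-101
)

m <- as.matrix(example_df)
type1 <- grep("type1$", colnames(m))   # column indices selected by name
type2 <- grep("type2$", colnames(m))
cbn <- expand.grid(type1 = type1, type2 = type2)

# One column per (type1, type2) pair, one row per data row
sums <- m[, cbn$type1] + m[, cbn$type2]

# Row/column positions of the non-negative sums
keep <- which(sums >= 0, arr.ind = TRUE)
pair <- keep[, "col"]   # index into cbn
rw   <- keep[, "row"]   # index into the original rows

result <- data.frame(
  column_combination = paste(colnames(m)[cbn$type1[pair]],
                             colnames(m)[cbn$type2[pair]], sep = "_"),
  row        = unname(rw),
  sum        = sums[keep],
  col1number = m[cbind(rw, cbn$type1[pair])],
  col2number = m[cbind(rw, cbn$type2[pair])]
)

nrow(result)
```

Using `which(..., arr.ind = TRUE)` avoids building the repeated index vectors by hand, at the cost of one extra lookup per output column.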
Here is a seemingly much simpler approach: just split the original into type 1 and type 2 frames, and use a double lapply() approach, row-binding the results.
library(data.table)
t1 = example_df[, grepl("type1", names(example_df))]
t2 = example_df[, grepl("type2", names(example_df))]
rbindlist(lapply(t1, \(i1) {
  rbindlist(lapply(t2, \(i2) data.table(t1val = i1, t2val = i2, sum = i1 + i2)[, row := .I][sum >= 0]), idcol = "t2_col")
}), idcol = "t1_col")
Output:
t1_col t2_col t1val t2val sum row
1: col1type1 col2type2 110 -108 2 1
2: col1type1 col2type2 109 -107 2 2
3: col1type1 col2type2 108 -106 2 3
4: col1type1 col2type2 107 -105 2 4
5: col1type1 col2type2 106 -104 2 5
6: col1type1 col4type2 110 110 220 1
7: col1type1 col4type2 109 109 218 2
8: col1type1 col4type2 108 108 216 3
9: col1type1 col4type2 107 107 214 4
10: col1type1 col4type2 106 106 212 5
11: col1type1 col6type2 110 -110 0 1
12: col1type1 col6type2 109 -109 0 2
13: col1type1 col6type2 108 -108 0 3
14: col1type1 col6type2 107 -107 0 4
15: col1type1 col6type2 106 -106 0 5
16: col1type1 col10type2 110 105 215 1
17: col1type1 col10type2 109 104 213 2
18: col1type1 col10type2 108 103 211 3
19: col1type1 col10type2 107 102 209 4
20: col1type1 col10type2 106 101 207 5
21: col1type1 col12type2 110 -105 5 1
22: col1type1 col12type2 109 -104 5 2
23: col1type1 col12type2 108 -103 5 3
24: col1type1 col12type2 107 -102 5 4
25: col1type1 col12type2 106 -101 5 5
26: col3type1 col4type2 -109 110 1 1
27: col3type1 col4type2 -108 109 1 2
28: col3type1 col4type2 -107 108 1 3
29: col3type1 col4type2 -106 107 1 4
30: col3type1 col4type2 -105 106 1 5
31: col5type1 col4type2 107 110 217 1
32: col5type1 col4type2 106 109 215 2
33: col5type1 col4type2 105 108 213 3
34: col5type1 col4type2 104 107 211 4
35: col5type1 col4type2 103 106 209 5
36: col5type1 col10type2 107 105 212 1
37: col5type1 col10type2 106 104 210 2
38: col5type1 col10type2 105 103 208 3
39: col5type1 col10type2 104 102 206 4
40: col5type1 col10type2 103 101 204 5
41: col5type1 col12type2 107 -105 2 1
42: col5type1 col12type2 106 -104 2 2
43: col5type1 col12type2 105 -103 2 3
44: col5type1 col12type2 104 -102 2 4
45: col5type1 col12type2 103 -101 2 5
46: col7type1 col2type2 109 -108 1 1
47: col7type1 col2type2 110 -107 3 2
48: col7type1 col2type2 111 -106 5 3
49: col7type1 col2type2 112 -105 7 4
50: col7type1 col2type2 113 -104 9 5
51: col7type1 col4type2 109 110 219 1
52: col7type1 col4type2 110 109 219 2
53: col7type1 col4type2 111 108 219 3
54: col7type1 col4type2 112 107 219 4
55: col7type1 col4type2 113 106 219 5
56: col7type1 col6type2 110 -109 1 2
57: col7type1 col6type2 111 -108 3 3
58: col7type1 col6type2 112 -107 5 4
59: col7type1 col6type2 113 -106 7 5
60: col7type1 col10type2 109 105 214 1
61: col7type1 col10type2 110 104 214 2
62: col7type1 col10type2 111 103 214 3
63: col7type1 col10type2 112 102 214 4
64: col7type1 col10type2 113 101 214 5
65: col7type1 col12type2 109 -105 4 1
66: col7type1 col12type2 110 -104 6 2
67: col7type1 col12type2 111 -103 8 3
68: col7type1 col12type2 112 -102 10 4
69: col7type1 col12type2 113 -101 12 5
70: col9type1 col4type2 -105 110 5 1
71: col9type1 col4type2 -104 109 5 2
72: col9type1 col4type2 -103 108 5 3
73: col9type1 col4type2 -102 107 5 4
74: col9type1 col4type2 -101 106 5 5
75: col9type1 col10type2 -105 105 0 1
76: col9type1 col10type2 -104 104 0 2
77: col9type1 col10type2 -103 103 0 3
78: col9type1 col10type2 -102 102 0 4
79: col9type1 col10type2 -101 101 0 5
t1_col t2_col t1val t2val sum row
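The same double-lapply idea can be sketched in base R, shaped to match the column layout the question asked for (`column_combination`, `row`, `sum`, `col1number`, `col2number`). This is an illustrative rewrite rather than the answer's code; `pieces` and `result` are names chosen for this sketch.

```r
example_df <- data.frame(
  col1type1 = 110:106,  col2type2 = -108:-104, col3type1 = -109:-105,
  col4type2 = 110:106,  col5type1 = 107:103,   col6type2 = -110:-106,
  col7type1 = 109:113,  col8type2 = -120:-116, col9type1 = -105:-101,
  col10type2 = 105:101, col11type1 = -125:-121, col12type2 = -105:-101
)

t1 <- example_df[, grepl("type1", names(example_df)), drop = FALSE]
t2 <- example_df[, grepl("type2", names(example_df)), drop = FALSE]

# Outer loop over type1 columns, inner loop over type2 columns
pieces <- lapply(names(t1), function(n1) {
  inner <- lapply(names(t2), function(n2) {
    s <- t1[[n1]] + t2[[n2]]
    # Build all rows for this pair, then keep the non-negative sums
    data.frame(
      column_combination = paste(n1, n2, sep = "_"),
      row        = seq_along(s),
      sum        = s,
      col1number = t1[[n1]],
      col2number = t2[[n2]]
    )[s >= 0, ]
  })
  do.call(rbind, inner)
})
result <- do.call(rbind, pieces)

head(result, 3)
```

Pairs with no qualifying rows contribute zero-row data frames, which `rbind` simply absorbs.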