
[英]R : How can I remove singletons within tidyverse groups across all columns of my dataframe?
[英]R: apply functions and/or regress over across all the pairs of columns within groups (defined by time)
数据集看起来像这样 第一列是时间(年-月)所有其他列都是公司 a、公司 b、公司 c 等的数值。下面是假数据
library(tidyverse)
data <- tibble(time = c(1,1,2,2), a = c(1,2,3,4), b =c(4,3,2,1), c = c(1,1,1,1))
该操作需要在每组时间内对每对列 a、b、c 应用函数和/或运行回归 应用任意函数(对两列求和)后,预期输出将如下所示
expected_output <- tibble(t = c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2),
firm1 = c("a","a","a","b","b","b","c","c","c","a","a","a","b","b","b","c","c","c"),
firm2 = c("a","b","c","a","b","c","a","b","c","a","b","c","a","b","c","a","b","c"),
value = c(6,10,5,10,14,9,5,9,4,14,10,9,10,6,5,9,5,4))
expected_output
该函数可以是具有两个输入的任意函数(公司 a、公司 b、公司 c 等的列对)
fun1<-function(x, y){
sum(x, y)
}
我目前使用 tidyverse 的方法如下,感谢 @akrun 和 @Ronak Shah
创建所有对公司 a, b, c
tmp <- expand.grid(firm1 = names(data[-1]), firm2 = names(data[-1]))
result1 <- data %>%
group_split(time) %>%
map_df(~cbind(time = .x$time[1], tmp, value = apply(tmp, 1, function(x) fun1(.x[[x[1]]],.x[[x[2]]]))))
result1
问题是,使用真实数据,上面的代码运行速度很慢。 我想原因是'map_df'。 但我对这种数据分析很陌生。 真实数据有2000多列,带来超过400万对列。 由于 data.table 以其效率和精度而闻名。 我想知道 data.table 中是否有办法完成这样的操作?
即使不使用 data.table,这也应该快得多
library(tidyverse)
data <- tibble(time = c(1,1,2,2), a = c(1,2,3,4), b =c(4,3,2,1), c = c(1,1,1,1))
data
#> # A tibble: 4 × 4
#> time a b c
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 4 1
#> 2 1 2 3 1
#> 3 2 3 2 1
#> 4 2 4 1 1
fun1<-function(x, y){
sum(x, y)
}
data_agg = data %>%
group_by(time) %>%
summarise(across(.cols = a:c, .fns = sum))
data_agg
#> # A tibble: 2 × 4
#> time a b c
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 3 7 2
#> 2 2 7 3 2
data_agg_long <- data_agg %>%
pivot_longer(a:c)
data_agg_long
#> # A tibble: 6 × 3
#> time name value
#> <dbl> <chr> <dbl>
#> 1 1 a 3
#> 2 1 b 7
#> 3 1 c 2
#> 4 2 a 7
#> 5 2 b 3
#> 6 2 c 2
data_agg_long %>%
full_join(data_agg_long,
by = "time") %>%
# might not be necessary if the function is vectorised
rowwise() %>%
mutate(value = fun1(value.x, value.y)) %>%
select(time, firm1 = name.x, firm2 = name.y, value)
#> # A tibble: 18 × 4
#> # Rowwise:
#> time firm1 firm2 value
#> <dbl> <chr> <chr> <dbl>
#> 1 1 a a 6
#> 2 1 a b 10
#> 3 1 a c 5
#> 4 1 b a 10
#> 5 1 b b 14
#> 6 1 b c 9
#> 7 1 c a 5
#> 8 1 c b 9
#> 9 1 c c 4
#> 10 2 a a 14
#> 11 2 a b 10
#> 12 2 a c 9
#> 13 2 b a 10
#> 14 2 b b 6
#> 15 2 b c 5
#> 16 2 c a 9
#> 17 2 c b 5
#> 18 2 c c 4
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.