
Expand grid of all possible combinations within groups

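I am facing the following problem. I have a list of M&A deals, and each deal includes data on (1) the acquirer, (2) the vendor and (3) the target. The data can have an n:n:n relationship and looks similar to the following: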

dealid acquirer target vendor
1      FirmA    FirmB  FirmC
1      FirmD           FirmE
2      .....................

So the problem is that a row within a deal has no meaning on its own: for example, FirmD is also a co-acquirer of FirmB.

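I now need to create all possible acquirer-target-vendor combinations within each dealid. I have managed to expand a grid with the expand.grid function or simply via merge. However, I do not know how to expand a grid of all possible combinations within groups.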
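To make the goal concrete, here is a minimal sketch for a single deal in base R. The data frame `deal1` is a hypothetical copy of the dealid 1 rows from the table above (with the empty target stored as NA), and dropping missing targets before expanding is an assumption about the desired result.

```r
# Hypothetical toy data: the two rows of dealid 1 from the table above
deal1 <- data.frame(
  acquirer = c("FirmA", "FirmD"),
  target   = c("FirmB", NA),
  vendor   = c("FirmC", "FirmE"),
  stringsAsFactors = FALSE
)

# Every acquirer-target-vendor combination for this one deal,
# leaving out the missing target
combos <- expand.grid(
  acquirer = unique(deal1$acquirer),
  target   = unique(deal1$target[!is.na(deal1$target)]),
  vendor   = unique(deal1$vendor),
  stringsAsFactors = FALSE
)

print(combos)
```

The open question is how to repeat this for every dealid without the combinations leaking across deals.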

You can do this with dplyr and expand from tidyr:

df <- read.table(text="dealid acquirer target vendor
1      FirmA    FirmB  FirmC
1      FirmD    NA     FirmE
2      FirmA    NA     FirmC
2      FirmD    NA     FirmE
2      FirmG    FirmF  FirmE",header=TRUE,stringsAsFactors=FALSE)

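library(dplyr)
library(tidyr)

df %>%
  group_by(dealid) %>%
  expand(acquirer, target, vendor) %>%
  # drop rows with a missing target, in case this tidyr version keeps NA as a value
  filter(!is.na(target))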

   dealid acquirer target vendor
    <int>    <chr>  <chr>  <chr>
 1      1    FirmA  FirmB  FirmC
 2      1    FirmA  FirmB  FirmE
 3      1    FirmD  FirmB  FirmC
 4      1    FirmD  FirmB  FirmE
 5      2    FirmA  FirmF  FirmC
 6      2    FirmA  FirmF  FirmE
 7      2    FirmD  FirmF  FirmC
 8      2    FirmD  FirmF  FirmE
 9      2    FirmG  FirmF  FirmC
10      2    FirmG  FirmF  FirmE

We can use data.table:

library(data.table)
setDT(df1)[, CJ(acquirer = acquirer, target = target, vendor = vendor,
         unique = TRUE), dealid][!is.na(target)]
#    dealid acquirer target vendor
#1:      1    FirmA  FirmB  FirmC
#2:      1    FirmA  FirmB  FirmE
#3:      1    FirmD  FirmB  FirmC
#4:      1    FirmD  FirmB  FirmE
#5:      2    FirmA  FirmF  FirmC
#6:      2    FirmA  FirmF  FirmE
#7:      2    FirmD  FirmF  FirmC
#8:      2    FirmD  FirmF  FirmE
#9:      2    FirmG  FirmF  FirmC
#10:     2    FirmG  FirmF  FirmE

Data

 df1 <- structure(list(dealid = c(1L, 1L, 2L, 2L, 2L), acquirer = c("FirmA", 
"FirmD", "FirmA", "FirmD", "FirmG"), target = c("FirmB", NA, 
NA, NA, "FirmF"), vendor = c("FirmC", "FirmE", "FirmC", "FirmE", 
"FirmE")), .Names = c("dealid", "acquirer", "target", "vendor"
), class = "data.frame", row.names = c(NA, -5L))

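Consider base R's by, which slices a data frame by a grouping factor (here dealid) and runs an operation such as expand.grid on each slice, returning a list of data frames. The code below uses the same sample data as @PLapointe and @akrun: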

dfList <- by(df, df$dealid, function(i){
  tmp <- cbind(dealid=max(i$dealid),
               expand.grid(acquirer=i$acquirer, target=i$target, vendor=i$vendor))
  tmp[!is.na(tmp$target),]
})

newdf <- unique(do.call(rbind, dfList))
row.names(newdf) <- NULL

newdf
#     dealid acquirer target vendor
# 1        1    FirmA  FirmB  FirmC
# 2        1    FirmD  FirmB  FirmC
# 3        1    FirmA  FirmB  FirmE
# 4        1    FirmD  FirmB  FirmE
# 5        2    FirmA  FirmF  FirmC
# 6        2    FirmD  FirmF  FirmC
# 7        2    FirmG  FirmF  FirmC
# 8        2    FirmA  FirmF  FirmE
# 9        2    FirmD  FirmF  FirmE
# 10       2    FirmG  FirmF  FirmE

With split, as mentioned by @Sotos in the comments:

l1 <- split(df1, df1$dealid)
l2 <- lapply(l1, function(x) unique(with(x, expand.grid(acquirer, na.omit(target), vendor))))
df2 <- cbind.data.frame(dealid = rep(names(l2), sapply(l2, nrow)), do.call(rbind, l2))

which results in:

> df2
    dealid  Var1  Var2  Var3
1.1      1 FirmA FirmB FirmC
1.2      1 FirmD FirmB FirmC
1.3      1 FirmA FirmB FirmE
1.4      1 FirmD FirmB FirmE
2.1      2 FirmA FirmF FirmC
2.2      2 FirmD FirmF FirmC
2.3      2 FirmG FirmF FirmC
2.4      2 FirmA FirmF FirmE
2.5      2 FirmD FirmF FirmE
2.6      2 FirmG FirmF FirmE
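The Var1, Var2 and Var3 headers appear because the arguments to expand.grid are unnamed. Below is a sketch of the same split approach with named arguments, assuming df1 as defined in the data block above (rebuilt here with data.frame so the snippet runs on its own). Converting dealid back to integer and resetting the row names are my additions, not part of the original answer.

```r
# Same sample data as df1 above
df1 <- data.frame(
  dealid   = c(1L, 1L, 2L, 2L, 2L),
  acquirer = c("FirmA", "FirmD", "FirmA", "FirmD", "FirmG"),
  target   = c("FirmB", NA, NA, NA, "FirmF"),
  vendor   = c("FirmC", "FirmE", "FirmC", "FirmE", "FirmE"),
  stringsAsFactors = FALSE
)

# One data frame per deal
l1 <- split(df1, df1$dealid)

# Named arguments give the result proper column names
l2 <- lapply(l1, function(x) {
  unique(expand.grid(
    acquirer = x$acquirer,
    target   = x$target[!is.na(x$target)],
    vendor   = x$vendor,
    stringsAsFactors = FALSE
  ))
})

# Stack the per-deal results and restore dealid as an integer column
df2 <- cbind.data.frame(
  dealid = rep(as.integer(names(l2)), sapply(l2, nrow)),
  do.call(rbind, l2)
)
row.names(df2) <- NULL

print(df2)
```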

