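Expand grid of all possible combinations within groups
I'm facing the following problem. I have a list of M&A deals, and each deal includes data on (1) the acquirer, (2) the vendor, and (3) the target. The relationship can be n:n:n, and the data looks similar to this: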
dealid acquirer target vendor
1 FirmA FirmB FirmC
1 FirmD FirmE
2 .....................
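The issue is that the rows within a deal carry no meaning of their own, so, for example, FirmD is also a co-acquirer of FirmB.
I now need to create all possible acquirer-target-vendor combinations within each dealid. I have managed to expand grids with the expand.grid function or simply via merge. However, I don't know how to expand the grid of all possible combinations within groups.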
You can do this with dplyr and expand from tidyr.
df <- read.table(text="dealid acquirer target vendor
1 FirmA FirmB FirmC
1 FirmD NA FirmE
2 FirmA NA FirmC
2 FirmD NA FirmE
2 FirmG FirmF FirmE",header=TRUE,stringsAsFactors=FALSE)
library(dplyr);library(tidyr)
df%>%
group_by(dealid)%>%
expand(acquirer, target, vendor)
dealid acquirer target vendor
<int> <chr> <chr> <chr>
1 1 FirmA FirmB FirmC
2 1 FirmA FirmB FirmE
3 1 FirmD FirmB FirmC
4 1 FirmD FirmB FirmE
5 2 FirmA FirmF FirmC
6 2 FirmA FirmF FirmE
7 2 FirmD FirmF FirmC
8 2 FirmD FirmF FirmE
9 2 FirmG FirmF FirmC
10 2 FirmG FirmF FirmE
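The output above contains no rows with a missing target. Whether expand() treats NA as a value of its own may depend on the tidyr version (this is an assumption worth checking on your installation), so here is a minimal sketch that drops such rows explicitly, mirroring the !is.na(target) filter used in the data.table answer below. The name res is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample data as above
df <- data.frame(
  dealid   = c(1L, 1L, 2L, 2L, 2L),
  acquirer = c("FirmA", "FirmD", "FirmA", "FirmD", "FirmG"),
  target   = c("FirmB", NA, NA, NA, "FirmF"),
  vendor   = c("FirmC", "FirmE", "FirmC", "FirmE", "FirmE"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(dealid) %>%
  expand(acquirer, target, vendor) %>%  # all combinations within each deal
  filter(!is.na(target)) %>%            # drop combinations with a missing target, if any
  ungroup()

res
```

With the sample data this leaves 10 rows either way: 4 for deal 1 and 6 for deal 2.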
We can use data.table (the sample data df1 is defined after the output):
library(data.table)
setDT(df1)[, CJ(acquirer = acquirer, target = target, vendor = vendor,
unique = TRUE), dealid][!is.na(target)]
# dealid acquirer target vendor
#1: 1 FirmA FirmB FirmC
#2: 1 FirmA FirmB FirmE
#3: 1 FirmD FirmB FirmC
#4: 1 FirmD FirmB FirmE
#5: 2 FirmA FirmF FirmC
#6: 2 FirmA FirmF FirmE
#7: 2 FirmD FirmF FirmC
#8: 2 FirmD FirmF FirmE
#9: 2 FirmG FirmF FirmC
#10: 2 FirmG FirmF FirmE
df1 <- structure(list(dealid = c(1L, 1L, 2L, 2L, 2L), acquirer = c("FirmA",
"FirmD", "FirmA", "FirmD", "FirmG"), target = c("FirmB", NA,
NA, NA, "FirmF"), vendor = c("FirmC", "FirmE", "FirmC", "FirmE",
"FirmE")), .Names = c("dealid", "acquirer", "target", "vendor"
), class = "data.frame", row.names = c(NA, -5L))
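Consider base R's by, which slices a data frame by a grouping factor (here dealid) and lets you run an operation such as expand.grid on each slice, returning a list of data frames. The code below uses the same sample data as @PLapointe and @akrun: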
dfList <- by(df, df$dealid, function(i){
tmp <- cbind(dealid=max(i$dealid),
expand.grid(acquirer=i$acquirer, target=i$target, vendor=i$vendor))
tmp[!is.na(tmp$target),]
})
newdf <- unique(do.call(rbind, dfList))
row.names(newdf) <- NULL
newdf
# dealid acquirer target vendor
# 1 1 FirmA FirmB FirmC
# 2 1 FirmD FirmB FirmC
# 3 1 FirmA FirmB FirmE
# 4 1 FirmD FirmB FirmE
# 5 2 FirmA FirmF FirmC
# 6 2 FirmD FirmF FirmC
# 7 2 FirmG FirmF FirmC
# 8 2 FirmA FirmF FirmE
# 9 2 FirmD FirmF FirmE
# 10 2 FirmG FirmF FirmE
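As a quick sanity check on the results above, the number of combinations per deal should equal the product of the distinct non-missing acquirers, targets, and vendors in that deal. The sketch below computes that in base R; the helper name n_distinct_non_na and the variable expected_rows are illustrative, not part of any package.

```r
# Same sample data as in the answers above
df <- data.frame(
  dealid   = c(1L, 1L, 2L, 2L, 2L),
  acquirer = c("FirmA", "FirmD", "FirmA", "FirmD", "FirmG"),
  target   = c("FirmB", NA, NA, NA, "FirmF"),
  vendor   = c("FirmC", "FirmE", "FirmC", "FirmE", "FirmE"),
  stringsAsFactors = FALSE
)

# Number of distinct non-missing values in a vector (hypothetical helper)
n_distinct_non_na <- function(x) length(unique(x[!is.na(x)]))

# Expected combinations per deal = product of the distinct counts per role
expected_rows <- sapply(split(df, df$dealid), function(d) {
  n_distinct_non_na(d$acquirer) *
    n_distinct_non_na(d$target) *
    n_distinct_non_na(d$vendor)
})

expected_rows       # 4 for deal 1, 6 for deal 2
sum(expected_rows)  # 10, matching the row count of the outputs above
```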
Using split, as @Sotos mentioned in the comments:
l1 <- split(df1, df1$dealid)
l2 <- lapply(l1, function(x) unique(with(x, expand.grid(acquirer, na.omit(target), vendor))))
df2 <- cbind.data.frame(dealid = rep(names(l2), sapply(l2, nrow)), do.call(rbind, l2))
This results in:
> df2
dealid Var1 Var2 Var3
1.1 1 FirmA FirmB FirmC
1.2 1 FirmD FirmB FirmC
1.3 1 FirmA FirmB FirmE
1.4 1 FirmD FirmB FirmE
2.1 2 FirmA FirmF FirmC
2.2 2 FirmD FirmF FirmC
2.3 2 FirmG FirmF FirmC
2.4 2 FirmA FirmF FirmE
2.5 2 FirmD FirmF FirmE
2.6 2 FirmG FirmF FirmE
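The result above has generic column names (Var1, Var2, Var3) and row names such as 1.1. A small variation on the same split idea, offered as a sketch rather than the original answer's code, passes named arguments to expand.grid and resets the row names. The variable name combos is illustrative.

```r
df1 <- data.frame(
  dealid   = c(1L, 1L, 2L, 2L, 2L),
  acquirer = c("FirmA", "FirmD", "FirmA", "FirmD", "FirmG"),
  target   = c("FirmB", NA, NA, NA, "FirmF"),
  vendor   = c("FirmC", "FirmE", "FirmC", "FirmE", "FirmE"),
  stringsAsFactors = FALSE
)

# One data frame of combinations per deal
combos <- lapply(split(df1, df1$dealid), function(x) {
  unique(expand.grid(
    dealid   = x$dealid[1],                 # constant within the group
    acquirer = x$acquirer,
    target   = x$target[!is.na(x$target)],  # skip missing targets
    vendor   = x$vendor,
    stringsAsFactors = FALSE
  ))
})

# Stack the per-deal results and drop the "1.1"-style row names
df2 <- do.call(rbind, combos)
row.names(df2) <- NULL
df2
```

Here df2 has the columns dealid, acquirer, target, and vendor, with 10 rows for the sample data.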