
R data.table - update by summing over subsets coded by columns

I have the following problem. I have a list of sets encoded in a data.table sets, where id.s encodes the id of the set and id.e encodes its elements. Each set s has a value v(s). The values of the function v() are stored in another data.table v, where each row contains the set id id.s and its value.

library(data.table)

sets <- data.table(
    id.s = c(1,2,2,3,3,3,4,4,4,4),
    id.e = c(3,3,4,2,3,4,1,2,3,4))

v <- data.table(id.s = 1:4, value = c(1/10,2/10,3/10,4/10))

I need to calculate a new function v'() such that

$$v'(a) = \sum_{b \supseteq a} (-1)^{|b \setminus a|} \, v(b)$$

where |s| denotes the cardinality of the set s (the number of its elements) and b \ a denotes set subtraction (the set b with every element it shares with the set a removed).
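As a quick check of the definition on the sample data, here is a worked evaluation for the first two sets. It assumes the sum runs over every listed set that contains the given set (the set itself included), which is how the loop in the code below reads it; for b ⊇ a the exponent |b \ a| equals |b| - |a|.

```latex
\begin{aligned}
v'(\{3\})   &= v(\{3\}) - v(\{3,4\}) + v(\{2,3,4\}) - v(\{1,2,3,4\}) \\
            &= 0.1 - 0.2 + 0.3 - 0.4 = -0.2, \\
v'(\{3,4\}) &= v(\{3,4\}) - v(\{2,3,4\}) + v(\{1,2,3,4\}) \\
            &= 0.2 - 0.3 + 0.4 = 0.3.
\end{aligned}
```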

Right now I do this with a for loop that updates the table row by row. However, that takes too much time for large data.tables with thousands of sets of thousands of elements.

Do you have any idea how to make this faster?

My current code:

# convert data.table to wide format 
dc <- dcast(sets, id.s ~ id.e, drop = FALSE, value.var = "id.e" , fill = 0)
# take columns corresponding to elements id.e
cols <- names(dc)[-1]
# convert columns cols to 0-1 coding
dc[, (cols) := lapply(.SD, function(x) ifelse(x > 0,1,0)), .SDcols = cols]

# join dc with v
dc <- dc[v, on = "id.s"]

# calculate the cardinality of each set
dc[, cardinality := sum(.SD > 0), .SDcols = cols, by = id.s]

# prepare column for new value
dc[, value2 := 0]

#   id.s 1 2 3 4 value cardinality value2
#1:    1 0 0 1 0   0.1           1      0
#2:    2 0 0 1 1   0.2           2      0
#3:    3 0 1 1 1   0.3           3      0
#4:    4 1 1 1 1   0.4           4      0

# for each set (row of dc)
for(i in 1:nrow(dc)) {
  row <- dc[i,]
  set <- as.numeric(row[,cols, with = F])
  row.cardinality <- as.numeric(row$cardinality)
  # find its supersets
  dc[,is.superset := ifelse(rowSums(mapply("*",dc[,cols,with=FALSE],set))==row.cardinality,1,0)][]
  # use the formula to update the value
  res <- dc[is.superset==1,][, sum := sum((-1)^(cardinality - row.cardinality)*value)]$sum[1]
  dc[i,value2 := res]
}

dc[,.(id.s, value2), with = TRUE]
#   id.s value2
#1:    1   -0.2
#2:    2    0.3
#3:    3   -0.1
#4:    4    0.4
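For comparison, the row-by-row superset test can also be written as a single matrix product over the same 0/1 coding. The following is only a sketch under assumptions: it builds the incidence matrix with base R's table() rather than dcast, the names M, card, shared, sgn and value are illustrative, and it allocates a dense sets-by-sets matrix, so memory grows quadratically with the number of sets.

```r
library(data.table)

# Sample data from the question
sets <- data.table(
  id.s = c(1,2,2,3,3,3,4,4,4,4),
  id.e = c(3,3,4,2,3,4,1,2,3,4))
v <- data.table(id.s = 1:4, value = c(1/10, 2/10, 3/10, 4/10))

# 0/1 incidence matrix: one row per set, one column per element
# (assumes no duplicated (id.s, id.e) pairs)
M <- unclass(table(sets$id.s, sets$id.e))

card   <- rowSums(M)     # cardinality of each set
shared <- tcrossprod(M)  # shared[i, j] = number of elements sets i and j share

# set j contains set i exactly when they share all card[i] elements
is.superset <- shared == card

# sgn[i, j] = (-1)^(|set j| - |set i|)
sgn <- (-1)^outer(card, card, function(a, b) b - a)

# values aligned with the rows of M
value <- v$value[match(as.numeric(rownames(M)), v$id.s)]

value2 <- as.vector((is.superset * sgn) %*% value)
data.table(id.s = as.numeric(rownames(M)), value2 = value2)
```

On the sample data this should reproduce the value2 column printed above.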

This might work for you:

Make a little function to get the supersets for each set:

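# Return the ids of all sets that contain every element of el,
# starting with the set's own id setvalue. Reads the global table sets.
get_superset <- function(el, setvalue) {
  c(setvalue, sets[id.s!=setvalue, setequal(intersect(el, id.e), el), by=id.s][V1==TRUE, id.s])
}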
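To see what the helper returns, here is a small usage sketch on the sample data from the question. The helper is repeated so the snippet runs on its own; the names supersets_of_2 and supersets_of_4 are only illustrative.

```r
library(data.table)

# Sample data from the question
sets <- data.table(
  id.s = c(1,2,2,3,3,3,4,4,4,4),
  id.e = c(3,3,4,2,3,4,1,2,3,4))

# Same helper as above, repeated so this snippet is self-contained
get_superset <- function(el, setvalue) {
  c(setvalue, sets[id.s != setvalue, setequal(intersect(el, id.e), el), by = id.s][V1 == TRUE, id.s])
}

# Set 2 is {3, 4}; sets 3 and 4 also contain both elements
supersets_of_2 <- get_superset(c(3, 4), 2)

# Set 4 is {1, 2, 3, 4}; no other set contains all four elements
supersets_of_4 <- get_superset(1:4, 4)
```

The first call should return the ids 2, 3 and 4, and the second only the id 4, since every set counts as a superset of itself.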
  1. Get the cardinality of each set in the sets object, and also save it separately for later use (see step 4)
sets[, cardinality:=.N, by=.(id.s)]
cardinality = unique(sets[, .(id.s, cardinality)])
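  2. Add the supersets, by set, using the function above (.GRP is the group counter, so this relies on id.s being numbered 1, 2, ... in order of appearance; otherwise pass .BY[[1]] instead)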
sets <- unique(sets[,!c("id.e")][sets[, .("supersets"=get_superset(id.e, .GRP)),by=id.s], on=.(id.s)])

(Note: As an alternative, step 2 could be broken into three sub-steps, like this)

# 2a. Get the supersets
supersets = sets[, .("supersets"=get_superset(id.e, .GRP)),by=id.s]
# 2b. Merge the supersets on the original sets 
sets = sets[supersets, on=.(id.s)]
# 2c. Retain only necessary columns, and make unique
sets = unique(sets[, .(id.s, cardinality,supersets)])
  3. Add the value of each superset
sets <- sets[v,on=.(supersets=id.s)][order(id.s)]
  4. Grab the cardinality of each superset
sets <- sets[cardinality, on=.(supersets=id.s)]
  5. Get the result (i.e. evaluate your v' function)
result = sets[, .(value2 = sum((-1)^(i.cardinality-cardinality)*value)), by=.(id.s)]

Output:

   id.s value2
1:    1   -0.2
2:    2    0.3
3:    3   -0.1
4:    4    0.4
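If calling a helper once per set is still too slow, one possible variation is to find the supersets with a single self-join on the element id instead. This is a sketch under assumptions, not part of the steps above: it assumes sets has no duplicated (id.s, id.e) rows, the names card, pairs and overlap are illustrative, and the self-join can become large when many sets share the same elements.

```r
library(data.table)

# Sample data from the question
sets <- data.table(
  id.s = c(1,2,2,3,3,3,4,4,4,4),
  id.e = c(3,3,4,2,3,4,1,2,3,4))
v <- data.table(id.s = 1:4, value = c(1/10, 2/10, 3/10, 4/10))

# Cardinality of every set
card <- sets[, .(n = .N), by = id.s]

# Self-join on the element id: one row per (set a, set b, shared element)
pairs <- merge(sets, sets, by = "id.e", suffixes = c(".a", ".b"),
               allow.cartesian = TRUE)

# Number of shared elements for every pair of overlapping sets
overlap <- pairs[, .(n.shared = .N), by = .(a = id.s.a, b = id.s.b)]

# Attach |a|, |b| and v(b) with update joins
overlap[card, on = .(a = id.s), n.a := i.n]
overlap[card, on = .(b = id.s), n.b := i.n]
overlap[v,    on = .(b = id.s), value := i.value]

# b is a superset of a exactly when the pair shares all |a| elements of a
result <- overlap[n.shared == n.a,
                  .(value2 = sum((-1)^(n.b - n.a) * value)),
                  by = .(id.s = a)][order(id.s)]
result
```

On the sample data this should give the same value2 column as the output above. Unlike the helper-based steps, it matches sets by their id.s values directly and does not depend on the group counter.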
