
Subset in parallel using a list of dataframes and a list of vectors

This works:

onion$yearone$id %in% mask$yearone

This doesn't:

onion[1][1] %in% mask[1]
onion[1]['id'] %in% mask[1]

Why? Short of an obvious way to vectorize over the parallel elements of DF and memberids (onion and mask in the example data below), so that within each year I get only the rows whose ids are present in both DF and memberids, I'm using a for loop. But I'm not having any luck finding the right way to express the index... Help?

Example data:

yearone <- data.frame(id=c("b","b","c","a","a"),v=rnorm(5))
onion <- list()
onion[[1]] <- yearone
names(onion) <- 'yearone'
mask <- list()
mask[[1]] <- c('a','c')
names(mask) <- 'yearone'

The '$' operator is not the same as the '[' operator. If 'yearone' and 'ids' are in fact the first items in those lists, you should see that this gives the same result as the first call:

DF[[1]][[1]] %in% memberids[[1]]

Why we should expect accessing yearpathall to give the same results is entirely unclear at this point, but using the "[[" operator can give an atomic vector, whereas using "[" on a list never will. Applied to a list, the "[" operator always returns a result of the same class as its first argument, so in this case it returns a list rather than a vector for both 'DF' and 'memberids'. The %in% operator is just an infix version of match, and to compare the ids element by element it needs an atomic vector as each of its arguments; given lists, it compares whole list elements instead, returning one result per list element rather than one per id.
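A minimal sketch of the "[" versus "[[" difference, using the question's example data but with a fixed v column in place of rnorm(5) so the output is predictable. It assumes R 4.0 or later, where data.frame() no longer turns strings into factors:

```r
# Question's example data, with a fixed v instead of rnorm(5)
yearone <- data.frame(id = c("b", "b", "c", "a", "a"), v = 1:5)
onion <- list(yearone = yearone)
mask <- list(yearone = c("a", "c"))

# "[" on a list returns a (shorter) list, not the element itself
class(onion[1])         # "list"
class(onion[[1]])       # "data.frame"
class(onion[[1]][[1]])  # "character" (the id column)

# Both sides are now atomic vectors, so %in% gives one result per id
hits <- onion[[1]][["id"]] %in% mask[[1]]
hits                    # FALSE FALSE TRUE TRUE TRUE

# The same indexing by name, as it would appear inside a loop over years
yr <- "yearone"
onion[[yr]][onion[[yr]][["id"]] %in% mask[[yr]], ]
```

Indexing by name (yr) rather than by position also guards against the two lists being stored in different orders.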

Here is an approach using Map:

# some random example data
onion <- replicate(5, data.frame(id = sample(letters[1:3], 5, TRUE), v = 1:5),
                   simplify = FALSE)
mask <- replicate(5, sample(letters[1:3], 2), simplify = FALSE)
names(onion) <- names(mask) <- paste0('year', seq_along(onion))

A function that does the matching:

get_matches <- function(data, id, mask) {
  # TRUE for each row whose id value appears in this year's mask
  rows <- data[[id]] %in% mask
  data[rows, ]
}


Map(get_matches, data = onion, mask = mask, MoreArgs = list(id = 'id'))
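A quick check of get_matches with Map, sketched on the question's example data (fixed v instead of rnorm(5)) plus a second year, yeartwo, that is made up here purely for illustration. Map pairs onion and mask by position, not by name, so this sketch assumes one way to line the years up: reordering mask with names(onion).

```r
get_matches <- function(data, id, mask) {
  # TRUE for each row whose id value appears in this year's mask
  rows <- data[[id]] %in% mask
  data[rows, ]
}

# yeartwo is hypothetical data added for this example
onion <- list(yearone = data.frame(id = c("b", "b", "c", "a", "a"), v = 1:5),
              yeartwo = data.frame(id = c("a", "d"), v = 6:7))
# Deliberately stored in a different order from onion
mask <- list(yeartwo = "d", yearone = c("a", "c"))

# Reorder mask by name so each data frame is paired with its own year
res <- Map(get_matches, data = onion, mask = mask[names(onion)],
           MoreArgs = list(id = "id"))
res$yearone$v  # 3 4 5
res$yeartwo$v  # 7
```

The result takes its names from the first list passed to Map (onion here), so each element stays labelled with its year.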

This seems to be the answer I was seeking:

merge(mask[1], onion[[1]], by.x = names(mask[1]), by.y = names(onion[[1]][1]))
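For comparison, a small sketch of what this merge call returns on the question's example data (again with a fixed v and assuming R 4.0 or later). merge coerces mask[1] to a one-column data frame named yearone, keeps only the ids present in both, names the key column after by.x, and by default sorts the result on the key, so the rows come back in a different order from plain %in% subsetting:

```r
onion <- list(yearone = data.frame(id = c("b", "b", "c", "a", "a"), v = 1:5))
mask <- list(yearone = c("a", "c"))

# mask[1] is a named list; merge() turns it into a data frame first
merged <- merge(mask[1], onion[[1]],
                by.x = names(mask[1]), by.y = names(onion[[1]][1]))
names(merged)   # "yearone" "v": the key column keeps the by.x name
merged$yearone  # "a" "a" "c": sorted on the key (sort = TRUE by default)

# %in% subsetting keeps the original column name and row order instead
subsetted <- onion[[1]][onion[[1]]$id %in% mask[[1]], ]
subsetted$id    # "c" "a" "a"
```

Both approaches select the same rows; which one fits depends on whether the original row order and the id column name matter downstream.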

And applied to parallel lists of data frames:

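# seq_along() is safe even if onion is empty (1:0 would loop over 1 and 0)
result <- vector("list", length(onion))
for (i in seq_along(onion)) {
  result[[i]] <- merge(mask[i], onion[[i]],
                       by.x = names(mask[i]), by.y = names(onion[[i]][1]))
}
names(result) <- names(onion)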
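The same loop can be written without an explicit index by pairing the two lists with Map. This is a sketch, assuming (as the loop does) that onion and mask are in the same order and that the id column is the first column of each data frame; merge_year is a hypothetical helper name, and yeartwo is made-up data for illustration:

```r
onion <- list(yearone = data.frame(id = c("b", "b", "c", "a", "a"), v = 1:5),
              yeartwo = data.frame(id = c("a", "d"), v = 6:7))
mask <- list(yearone = c("a", "c"), yeartwo = "d")

# Hypothetical helper: turn one year's mask into a one-column data frame
# named like the id column, then inner-join it with that year's data
merge_year <- function(ids, df) {
  key <- names(df)[1]
  merge(setNames(data.frame(ids), key), df, by = key)
}

result <- Map(merge_year, mask, onion)
names(result)   # "yearone" "yeartwo"
result$yeartwo  # the single row with id "d"
```

Because the mask is renamed to match the id column, the key column in each result stays "id" instead of taking the year's name, as it does in the loop above.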

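Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.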

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM