R: Selecting specific rows in a data.table

Question

I have a rather specific problem selecting rows in a data.table, and so far I have not managed to solve it. I have a dataset storing simulation results over a range of parameters. Each column contains either a parameter or a result value; see the code below ("p" for parameter columns and "v" for value columns).

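# create dataset for demonstration
library(data.table)

# full grid of parameter combinations (6 * 10 * 6 * 4 * 5 = 7200 rows)
params <- expand.grid(seq(0, 0.5, by = 0.1),
                      seq(1, 10),
                      seq(100, 105),
                      letters[1:4],
                      letters[10:14])
colnames(params) <- paste0("p", 1:5)

# append two result columns: v1 (uniform) and v2 (normal)
data <- data.table(cbind(params, runif(nrow(params)), rnorm(nrow(params))))
setnames(data, c(colnames(params), "v1", "v2"))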

I would now like to extract, for each p1, for given values of p2 and p3, and for arbitrary values of p4 and p5, the row where v1 is minimal. Let np4 and np5 be the number of unique values of p4 and p5. For each unique p1 and the given p2 and p3, I would like to select, among the np4*np5 rows where p1, p2 and p3 match, the one row where v1 is minimal. The desired output is then a table with np1 rows (np1 being the number of unique values of p1) selected from the original table, i.e. containing all the columns the original has. I know how to select rows from a data.table and how to use expressions and "by", but I have not managed to put it all together to produce the desired result.
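To make the size of the problem concrete, here is a minimal sketch that counts the candidate rows per p1 once p2 and p3 are fixed. The values p2 == 3 and p3 == 101 are arbitrary picks for illustration, and the dataset is rebuilt in compact form so the snippet runs on its own.

```r
library(data.table)

# compact rebuild of the demo dataset (same grid as in the question)
set.seed(1)
data <- as.data.table(expand.grid(p1 = seq(0, 0.5, by = 0.1),
                                  p2 = 1:10,
                                  p3 = 100:105,
                                  p4 = letters[1:4],
                                  p5 = letters[10:14]))
data[, `:=`(v1 = runif(.N), v2 = rnorm(.N))]

# fix p2 and p3 (illustrative values), then count candidate rows per p1
counts <- data[p2 == 3 & p3 == 101, .N, by = p1]
print(counts)
```

Each of the 6 values of p1 is left with np4 * np5 = 4 * 5 = 20 candidate rows, and the goal is to keep exactly one of them per p1.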

UPDATE: I found the answer. The trick was how to select the optimal row within the subset created by "by". Of course, there was already a built-in solution:

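# values of p4 and p5 to keep (here np4 and np5 hold values, not counts)
np4 <- c("a", "b")
np5 <- c("m", "n")

# within each p1 group, keep the row(s) where v1 equals the group minimum
ss2 <- data[p4 %in% np4 & p5 %in% np5,
            .SD[which(v1 == min(v1))],
            by = "p1"]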

From the data.table documentation:

.SD is a data.table containing the Subset of x's Data for each group, excluding any columns used in by (or keyby).
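As a variation on the same .SD idea, the sketch below fixes p2 and p3 as described in the question and groups by p1 only. The values p2 == 3 and p3 == 101 are assumptions chosen for illustration. It uses which.min(v1), which returns only the first minimum, so each group yields exactly one row even if there are ties.

```r
library(data.table)

# compact rebuild of the demo dataset (same grid as in the question)
set.seed(1)
data <- as.data.table(expand.grid(p1 = seq(0, 0.5, by = 0.1),
                                  p2 = 1:10,
                                  p3 = 100:105,
                                  p4 = letters[1:4],
                                  p5 = letters[10:14]))
data[, `:=`(v1 = runif(.N), v2 = rnorm(.N))]

# for fixed p2 and p3 (illustrative values), keep the single row
# with the smallest v1 within each p1 group
res <- data[p2 == 3 & p3 == 101,
            .SD[which.min(v1)],
            by = p1]
print(res)
```

The grouping column p1 is put back in front of the .SD columns, so the result keeps all columns of the original table, with one row per unique p1.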

Answer 1

This should work:

np4 <- c("a", "b")
np5 <- c("m", "n")
data[p4 %in% np4 & p5 %in% np5,
     list(v1 = min(v1), v2 = v2[which.min(v1)]),
     by = c("p1", "p2", "p3", "p4", "p5")]
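One thing worth checking with this grouping: in the demo dataset every combination of p1 to p5 appears exactly once, so grouping by all five parameters produces one-row groups and min(v1) is just that row's v1. If the intent is the minimum over p4 and p5 within each (p1, p2, p3), a possible sketch is to group by the first three parameters only. This is one reading of the question, not necessarily what the answer intended.

```r
library(data.table)

# compact rebuild of the demo dataset (same grid as in the question)
set.seed(1)
data <- as.data.table(expand.grid(p1 = seq(0, 0.5, by = 0.1),
                                  p2 = 1:10,
                                  p3 = 100:105,
                                  p4 = letters[1:4],
                                  p5 = letters[10:14]))
data[, `:=`(v1 = runif(.N), v2 = rnorm(.N))]

np4 <- c("a", "b")
np5 <- c("m", "n")

# check: each (p1, ..., p5) combination occurs once in the demo grid
group_sizes <- data[, .N, by = c("p1", "p2", "p3", "p4", "p5")]
print(all(group_sizes$N == 1))

# minimum of v1 over the kept p4/p5 values, per (p1, p2, p3)
best <- data[p4 %in% np4 & p5 %in% np5,
             .SD[which.min(v1)],
             by = c("p1", "p2", "p3")]
print(nrow(best))
```

With the demo grid this gives one row for each of the 6 * 10 * 6 = 360 combinations of p1, p2 and p3, each chosen from 2 * 2 = 4 candidates.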
