Subsetting data.table with a condition
How can I draw a random subsample from a large data.table (data.table package)? Is there a more elegant way to perform the following?
# Build the table directly; wrapping the columns in cbind() would coerce value to character
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))
DT[site=="a"][sample(1:nrow(DT[site=="a"]), 100)]
I guess there is a simple solution, but I can't find the right wording to search for.
UPDATE: More generally, how can I access a row number in data.table's i argument without creating a temporary column for the row number?
One of the biggest benefits of using data.table is that you can set a key for your data.
Using the key and then .I (a built-in variable; see ?data.table for more info), you can use:
setkey(DT, site)
DT[DT["a", sample(.I, 100)]]
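How .I behaves when it is combined with a subset in i may depend on the data.table version, so the following is only a minimal sketch of a variant that avoids the question: it evaluates .I without any i subset, where it is simply seq_len(nrow(DT)), and filters it inside j. The seed and the choice of site "b" are assumptions made for illustration.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))

# With no i subset, .I is seq_len(nrow(DT)), so .I[site == "b"]
# holds the positions of the matching rows in DT itself.
idx <- DT[, sample(.I[site == "b"], 100)]

# Use the sampled row numbers to pull the rows.
sub <- DT[idx]
```

This does not need a key, at the cost of scanning the site column once.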
As for your second question, "how can I access a row number in data.table's i argument":
# Just use the number directly:
DT[17]
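If the row numbers themselves are needed rather than the rows, one option is the which = TRUE argument of [.data.table, which returns the matching row numbers instead of the subset. A small sketch, assuming the same example table as in the question:

```r
library(data.table)

DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))

# which = TRUE returns the row numbers that satisfy i, not the rows.
rows_a <- DT[site == "a", which = TRUE]

# The row numbers can then be sampled and passed back into i.
sub <- DT[sample(rows_a, 100)]
```

No temporary row-number column is created at any point.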
Using which, you can find the row numbers. Instead of sampling from 1:nrow(...), you can simply sample from all rows with the desired property. In your example, you can use the following:
DT[sample(which(site=="a"), 100)]
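One caveat with this pattern: when which() returns a single row number k, sample(k, n) samples from 1:k rather than from the one-element vector (see ?sample). A hedged sketch of a guard follows; sample_from is a hypothetical helper name, and the small table is made up for illustration.

```r
library(data.table)

# Assumption: a tiny table where only row 7 has site == "a".
DT <- data.table(site = c(rep("b", 6), "a", rep("b", 3)), value = runif(10))

# Hypothetical helper: sample n elements of x by position, so a
# length-one x is never expanded to 1:x.
sample_from <- function(x, n) x[sample.int(length(x), n)]

rows_a <- which(DT$site == "a")   # a single row number
picked <- sample_from(rows_a, 1)  # can only be that row number
DT[picked]
```

For conditions that always match many rows, as in the example above, the plain sample(which(...), n) form behaves the same way.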