Subsetting data.table with a condition

How can I take a random subsample of a large data.table (from the data.table package)? Is there a more elegant way to do the following?

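library(data.table)
# Build the columns directly: wrapping them in cbind() would coerce value to character
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))
DT[site == "a"][sample(1:nrow(DT[site == "a"]), 100)]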

I guess there is a simple solution, but I can't find the right wording to search for it.

UPDATE: More generally, how can I access a row number in data.table's i argument without creating a temporary column for the row number?

One of the biggest benefits of using data.table is that you can set a key on your data.
Using the key together with .I (a built-in variable that holds row numbers; see ?data.table for more info), you can use:

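setkey(DT, site)
# DT["a", sample(.I, 100)] returns 100 random row numbers of rows where site == "a";
# the outer DT[...] then selects those rows
DT[DT["a", sample(.I, 100)]]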
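
The same key-plus-.I idea extends to drawing a fixed number of rows from every group in one call. The sketch below assumes each site has at least n_per_site rows (and more than one row, since sample() treats a length-1 vector specially); n_per_site is an illustrative name, not part of data.table.

```r
library(data.table)

set.seed(1)  # make the random draw reproducible
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))
setkey(DT, site)

n_per_site <- 100  # illustrative sample size per group

# Inside each by-group, .I holds that group's row numbers in DT, so
# sample(.I, n_per_site) draws row numbers for that site only.
# The result has columns site and V1; V1 holds the sampled row numbers.
idx <- DT[, sample(.I, n_per_site), by = site]$V1

# Feed the row numbers back into i to get the sampled rows.
sampled <- DT[idx]
sampled[, .N, by = site]  # 100 rows for each of "a" and "b"
```

Because the sampling happens inside by, adding more sites needs no extra code.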

As for your second question, "how can I access a row number in data.table's i argument?":

# Just use the number directly:
DT[17]
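
To get the row numbers that satisfy a condition without adding a helper column, a common idiom is to evaluate .I in j with no by, where it is simply 1:nrow(DT), and index it with the condition. A minimal sketch, reusing the question's example table (rows_a and sub_a are illustrative names):

```r
library(data.table)

set.seed(42)
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))

# With no `by`, .I in j is 1:nrow(DT), so .I[site == "a"] is the vector
# of row numbers where site == "a"; no temporary column is created.
rows_a <- DT[, .I[site == "a"]]

head(rows_a)  # 1 3 5 7 9 11, since sites alternate a, b, a, b, ...

# Those row numbers can go straight back into i.
sub_a <- DT[sample(rows_a, 100)]
```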

Using which, you can find the row numbers. Instead of sampling from 1:nrow(...), you can simply sample from all rows with the desired property. In your example, you can use the following:

DT[sample(which(site=="a"), 100)]
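
One caveat with sample(which(...), 100): when the vector passed to sample() has length 1, R treats it as 1:x and samples from that range instead. Sampling positions with sample.int() avoids this, and capping the size with min() handles conditions that match fewer rows than requested. The helper sample_rows() below is a hypothetical name for illustration, not a data.table function.

```r
library(data.table)

# Hypothetical helper: draw up to n random rows of dt where cond is TRUE.
# cond is a logical vector of length nrow(dt).
sample_rows <- function(dt, cond, n) {
  idx <- which(cond)                   # row numbers satisfying cond
  k <- min(n, length(idx))             # never ask for more rows than exist
  dt[idx[sample.int(length(idx), k)]]  # sample positions, so a length-1 idx is safe
}

set.seed(7)
DT <- data.table(site = rep(letters[1:2], 1000), value = runif(2000))

sub_a <- sample_rows(DT, DT$site == "a", 100)
one   <- sample_rows(DT, seq_len(nrow(DT)) == 17, 100)  # exactly one match: row 17
```

With plain sample(), the second call would have drawn from rows 1 to 17 rather than returning row 17.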

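Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.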

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM