
Complex subset of a data.frame

I have a data frame with close to a million rows in it. I need an efficient way to subset the data based on multiple criteria. I can do this in a for loop, but I was wondering whether there is a more elegant way.

Time    Instance    Server  Metric  Value
17/08/2014 04:00:00 PM  ID1 Server888   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID1 Server999   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID1 Server777   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID1 Server888   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID1 Server999   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID1 Server777   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID2 Server888   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID2 Server888   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID3 Server999   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID3 Server999   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID3 Server777   disk.commandsaveraged.average   0
17/08/2014 04:05:00 PM  ID3 Server777   disk.commandsaveraged.average   0
17/08/2014 04:00:00 PM  ID1 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID1 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID1 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID1 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID1 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID1 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID2 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID2 Server888   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID3 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID3 Server999   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID3 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:05:00 PM  ID3 Server777   disk.numberreadaveraged.average 0
17/08/2014 04:00:00 PM  ID1 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID7 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID1 Server777   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID1 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID1 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID7 Server777   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID2 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID5 Server888   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID3 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID4 Server999   disk.numberwriteaveraged.average    0
17/08/2014 04:00:00 PM  ID3 Server777   disk.numberwriteaveraged.average    0
17/08/2014 04:05:00 PM  ID3 Server777   disk.numberwriteaveraged.average    0

What I want to do is create a subset where Metric == disk.numberwriteaveraged.average, Server is either Server999 or Server888, and only the Instance IDs that both servers have in common are kept.

NOTE: I use the term "subset" purely because I don't know of any other way to filter data in R; I am still learning. I am looking for speed, because I will be generating data sets much larger than my current one.
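For reference, the filter described above can be sketched in base R without a loop. This is only an illustration under assumptions: the small df below is a made-up miniature of the table (with the Time column left out), and the names keep, sub, common and result are hypothetical.

```r
# Hypothetical miniature of the data; column names follow the table above.
df <- data.frame(
  Instance = c("ID1", "ID7", "ID1", "ID2", "ID3", "ID1"),
  Server   = c("Server888", "Server999", "Server999",
               "Server888", "Server999", "Server777"),
  Metric   = "disk.numberwriteaveraged.average",
  Value    = 0,
  stringsAsFactors = FALSE
)

# Step 1: rows for the metric of interest on either of the two servers.
keep <- df$Metric == "disk.numberwriteaveraged.average" &
  df$Server %in% c("Server888", "Server999")
sub <- df[keep, ]

# Step 2: Instance IDs that appear on both servers.
common <- intersect(sub$Instance[sub$Server == "Server888"],
                    sub$Instance[sub$Server == "Server999"])

# Step 3: keep only the rows belonging to those shared instances.
result <- sub[sub$Instance %in% common, ]
print(result)
```

Every step is a vectorised comparison, so no explicit loop over rows is needed. How this performs on much larger data has not been measured here.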

(If I understand your question correctly.) In your case, data.table is your friend. Try the following (assuming df is your data set):

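library(data.table)
# setDT() converts df to a data.table by reference (no copy is made).
# Within each Instance group, keep rows for the metric named in the
# question on either of the two servers.
df2 <- setDT(df)[, .SD[Metric == "disk.numberwriteaveraged.average" &
            Server %in% c("Server999", "Server888")], by = Instance]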
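
The question also asks that only Instance IDs present on both servers be kept. Here is a minimal sketch of one way to add that condition with data.table, assuming the column names from the question. The sample dt and the names servers, filtered and both are made up for illustration, and this is not necessarily the fastest formulation.

```r
library(data.table)

# Hypothetical miniature of the data; column names follow the question.
dt <- data.table(
  Instance = c("ID1", "ID7", "ID1", "ID2", "ID3", "ID1"),
  Server   = c("Server888", "Server999", "Server999",
               "Server888", "Server999", "Server777"),
  Metric   = "disk.numberwriteaveraged.average",
  Value    = 0
)

servers <- c("Server888", "Server999")

# Step 1: one vectorised filter on metric and server.
filtered <- dt[Metric == "disk.numberwriteaveraged.average" &
                 Server %in% servers]

# Step 2: keep an Instance group only if every requested server occurs in it.
# Groups for which the condition is FALSE return NULL and are dropped.
both <- filtered[, if (uniqueN(Server) == length(servers)) .SD, by = Instance]
print(both)
```

Filtering the rows first and grouping afterwards keeps the per-group work small, since only the rows for the two servers reach the by = Instance step.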
