Subset data.table by evaluating multiple columns
How can I return one row for each unique Name and Type combination, keeping the most recent (latest) Date?
A data.table with 6 rows:
library(data.table)

# Six rows: one Date per observation of a Name/Type pair
example <- data.table(
  Name = c("Bob", "May", "Sue", "Bob", "Sue", "Bob"),
  Type = c("A", "A", "A", "A", "B", "B"),
  Date = as.Date(c("2010/01/01", "2010/01/01", "2010/01/01",
                   "2012/01/01", "2012/01/11", "2014/01/01"))
)
# Sort (and key) by Name, then Date
setkey(example, Name, Date)
Should return 5 rows:
# 1: Bob A 2012-01-01
# 2: Bob B 2014-01-01
# 3: May A 2010-01-01
# 4: Sue A 2010-01-01
# 5: Sue B 2012-01-11
Since you've already sorted by Name and Date, you can use the unique function (which calls unique.data.table) on the columns Name and Type, with fromLast = TRUE.
library(data.table) ## >= v1.9.3
unique(example, by=c("Name", "Type"), fromLast=TRUE)
# Name Type Date
# 1: Bob A 2012-01-01
# 2: Bob B 2014-01-01
# 3: May A 2010-01-01
# 4: Sue A 2010-01-01
# 5: Sue B 2012-01-11
This'll pick the last row for each Name, Type group. Hope this helps.
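For comparison, here is a minimal sketch of the same "last row per group" idea written with grouping instead of unique. It is not part of the answer above; the names last_rows and latest are only illustrative. The first form relies on the rows already being sorted by Date within each Name, while the second takes the maximum Date and so does not depend on row order.

```r
library(data.table)

example <- data.table(
  Name = c("Bob", "May", "Sue", "Bob", "Sue", "Bob"),
  Type = c("A", "A", "A", "A", "B", "B"),
  Date = as.Date(c("2010-01-01", "2010-01-01", "2010-01-01",
                   "2012-01-01", "2012-01-11", "2014-01-01"))
)
setkey(example, Name, Date)

# .SD[.N] keeps the last row of each (Name, Type) group; since the table
# is sorted by Date within Name, that row carries the latest Date.
last_rows <- example[, .SD[.N], by = .(Name, Type)]

# Order-independent alternative: take the maximum Date per group.
latest <- example[, .(Date = max(Date)), by = .(Name, Type)]
```

Both forms should give one row per Name/Type pair. The max(Date) version is the safer choice if you cannot guarantee the table is sorted.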
PS: As @mso points out, this needs data.table 1.9.3 or later, because the fromLast argument was first implemented in 1.9.3 (available from GitHub).
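If you are unsure which version is installed, a small guard along these lines may help. This is only a sketch: the toy table dt and the variable has_from_last are made up for illustration, and the version threshold is simply the one mentioned above.

```r
library(data.table)

# TRUE when the installed data.table is at least the version named above
has_from_last <- packageVersion("data.table") >= "1.9.3"

# Toy table (hypothetical), already sorted by Date
dt <- data.table(Name = c("Bob", "Bob"),
                 Type = c("A", "A"),
                 Date = as.Date(c("2010-01-01", "2012-01-01")))

latest <- if (has_from_last) {
  unique(dt, by = c("Name", "Type"), fromLast = TRUE)
} else {
  # Fallback: reverse the rows so the first row per group is the latest
  unique(dt[rev(seq_len(nrow(dt)))], by = c("Name", "Type"))
}
```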
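The following variations on @Arun's answer work without fromLast. They sort in descending order so that the first row of each Name, Type group is the latest one, take unique, and then restore the original order. (Do not combine the reversed order with fromLast=TRUE on a version that supports it, or you will get the earliest row per group instead.)
unique(example[rev(order(Name,Date))], by=c("Name", "Type"))[order(Name,Date)]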
Name Type Date
1: Bob A 2012-01-01
2: Bob B 2014-01-01
3: May A 2010-01-01
4: Sue A 2010-01-01
5: Sue B 2012-01-11
unique(example[order(Name, Date, decreasing=TRUE)], by=c("Name","Type"))[order(Name, Date)]
Name Type Date
1: Bob A 2012-01-01
2: Bob B 2014-01-01
3: May A 2010-01-01
4: Sue A 2010-01-01
5: Sue B 2012-01-11