Bring forward matching row values on a group-by max() call (to know values corresponding to max value of a specific column)

I am trying to find out, for each year represented in a data table, what the maximum daily frequency of an event was and on which day it occurred. I can get the max value by year like so:

dt[, .N, by = DATE][, .(max(N)), by=format(DATE, "%Y")]

But how can I bring forward the complete DATE (not just the year) that matches this max value?

Here's what I tried:

dt[, .N, by=DATE][which(N==max(N)), .(max(N), d:=DATE),by=format(DATE, "%Y")]

which certainly doesn't look like it should work, and doesn't, per this error message:

Error in `[.data.table`(dt[, .N, by = DATE], which(N == max(N)), .(max(N),  : 
  'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(format(DATE, "%Y")) should work. This is for efficiency so data.table can detect which columns are needed.

I see how I can easily backtrack to dt and grab the rows corresponding to the max, but I'd like to do better than that. Is there a way to do this with subset selections, as above?

Apologies if I missed an SO post on this; I couldn't find anything.

Here's a sample of dt:

> dput(dt[sample(1:600000, size = 500), DATE])
structure(c(16091, 15909, 15987, 16509, 16294, 16610, 16297, 
15898, 15928, 15949, 16351, 16203, 16215, 15799, 16506, 15931, 
16091, 15825, 15860, 15814, 15975, 16233, 16108, 16590, 15700, 
16019, 16178, 16287, 16730, 16366, 16678, 16010, 16157, 16116, 
15794, 16157, 16010, 16171, 16721, 16640, 16302, 15939, 15928, 
16325, 15837, 15848, 15730, 15828, 16414, 16431, 16389, 16003, 
16444, 16255, 16268, 16226, 16205, 15765, 16060, 15938, 16376, 
15934, 15871, 16163, 16568, 15899, 16597, 16160, 16538, 15703, 
16002, 16371, 16019, 16138, 16091, 15874, 16298, 16086, 15753, 
16310, 16209, 15843, 16307, 16472, 16319, 16519, 15743, 16480, 
16323, 16674, 16147, 16013, 15986, 16616, 16480, 16494, 16030, 
16614, 16447, 15991, 15977, 15884, 16707, 16614, 16470, 16193, 
16453, 16342, 16109, 15731, 16321, 16421, 15974, 16578, 16718, 
16183, 15721, 15854, 16470, 16368, 16399, 16433, 16721, 16624, 
16514, 15918, 16370, 15910, 16308, 15973, 16579, 16606, 16192, 
16445, 16671, 15927, 15958, 16140, 15957, 16623, 16416, 15852, 
15913, 16190, 15930, 16420, 15808, 15862, 16507, 16447, 16109, 
15732, 16700, 15911, 16183, 16215, 16584, 15840, 16628, 16138, 
16500, 16477, 16184, 16510, 16374, 16668, 16278, 16642, 16713, 
16324, 16200, 16255, 15960, 16395, 15869, 16282, 16736, 16164, 
16416, 16496, 16565, 15741, 16308, 16441, 16607, 16190, 15938, 
16045, 15758, 16219, 16165, 16357, 16353, 16731, 16063, 15740, 
16220, 16522, 15864, 15922, 16223, 15806, 16660, 16471, 15954, 
16369, 15750, 15957, 16156, 16367, 16654, 16165, 16109, 15863, 
16204, 15929, 15812, 15987, 16275, 16552, 15741, 15906, 15929, 
16295, 15974, 15749, 15830, 15892, 16266, 16208, 15793, 15768, 
15721, 16707, 15903, 16624, 16552, 16695, 16116, 16573, 16344, 
16452, 16539, 16195, 15851, 16140, 16152, 15736, 16179, 15846, 
16363, 16404, 16522, 16723, 16021, 16232, 16081, 16206, 16183, 
15920, 16543, 15989, 15974, 16212, 16396, 16473, 16502, 16532, 
16326, 15882, 16607, 15848, 15954, 16419, 15752, 16030, 16429, 
16222, 16213, 16626, 16049, 16738, 16256, 16198, 16599, 15727, 
16707, 16433, 15863, 16145, 16188, 15862, 15707, 16475, 16130, 
15887, 16647, 15974, 16221, 15773, 16059, 16662, 16250, 15689, 
15753, 15833, 16365, 16646, 16366, 16130, 16712, 15859, 16480, 
15983, 16377, 16091, 16121, 15821, 16505, 16018, 16254, 15937, 
16322, 16490, 15899, 16377, 16319, 16262, 16215, 16005, 16318, 
16488, 16350, 16275, 16723, 16616, 16593, 15918, 16264, 15897, 
15931, 16204, 16603, 16192, 16377, 15837, 16737, 16466, 16271, 
15804, 15987, 16622, 16634, 16227, 16297, 16597, 16232, 16393, 
15842, 15999, 15716, 16092, 16080, 16553, 16068, 16129, 16012, 
16383, 16150, 16611, 16602, 16254, 15728, 15958, 15827, 16111, 
16097, 16112, 16648, 16510, 16417, 16021, 16660, 15793, 16016, 
16188, 16034, 16415, 16270, 16728, 16153, 16028, 16286, 16731, 
15905, 15710, 16208, 16300, 16522, 16062, 16310, 16535, 16111, 
16682, 15957, 16051, 16597, 16063, 15828, 16658, 16213, 16262, 
15814, 15912, 16115, 15716, 15976, 16665, 16723, 15766, 15825, 
16682, 16547, 16402, 16486, 16085, 16231, 16126, 16398, 15762, 
16563, 15796, 15993, 15943, 16020, 15727, 16671, 16044, 15921, 
16511, 15787, 16128, 16376, 16502, 15751, 16317, 16444, 16032, 
15839, 16588, 15780, 15926, 16722, 16225, 16523, 16450, 16661, 
16702, 16223, 15977, 16586, 16221, 16252, 15853, 16309, 15838, 
16505, 16143, 16526, 15980, 15970, 15718, 16713, 16021, 16546, 
16469, 16452, 15729, 16309, 16543, 16386, 16554, 16349, 16595, 
16499, 16359, 16322, 16547, 16415, 16112, 15898, 16008, 16275, 
15975, 16197, 15740, 15959, 16346, 16364, 16522), class = "Date")

Why not simply subset .SD with which.max(N)?

library(data.table)
# x is assumed to be the Date vector from the question's dput() output
data.table(x)[, .N, by=x][, .SD[which.max(N)], by=year(x), .SDcols=1:2]
#    year          x N
# 1: 2014 2014-01-21 4
# 2: 2013 2013-09-26 4
# 3: 2015 2015-03-28 4
# 4: 2012 2012-12-26 1

Once you're familiar with .SD, most operations just use base R functions.
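A related idiom that is often suggested as a lighter-weight alternative to subsetting .SD is to collect row numbers with .I and index the table once. The following is only a sketch on hypothetical toy data; the names x, counts, yr, idx and busiest are made up for illustration.

```r
library(data.table)

# Hypothetical toy data; in the question this would be dt$DATE.
x <- as.Date(c("2013-01-05", "2013-01-05", "2013-03-10",
               "2014-02-01", "2014-07-04", "2014-07-04"))

# Events per day, plus an explicit year column to group on.
counts <- data.table(x)[, .N, by = x][, yr := year(x)]

# .I holds row numbers of `counts`; keep the row number of each
# year's busiest day (which.max returns the first one on ties).
idx <- counts[, .I[which.max(N)], by = yr]$V1

# One subset at the end brings every column along.
busiest <- counts[idx]
```

Like which.max(N) on .SD, this keeps only the first day when several days tie for the maximum.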


On your attempts: the general form of data.table is to subset rows in i, and then compute j grouped by by. So you can't supply a condition in i and have it applied per group in by, because i is evaluated first, on the whole table. And .(d := DATE) isn't valid syntax at all.

Please read the vignettes. These things are all in there.

This is what I came up with:
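To make the evaluation order concrete, here is a small sketch on hypothetical toy data (dt, counts, global and per_year are illustrative names, and the year is stored as its own column Y to keep the grouping simple). The same condition gives different results depending on whether it sits in i or in j.

```r
library(data.table)

# Hypothetical toy data: one row per event, DATE is the event day.
dt <- data.table(DATE = as.Date(c(
  "2013-01-05", "2013-01-05", "2013-03-10",
  "2014-02-01", "2014-02-01", "2014-02-01", "2014-07-04"
)))

# Events per day, with the year kept as a real column.
counts <- dt[, .N, by = .(DATE, Y = year(DATE))]

# Condition in i: evaluated once on the whole table, so only the
# overall busiest day survives and 2013 is gone before grouping.
global <- counts[N == max(N), .(DATE, N), by = Y]

# Condition in j: evaluated inside each group, so every year
# keeps its own busiest day (all of them, if there are ties).
per_year <- counts[, .SD[N == max(N)], by = Y]
```

In this toy case global has a single row for 2014, while per_year has one row for each of 2013 and 2014.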

DT[, Y := year(DATE)]

DT[,
  copy(.SD)[, n := .N , by=DATE][which.max(n)]
, by=Y]


      Y       DATE n
1: 2014 2014-01-21 4
2: 2013 2013-09-26 4
3: 2015 2015-03-28 4
4: 2012 2012-12-26 1

I'm hoping there is a better way. I created Y because data.table currently doesn't allow columns to be used inside j if any transformation of them appears in by.
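One possible way around this, sketched here on hypothetical toy data rather than the real table, is to name the year inside by in the counting step. Y then becomes an ordinary column of the intermediate result, so neither copy(.SD) nor adding a column to DT is needed. The names res and n are illustrative.

```r
library(data.table)

# Hypothetical toy data standing in for the question's DT.
DT <- data.table(DATE = as.Date(c(
  "2013-01-05", "2013-01-05", "2013-03-10",
  "2014-02-01", "2014-07-04", "2014-07-04", "2014-07-04"
)))

# Step 1: count events per day, naming the year in `by` so that Y
#         is a real column of the intermediate table.
# Step 2: within each year, keep the row with the largest count.
res <- DT[, .(n = .N), by = .(Y = year(DATE), DATE)][, .SD[which.max(n)], by = Y]
```

Here res has the columns Y, DATE and n, and DT itself is left unchanged.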
