I have the following (simplified) dataset:
df <- data.frame(a=c("A","A","B","B","B"),x=c(1,2,3,3,4))
df
  a x
1 A 1
2 A 2
3 B 3
4 B 3
5 B 4
Since I'm working with large datasets, I use the data.table package.
Is there a way to get the rows of df where x is minimal within each group of a? In this case, I want to select rows 1, 3 and 4.
Something like
df[,min(x),by=a]
But that doesn't give me the rows I want; it just shows the minimum value of x for each group of a.
Any suggestions?
library(data.table)
dt <- data.table(a=c("A","A","B","B","B"), x=c(1,2,3,3,4))
These return only one row per group (the first minimum, even if there are ties):
dt[, .SD[which.min(x)], by=a]
Or alternatively:
setkeyv(dt, c("a","x"))
dt[unique(dt[,a]), mult="first"]
Since you want to have all ties:
dt[,.SD[x==min(x)], by=a]
You could also use:
setkeyv(dt,c("a","x"))
dt[dt[unique(dt[,a]), mult="first"]]
This could be more efficient if you have very large groups.
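To make the difference between the one-row-per-group and all-ties variants concrete, here is a minimal sketch on the example data from the question. The variable names first_min and all_min are only illustrative.

```r
library(data.table)

dt <- data.table(a = c("A", "A", "B", "B", "B"), x = c(1, 2, 3, 3, 4))

# which.min() returns the position of the first minimum only,
# so each group contributes exactly one row.
first_min <- dt[, .SD[which.min(x)], by = a]

# A logical comparison against min(x) keeps every row that ties
# for the group minimum.
all_min <- dt[, .SD[x == min(x)], by = a]

print(first_min)
print(all_min)
```

For group B, first_min keeps a single row with x = 3, while all_min keeps both of them.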
Here you go
dt <- data.table(a=c("A","A","B","B","B"),x=c(1,2,3,3,4))
dt[dt[,list(IDX=.I[x==min(x)]),by=a]$IDX]
   a x
1: A 1
2: B 3
3: B 3
This should be quicker if you want ties (as I understood you do).
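As a sketch of how the .I approach works, the steps can be split up: .I holds the row numbers of the original table, so the grouped call first collects the row numbers of the minima, and a single subset then pulls those rows. The names idx_tbl, idx and res are only illustrative.

```r
library(data.table)

dt <- data.table(a = c("A", "A", "B", "B", "B"), x = c(1, 2, 3, 3, 4))

# Step 1: for each group of a, collect the row numbers (.I) in the
# full table where x equals the group minimum.
idx_tbl <- dt[, list(IDX = .I[x == min(x)]), by = a]
idx <- idx_tbl$IDX

# Step 2: subset the original table once using those row numbers.
res <- dt[idx]

print(idx)
print(res)
```

Because only integer row numbers are built per group, rather than a subset of .SD for every group, this tends to avoid some per-group overhead, though the actual gain depends on the data.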