What I'm trying to achieve in R is the following: given a table (a data frame in my case), I want to get the lowest price for each unique combination of two columns.
For example, given the following table:
+-----+----------+-------+----------+----------+
| Key | Feature1 | Price | Feature2 | Feature3 |
+-----+----------+-------+----------+----------+
| AAA | 1        | 100   | whatever | whatever |
| AAA | 1        | 150   | whatever | whatever |
| AAA | 1        | 200   | whatever | whatever |
| AAA | 2        | 110   | whatever | whatever |
| AAA | 2        | 120   | whatever | whatever |
| BBB | 1        | 100   | whatever | whatever |
+-----+----------+-------+----------+----------+
I want a result that looks like:
+-----+----------+-------+----------+----------+
| Key | Feature1 | Price | Feature2 | Feature3 |
+-----+----------+-------+----------+----------+
| AAA | 1        | 100   | whatever | whatever |
| AAA | 2        | 110   | whatever | whatever |
| BBB | 1        | 100   | whatever | whatever |
+-----+----------+-------+----------+----------+
So I'm working on a solution along the lines of:
s <- lapply(split(data, list(data$Key, data$Feature1)), function(chunk) {
  chunk[which.min(chunk$Price), ]
})
But the result is a 1 x n matrix, so I still need to unsplit the result. It also seems very slow. How can I improve this logic? I've seen solutions pointing in the direction of the data.table package. Should I rewrite it using that package?
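For reference, the split/lapply idea can be finished in base R roughly like this. It is only a sketch: the sample table is rebuilt with placeholder Feature2/Feature3 strings (not the real data), drop = TRUE is added so that combinations that never occur (such as BBB.2) are not turned into empty chunks, and do.call(rbind, ...) stacks the one-row pieces back into a single data frame. It still loops over every group, so it will not fix the speed problem on large data.

```r
# Sample data reconstructed from the table above (Feature2/Feature3 are placeholders)
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

# drop = TRUE skips Key/Feature1 combinations that have no rows
chunks <- split(data, list(data$Key, data$Feature1), drop = TRUE)

# One-row data frame per group: the row with the lowest Price
cheapest <- lapply(chunks, function(chunk) chunk[which.min(chunk$Price), ])

# Stack the list back into a single data frame
result <- do.call(rbind, cheapest)
rownames(result) <- NULL
result
```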
Update
Great answers, guys, thanks! However, my original data frame contains more columns (Feature2, ...), and I need them all back after the filtering. The rows that do not have the lowest price (for their Key/Feature1 combination) can be discarded, so I'm not interested in their Feature2/Feature3 values.
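A vectorised base R sketch that keeps every column, without a per-group loop: sort by Key, Feature1 and Price, then keep only the first row of each Key/Feature1 pair with duplicated(). The data frame is again a reconstruction of the sample table with placeholder values. Because order() leaves unresolved ties in their original order, a group with two equally cheap rows keeps the one that appears first, which matches what which.min() would pick.

```r
# Sample data reconstructed from the question's table (placeholder feature values)
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

# Put the cheapest row of each Key/Feature1 group first within that group
sorted <- data[order(data$Key, data$Feature1, data$Price), ]

# duplicated() flags every repeat of a Key/Feature1 pair after its first
# occurrence; negating it keeps exactly one (the cheapest) row per group
result <- sorted[!duplicated(sorted[, c("Key", "Feature1")]), ]
rownames(result) <- NULL
result
```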
You can use the dplyr package:
library(dplyr)
data %>%
  group_by(Key, Feature1) %>%
  slice(which.min(Price))
Since you referred to the data.table package, here is a solution using that package:
library(data.table)
setDT(df)[,.(Price=min(Price)),.(Key, Feature1)] #initial question
setDT(df)[,.SD[which.min(Price)],.(Key, Feature1)] #updated question
df is your sample data frame.
Update: test using the mtcars data:
df<-mtcars
library(data.table)
setDT(df)[,.SD[which.min(mpg)],by=am]
am mpg cyl disp hp drat wt qsec vs gear carb
1: 1 15.0 8 301 335 3.54 3.57 14.60 0 5 8
2: 0 10.4 8 472 205 2.93 5.25 17.98 0 3 4
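As a cross-check without data.table, the same per-group which.min() can be written in base R on mtcars. This is just a sketch for comparison: it should select the same two cars (mpg 10.4 for am = 0 and 15.0 for am = 1), although the rows come out in the order am = 0, am = 1 because split() orders the groups by factor level rather than by first appearance.

```r
# Split mtcars by transmission type (am) and keep the lowest-mpg row per group
cheapest <- lapply(split(mtcars, mtcars$am), function(d) d[which.min(d$mpg), ])

# Combine the one-row data frames into one
result <- do.call(rbind, cheapest)
result[, c("am", "mpg", "cyl", "disp", "hp")]
```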
A base R solution would be aggregate(Price ~ Key + Feature1, data, FUN = min)
Using base R aggregate:
> aggregate(Price~Key+Feature1, min, data=data)
Key Feature1 Price
1 AAA 1 100
2 BBB 1 100
3 AAA 2 110
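aggregate() only returns the grouping columns and the minimum price, so to get Feature2/Feature3 back as asked in the update, one option is to merge the minima back onto the original rows. A sketch with the sample table rebuilt from placeholder values; note that if several rows in a group share the minimum price, all of them survive the merge, unlike which.min(), which keeps only the first.

```r
# Sample data reconstructed from the question's table (placeholder feature values)
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

# Minimum Price per Key/Feature1 combination (columns: Key, Feature1, Price)
mins <- aggregate(Price ~ Key + Feature1, data = data, FUN = min)

# Inner join on all three columns keeps only rows whose Price is the group minimum,
# together with their remaining columns
result <- merge(data, mins, by = c("Key", "Feature1", "Price"))
result
```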
See this post for other alternatives.