
Calculating distance to minimum of similar cases (observations) in R

I have a dataset that describes the results of applying 3 algorithms to a number of cases. For each combination of algorithm and case, there is a result.

df <- data.frame(
  case = c("case1", "case1", "case1", "case2", "case2", "case2"),
  algorithm = c("algo1", "algo2", "algo3", "algo1", "algo2", "algo3"),
  result = c(10, 11, 12, 22, 23, 20)
)
df

These algorithms aim to minimize the result value. So for each algorithm and case, I want to calculate the gap to the lowest result achieved by any algorithm for that same case.

gap <- function(caseId, result) {
  filtered = subset(df, case==caseId)
  return (result - min(filtered[,'result']));
}

When I apply that function manually, I get the expected results.

gap("case1", 10)  # prints 0, since 10 is the best value for case1
gap("case1", 11)  # prints 1, since 11-10=1
gap("case1", 12)  # prints 2, since 12-10=2

gap("case2", 22)  # prints 2, since 22-20=2
gap("case2", 23)  # prints 3, since 23-20=3
gap("case2", 20)  # prints 0, since 20 is the best value for case2

However, when I want to calculate a new column across the whole dataset, I get bogus results for case2.

df$gap <- gap(df$case, df$result)
df

This produces

   case algorithm result gap
1 case1     algo1     10   0
2 case1     algo2     11   1
3 case1     algo3     12   2
4 case2     algo1     22  12
5 case2     algo2     23  13
6 case2     algo3     20  10

It seems the gap function is now working against the minimum of the whole data frame, whereas it should only consider rows with the same case. Maybe the subset filtering in the gap function is not working properly?

Use ave() to obtain the minimum result within each case, then subtract it from result:

df$result - ave(df$result, df$case, FUN = min)
#[1] 0 1 2 2 3 0
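
As a sketch of why the original call misbehaves: when gap() receives the whole case column, the comparison case == caseId inside subset() is evaluated elementwise against the column itself, so every row matches and min() is taken over all results. The snippet below illustrates that and stores the ave() result as the new column. The name case_min is only illustrative and does not appear in the question.

```r
df <- data.frame(
  case = c("case1", "case1", "case1", "case2", "case2", "case2"),
  algorithm = c("algo1", "algo2", "algo3", "algo1", "algo2", "algo3"),
  result = c(10, 11, 12, 22, 23, 20)
)

# Passing the full column means each row is compared with itself,
# so the filter keeps every row and the minimum is the global one.
all(df$case == df$case)

# Per-case minimum, repeated once for each row of that case
# (case_min is an illustrative name, not from the question).
case_min <- ave(df$result, df$case, FUN = min)

# Gap of each result to the best result for the same case.
df$gap <- df$result - case_min
df
```

This keeps the original result column intact and adds gap alongside it.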

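Alternatively, with dplyr, group by case and subtract the group minimum. Note that this overwrites result; use mutate(gap = result - min(result)) instead to keep the original column.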

library(dplyr)
df %>%
  group_by(case) %>% 
  mutate(result = result - min(result))
# A tibble: 6 x 3
# Groups:   case [2]
#    case algorithm result
#   <fctr>    <fctr>  <dbl>
#1  case1     algo1      0
#2  case1     algo2      1
#3  case1     algo3      2
#4  case2     algo1      2
#5  case2     algo2      3
#6  case2     algo3      0
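
If a scalar helper like the one in the question is preferred, one option is to apply it row by row with mapply(), so that each call sees a single case and a single result. This is only a sketch: gap_one and its data argument are hypothetical names introduced here, and the grouped approaches above are likely faster on large data.

```r
df <- data.frame(
  case = c("case1", "case1", "case1", "case2", "case2", "case2"),
  algorithm = c("algo1", "algo2", "algo3", "algo1", "algo2", "algo3"),
  result = c(10, 11, 12, 22, 23, 20)
)

# Hypothetical scalar helper: gap of one result to the best result
# recorded for the given case in `data`.
gap_one <- function(case_id, result, data) {
  case_results <- data$result[data$case == case_id]
  result - min(case_results)
}

# Call the helper once per row; `data` is passed unchanged to every call.
df$gap <- mapply(
  gap_one,
  as.character(df$case),
  df$result,
  MoreArgs = list(data = df),
  USE.NAMES = FALSE
)
df
```

Passing the data frame as an argument, rather than relying on a global df, makes the helper usable on other data frames with the same columns.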
