简体   繁体   中英

custom functions with group_by tidyverse

I missing something in my understanding how group_by is working in tidyverse. Example will clarify:

I have created following function, it takes few arguments and calculates optimal weights inside tibble (probably not prettiest but seems to work):

library(lpSolveAPI)
library(tidyverse)

weights_fun <- function(data_tbl, objective, constraint){
  cols <- c("objective", "constraint")
  linear.dt <- data_tbl %>% select_(.dots = cols)
  lp.mod <- make.lp(0, NROW(linear.dt))
  set.objfn(lp.mod, linear.dt$amount)
  lp.control(lp.mod,sense="max")
  add.constraint(lp.mod, linear.dt$duration, "=", 6)
  add.constraint(lp.mod, rep(1, nrow(linear.dt)), "=", 1)
  set.bounds(lp.mod, upper = rep(0.4, nrow(linear.dt)))
  set.bounds(lp.mod, lower = rep(0.10, nrow(linear.dt)))
  solve(lp.mod)
  weights <- round(get.variables(lp.mod), 4)
  return(weights)
}

this functions works nicely when I have just one group in tibble. My way of creating functions is that try to get it work by testing on one and hopefully it will work when I slice data later.

weights_fun(one_group, "amount", "duration")
one_group$weights <- weights_fun(one_group, "amount", "duration")


  # A tibble: 5 x 6
        date country bucket   amount duration weights
      <date>   <chr>  <chr>    <dbl>    <dbl>   <dbl>
1 2006-01-31      AT     B1 4844.500  1.48475  0.1000
2 2006-01-31      AT     B2 8601.000  3.67500  0.1911
3 2006-01-31      AT     B3 8518.400  5.39900  0.4000
4 2006-01-31      AT     B4 6469.550  6.99950  0.1000
5 2006-01-31      AT     B5 7804.533 10.96133  0.2089

Then I hoped that I could use mutate to create new column of weights to my multiple groups as following, but I get error:

three_groups %>% 
  group_by(date, country) %>% 
  mutate(weights = weights_fun(., "amount", "duration"))

Adding missing grouping variables: `date`, `country`
Error in mutate_impl(.data, dots) : 
  Column `weights` must be length 5 (the group size) or one, not 15

So what am I missing? Why is my function returning 15 and not 5 for each group?

DATA:

one_group <- structure(list(date = structure(c(13179, 13179, 13179, 13179, 
13179), class = "Date"), country = c("AT", "AT", "AT", "AT", 
"AT"), bucket = c("B1", "B2", "B3", "B4", "B5"), amount = c(4844.5, 
8601, 8518.4, 6469.55, 7804.53333333333), duration = c(1.48475, 
3.675, 5.399, 6.9995, 10.9613333333333)), .Names = c("date", 
"country", "bucket", "amount", "duration"), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

three_groups <- structure(list(date = structure(c(13179, 13179, 13179, 13179, 
13179, 13179, 13179, 13179, 13179, 13179, 13179, 13179, 13179, 
13179, 13179), class = "Date"), country = c("AT", "AT", "AT", 
"AT", "AT", "AU", "AU", "AU", "AU", "AU", "BE", "BE", "BE", "BE", 
"BE"), bucket = c("B1", "B2", "B3", "B4", "B5", "B1", "B2", "B3", 
"B4", "B5", "B1", "B2", "B3", "B4", "B5"), amount = c(4844.5, 
8601, 8518.4, 6469.55, 7804.53333333333, 4650.4, 5355.25, 5796.7, 
4899.25, 4995, 10151.38, 14484.8666666667, 9910.06666666667, 
10507.35, 9644.2), duration = c(1.48475, 3.675, 5.399, 6.9995, 
10.9613333333333, 1.8655, 3.493, 4.552, 6.3235, 7.884, 1.8558, 
3.55, 5.32466666666667, 7.01975, 12.6736666666667)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -15L), .Names = c("date", 
"country", "bucket", "amount", "duration"))

EXTRA: as Jimbou showed, grouping is working but my function is somehow broken. Hard coding some variables will get this working, so I just need to figure out correct variable reference on those?

weights_fun1 <- function(objective, constraint){
  lp.mod <- make.lp(0, 5)
  set.objfn(lp.mod, objective)
  lp.control(lp.mod,sense="max")
  add.constraint(lp.mod, constraint, "=", 6)
  add.constraint(lp.mod, rep(1, 5), "=", 1)
  set.bounds(lp.mod, upper = rep(0.4, 5))
  set.bounds(lp.mod, lower = rep(0.10, 5))
  solve(lp.mod)
  weights <- round(get.variables(lp.mod), 4)
  return(weights)
}

three_groups %>% 
  group_by(date, country) %>% 
  mutate(weights = weights_fun1(amount, duration))

# A tibble: 15 x 6
# Groups:   date, country [3]
         date country bucket    amount  duration weights
       <date>   <chr>  <chr>     <dbl>     <dbl>   <dbl>
 1 2006-01-31      AT     B1  4844.500  1.484750  0.1000
 2 2006-01-31      AT     B2  8601.000  3.675000  0.1911
 3 2006-01-31      AT     B3  8518.400  5.399000  0.4000
 4 2006-01-31      AT     B4  6469.550  6.999500  0.1000
 5 2006-01-31      AT     B5  7804.533 10.961333  0.2089
 6 2006-01-31      AU     B1  4650.400  1.865500  0.1000
 7 2006-01-31      AU     B2  5355.250  3.493000  0.1000
 8 2006-01-31      AU     B3  5796.700  4.552000  0.1235
 9 2006-01-31      AU     B4  4899.250  6.323500  0.2765
10 2006-01-31      AU     B5  4995.000  7.884000  0.4000
11 2006-01-31      BE     B1 10151.380  1.855800  0.1000
12 2006-01-31      BE     B2 14484.867  3.550000  0.4000
13 2006-01-31      BE     B3  9910.067  5.324667  0.1000
14 2006-01-31      BE     B4 10507.350  7.019750  0.2136
15 2006-01-31      BE     B5  9644.200 12.673667  0.1864

Will answer to my own question, but this seems to be work around and just poor tidyverse knowledge from my part. Thanks to Jimbou. Better answers are welcome.

modified function:

weights_fun1 <- function(objective, constraint, rows){
  lp.mod <- make.lp(0, rows[1])
  set.objfn(lp.mod, objective)
  lp.control(lp.mod,sense="max")
  add.constraint(lp.mod, constraint, "=", 6)
  add.constraint(lp.mod, rep(1, rows[1]), "=", 1)
  set.bounds(lp.mod, upper = rep(0.4, rows[1]))
  set.bounds(lp.mod, lower = rep(0.10, rows[1]))
  solve(lp.mod)
  weights <- round(get.variables(lp.mod), 4)
  return(weights)
}

three_groups %>% 
  group_by(date, country) %>% 
  mutate(rows = n()) %>% #create helper column, as couldn't figure out other way now
  mutate(weights = weights_fun1(amount, duration, rows))


# A tibble: 15 x 7
# Groups:   date, country [3]
         date country bucket    amount  duration  rows weights
       <date>   <chr>  <chr>     <dbl>     <dbl> <int>   <dbl>
 1 2006-01-31      AT     B1  4844.500  1.484750     5  0.1000
 2 2006-01-31      AT     B2  8601.000  3.675000     5  0.1911
 3 2006-01-31      AT     B3  8518.400  5.399000     5  0.4000
 4 2006-01-31      AT     B4  6469.550  6.999500     5  0.1000
 5 2006-01-31      AT     B5  7804.533 10.961333     5  0.2089
 6 2006-01-31      AU     B1  4650.400  1.865500     5  0.1000
 7 2006-01-31      AU     B2  5355.250  3.493000     5  0.1000
 8 2006-01-31      AU     B3  5796.700  4.552000     5  0.1235
 9 2006-01-31      AU     B4  4899.250  6.323500     5  0.2765
10 2006-01-31      AU     B5  4995.000  7.884000     5  0.4000
11 2006-01-31      BE     B1 10151.380  1.855800     5  0.1000
12 2006-01-31      BE     B2 14484.867  3.550000     5  0.4000
13 2006-01-31      BE     B3  9910.067  5.324667     5  0.1000
14 2006-01-31      BE     B4 10507.350  7.019750     5  0.2136
15 2006-01-31      BE     B5  9644.200 12.673667     5  0.1864

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM