I missing something in my understanding how group_by is working in tidyverse. Example will clarify:
I have created following function, it takes few arguments and calculates optimal weights inside tibble (probably not prettiest but seems to work):
library(lpSolveAPI)
library(tidyverse)
weights_fun <- function(data_tbl, objective, constraint){
cols <- c("objective", "constraint")
linear.dt <- data_tbl %>% select_(.dots = cols)
lp.mod <- make.lp(0, NROW(linear.dt))
set.objfn(lp.mod, linear.dt$amount)
lp.control(lp.mod,sense="max")
add.constraint(lp.mod, linear.dt$duration, "=", 6)
add.constraint(lp.mod, rep(1, nrow(linear.dt)), "=", 1)
set.bounds(lp.mod, upper = rep(0.4, nrow(linear.dt)))
set.bounds(lp.mod, lower = rep(0.10, nrow(linear.dt)))
solve(lp.mod)
weights <- round(get.variables(lp.mod), 4)
return(weights)
}
this functions works nicely when I have just one group in tibble. My way of creating functions is that try to get it work by testing on one and hopefully it will work when I slice data later.
weights_fun(one_group, "amount", "duration")
one_group$weights <- weights_fun(one_group, "amount", "duration")
# A tibble: 5 x 6
date country bucket amount duration weights
<date> <chr> <chr> <dbl> <dbl> <dbl>
1 2006-01-31 AT B1 4844.500 1.48475 0.1000
2 2006-01-31 AT B2 8601.000 3.67500 0.1911
3 2006-01-31 AT B3 8518.400 5.39900 0.4000
4 2006-01-31 AT B4 6469.550 6.99950 0.1000
5 2006-01-31 AT B5 7804.533 10.96133 0.2089
Then I hoped that I could use mutate to create new column of weights to my multiple groups as following, but I get error:
three_groups %>%
group_by(date, country) %>%
mutate(weights = weights_fun(., "amount", "duration"))
Adding missing grouping variables: `date`, `country`
Error in mutate_impl(.data, dots) :
Column `weights` must be length 5 (the group size) or one, not 15
So what am I missing? Why is my function returning 15 and not 5 for each group?
DATA:
one_group <- structure(list(date = structure(c(13179, 13179, 13179, 13179,
13179), class = "Date"), country = c("AT", "AT", "AT", "AT",
"AT"), bucket = c("B1", "B2", "B3", "B4", "B5"), amount = c(4844.5,
8601, 8518.4, 6469.55, 7804.53333333333), duration = c(1.48475,
3.675, 5.399, 6.9995, 10.9613333333333)), .Names = c("date",
"country", "bucket", "amount", "duration"), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
three_groups <- structure(list(date = structure(c(13179, 13179, 13179, 13179,
13179, 13179, 13179, 13179, 13179, 13179, 13179, 13179, 13179,
13179, 13179), class = "Date"), country = c("AT", "AT", "AT",
"AT", "AT", "AU", "AU", "AU", "AU", "AU", "BE", "BE", "BE", "BE",
"BE"), bucket = c("B1", "B2", "B3", "B4", "B5", "B1", "B2", "B3",
"B4", "B5", "B1", "B2", "B3", "B4", "B5"), amount = c(4844.5,
8601, 8518.4, 6469.55, 7804.53333333333, 4650.4, 5355.25, 5796.7,
4899.25, 4995, 10151.38, 14484.8666666667, 9910.06666666667,
10507.35, 9644.2), duration = c(1.48475, 3.675, 5.399, 6.9995,
10.9613333333333, 1.8655, 3.493, 4.552, 6.3235, 7.884, 1.8558,
3.55, 5.32466666666667, 7.01975, 12.6736666666667)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -15L), .Names = c("date",
"country", "bucket", "amount", "duration"))
EXTRA: as Jimbou showed, grouping is working but my function is somehow broken. Hard coding some variables will get this working, so I just need to figure out correct variable reference on those?
weights_fun1 <- function(objective, constraint){
lp.mod <- make.lp(0, 5)
set.objfn(lp.mod, objective)
lp.control(lp.mod,sense="max")
add.constraint(lp.mod, constraint, "=", 6)
add.constraint(lp.mod, rep(1, 5), "=", 1)
set.bounds(lp.mod, upper = rep(0.4, 5))
set.bounds(lp.mod, lower = rep(0.10, 5))
solve(lp.mod)
weights <- round(get.variables(lp.mod), 4)
return(weights)
}
three_groups %>%
group_by(date, country) %>%
mutate(weights = weights_fun1(amount, duration))
# A tibble: 15 x 6
# Groups: date, country [3]
date country bucket amount duration weights
<date> <chr> <chr> <dbl> <dbl> <dbl>
1 2006-01-31 AT B1 4844.500 1.484750 0.1000
2 2006-01-31 AT B2 8601.000 3.675000 0.1911
3 2006-01-31 AT B3 8518.400 5.399000 0.4000
4 2006-01-31 AT B4 6469.550 6.999500 0.1000
5 2006-01-31 AT B5 7804.533 10.961333 0.2089
6 2006-01-31 AU B1 4650.400 1.865500 0.1000
7 2006-01-31 AU B2 5355.250 3.493000 0.1000
8 2006-01-31 AU B3 5796.700 4.552000 0.1235
9 2006-01-31 AU B4 4899.250 6.323500 0.2765
10 2006-01-31 AU B5 4995.000 7.884000 0.4000
11 2006-01-31 BE B1 10151.380 1.855800 0.1000
12 2006-01-31 BE B2 14484.867 3.550000 0.4000
13 2006-01-31 BE B3 9910.067 5.324667 0.1000
14 2006-01-31 BE B4 10507.350 7.019750 0.2136
15 2006-01-31 BE B5 9644.200 12.673667 0.1864
Will answer to my own question, but this seems to be work around and just poor tidyverse knowledge from my part. Thanks to Jimbou. Better answers are welcome.
modified function:
weights_fun1 <- function(objective, constraint, rows){
lp.mod <- make.lp(0, rows[1])
set.objfn(lp.mod, objective)
lp.control(lp.mod,sense="max")
add.constraint(lp.mod, constraint, "=", 6)
add.constraint(lp.mod, rep(1, rows[1]), "=", 1)
set.bounds(lp.mod, upper = rep(0.4, rows[1]))
set.bounds(lp.mod, lower = rep(0.10, rows[1]))
solve(lp.mod)
weights <- round(get.variables(lp.mod), 4)
return(weights)
}
three_groups %>%
group_by(date, country) %>%
mutate(rows = n()) %>% #create helper column, as couldn't figure out other way now
mutate(weights = weights_fun1(amount, duration, rows))
# A tibble: 15 x 7
# Groups: date, country [3]
date country bucket amount duration rows weights
<date> <chr> <chr> <dbl> <dbl> <int> <dbl>
1 2006-01-31 AT B1 4844.500 1.484750 5 0.1000
2 2006-01-31 AT B2 8601.000 3.675000 5 0.1911
3 2006-01-31 AT B3 8518.400 5.399000 5 0.4000
4 2006-01-31 AT B4 6469.550 6.999500 5 0.1000
5 2006-01-31 AT B5 7804.533 10.961333 5 0.2089
6 2006-01-31 AU B1 4650.400 1.865500 5 0.1000
7 2006-01-31 AU B2 5355.250 3.493000 5 0.1000
8 2006-01-31 AU B3 5796.700 4.552000 5 0.1235
9 2006-01-31 AU B4 4899.250 6.323500 5 0.2765
10 2006-01-31 AU B5 4995.000 7.884000 5 0.4000
11 2006-01-31 BE B1 10151.380 1.855800 5 0.1000
12 2006-01-31 BE B2 14484.867 3.550000 5 0.4000
13 2006-01-31 BE B3 9910.067 5.324667 5 0.1000
14 2006-01-31 BE B4 10507.350 7.019750 5 0.2136
15 2006-01-31 BE B5 9644.200 12.673667 5 0.1864
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.