
Limit column in data.frame over condition

I have a problem that I can't figure out how to solve. I am looking for a way to split each row into several rows so that no row exceeds a specific maximum, which can differ per type.

The real data is more complicated than the test data; there, the max column is imported and joined on using dplyr's left_join() function.

test <- data.frame(datetime = c("24/9/2020", "24/9/2020", "25/9/2020"),
                   type = c(1, 2, 3),
                   units = c(5, 8, 12),
                   max = c(6, 6, 4))

preferred <- data.frame(datetime = c("24/9/2020", "24/9/2020", "24/9/2020", "25/9/2020", "25/9/2020", "25/9/2020"),
                        type = c(1, 2, 2, 3, 3, 3),
                        units = c(5, 6, 2, 4, 4, 4),
                        max = c(6, 6, 6, 4, 4, 4))

I have tried different methods but could not solve it without using both while and for loops. I am sure specific functions could do this more quickly and easily, but I can't figure out which ones.

Is there a way to get from test to the preferred output?

> test
   datetime type units max
1 24/9/2020    1     5   6
2 24/9/2020    2     8   6
3 25/9/2020    3    12   4
> preferred
   datetime type units max
1 24/9/2020    1     5   6
2 24/9/2020    2     6   6
3 24/9/2020    2     2   6
4 25/9/2020    3     4   4
5 25/9/2020    3     4   4
6 25/9/2020    3     4   4

If you have any input, please let me know.

Thank you in advance!!! Much appreciated

You can use the old split-apply-bind approach, like this:

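# Assumes one row per type, so each group passed to the function is a single row
do.call(rbind, lapply(split(test, test$type), function(x) {
    if(x$units > x$max)
    {
      # Prepend one copy of the row for every full chunk of size max
      x <- rbind(x[rep(1, x$units %/% x$max), ], x)
      # All rows but the last hold a full chunk; the last holds the remainder
      x$units[-nrow(x)] <- x$max[1]
      x$units[nrow(x)]  <- x$units[nrow(x)] %% x$max[1]
    }
    # Drop the empty remainder row left when units is an exact multiple of max
    x[x$units != 0,]
  }))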
#>        datetime type units max
#> 1     24/9/2020    1     5   6
#> 2.2   24/9/2020    2     6   6
#> 2.21  24/9/2020    2     2   6
#> 3.3   25/9/2020    3     4   4
#> 3.3.1 25/9/2020    3     4   4
#> 3.3.2 25/9/2020    3     4   4
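
If the real data is large, a vectorised variant of the same idea may be worth trying. The following is only a sketch in base R, not part of the answer above: it works out how many pieces each row needs, repeats the rows that many times, and then fills in the piece sizes. The names n_pieces, idx, piece and result are made up for illustration, and it assumes max is always positive.

```r
test <- data.frame(datetime = c("24/9/2020", "24/9/2020", "25/9/2020"),
                   type = c(1, 2, 3),
                   units = c(5, 8, 12),
                   max = c(6, 6, 4))

# Number of output rows each input row needs (assumes max > 0)
n_pieces <- ceiling(test$units / test$max)

# Repeat each row index once per piece and pull out those rows
idx <- rep(seq_len(nrow(test)), times = n_pieces)
result <- test[idx, ]

# Position of each piece within its original row: 1, 2, ..., n
piece <- sequence(n_pieces)

# Each piece gets max units, except the last one, which gets what is left over
result$units <- pmin(result$max, result$units - (piece - 1) * result$max)

# Reset the row names to plain 1..n
rownames(result) <- NULL
result
```

Unlike the split-based version, this does not rely on each type appearing in exactly one row, because it works row by row rather than group by group.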

Here's an approach that builds a utility function for calculating a single result, then uses dplyr, purrr, and tidyr to apply it by group in your data frame:

additive_components = function(max, units) {
  result = rep(max, units %/% max)
  if(units %% max != 0) result = c(result, units %% max)
  return(result)
}

library(dplyr)
library(purrr)
library(tidyr)  # provides unnest()
test %>% group_by(type) %>%
  mutate(new_units = map2(max, units, additive_components)) %>%
  unnest(new_units)
# # A tibble: 6 x 5
# # Groups:   type [3]
#   datetime   type units   max new_units
#   <chr>     <dbl> <dbl> <dbl>     <dbl>
# 1 24/9/2020     1     5     6         5
# 2 24/9/2020     2     8     6         6
# 3 24/9/2020     2     8     6         2
# 4 25/9/2020     3    12     4         4
# 5 25/9/2020     3    12     4         4
# 6 25/9/2020     3    12     4         4
