
Convert monthly pay data to weekly using complete and fill in dplyr

I have data on worker pay; some workers are paid monthly and others weekly. I would like to combine the data into a panel by worker and week (of year). To do that, I need to expand each monthly row into one row per week it covers.

The data look like:

library(tidyverse)
library(lubridate)  # ymd() and week()

pay_data <- tibble(worker="Jim", start=ymd("2020-1-3"), end=ymd("2020-2-2"), rate=10, hours=50, wages=rate*hours) %>% 
  mutate(f_week=week(start), l_week=week(end))

# A tibble: 1 x 8
  worker start      end         rate hours wages f_week l_week
  <chr>  <date>     <date>     <dbl> <dbl> <dbl>  <dbl>  <dbl>
1 Jim    2020-01-03 2020-02-02    10    50   500      1      5

Is there a way to use complete, fill, or any other tidyverse function to get the data to look like the below?

# A tibble: 5 x 5
  worker  week  rate hours wages
  <chr>  <int> <dbl> <dbl> <dbl>
1 Jim        1    10    50   500
2 Jim        2    10    50   500
3 Jim        3    10    50   500
4 Jim        4    10    50   500
5 Jim        5    10    50   500

(I would then of course divide the amounts to put them all in common units).

Thanks!

One tidyverse approach uses tidyr::separate_rows: build a comma-separated string of week numbers for each row, then split that string into one row per week. To make the example more interesting, I added a second worker (see DATA below).

library(tidyverse)

tbl %>% 
  rowwise() %>% 
  mutate(weeks = paste(seq(f_week, l_week, by = 1), collapse = ", ")) %>% 
  ungroup() %>% 
  separate_rows(weeks) %>% 
  select(-ends_with("_week"), -start, -end)
#> # A tibble: 13 x 5
#>    worker  rate hours wages weeks
#>    <chr>  <int> <int> <int> <chr>
#>  1 Jim       10    50   500 1    
#>  2 Jim       10    50   500 2    
#>  3 Jim       10    50   500 3    
#>  4 Jim       10    50   500 4    
#>  5 Jim       10    50   500 5    
#>  6 John      20   100  1000 1    
#>  7 John      20   100  1000 2    
#>  8 John      20   100  1000 3    
#>  9 John      20   100  1000 4    
#> 10 John      20   100  1000 5    
#> 11 John      20   100  1000 6    
#> 12 John      20   100  1000 7    
#> 13 John      20   100  1000 8

DATA

tbl <- read.table(text="worker start      end         rate hours wages f_week l_week
1 Jim    2020-01-03 2020-02-02    10    50   500      1      5
2 John   2020-01-03 2020-02-02    20   100  1000      1      8", header = TRUE)
tbl
#>   worker      start        end rate hours wages f_week l_week
#> 1    Jim 2020-01-03 2020-02-02   10    50   500      1      5
#> 2   John 2020-01-03 2020-02-02   20   100  1000      1      8
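
In the output above, weeks is a character column, while the question asks for an integer week. A minimal sketch of one way to get integers, assuming the same two-worker data (typed in directly here; weekly_int is a name introduced for illustration), is to pass convert = TRUE to separate_rows:

```r
library(dplyr)
library(tidyr)

# Same two workers as in the DATA block, typed in directly
tbl <- tibble(
  worker = c("Jim", "John"),
  rate   = c(10, 20),
  hours  = c(50, 100),
  wages  = c(500, 1000),
  f_week = c(1, 1),
  l_week = c(5, 8)
)

weekly_int <- tbl %>%
  rowwise() %>%
  mutate(weeks = paste(seq(f_week, l_week, by = 1), collapse = ", ")) %>%
  ungroup() %>%
  separate_rows(weeks, convert = TRUE) %>%  # convert = TRUE re-types "1", "2", ... as integer
  select(-ends_with("_week"))
```

The rest of the pipeline is unchanged; only the type of the weeks column differs.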

Another tidyverse way is to build a list-column of week numbers with purrr::map2 and expand it with tidyr::unnest:

library(tidyverse)

pay_data %>%
  mutate(week = map2(f_week, l_week, seq)) %>%
  unnest(week) %>%
  select(worker, rate:wages, week)

#  worker  rate hours wages  week
#  <chr>  <dbl> <dbl> <dbl> <int>
#1 Jim       10    50   500     1
#2 Jim       10    50   500     2
#3 Jim       10    50   500     3
#4 Jim       10    50   500     4
#5 Jim       10    50   500     5
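
The question mentions dividing the amounts afterwards to put everything in weekly units. A minimal sketch of that step, building on the map2/unnest approach, might look like the following. It assumes hours and wages are spread evenly over the weeks a pay period touches (the question does not say how partial weeks should be treated), and n_weeks and weekly are names introduced here for illustration:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical input mirroring the question's pay_data (week numbers typed in by hand)
pay_data <- tibble(
  worker = "Jim", rate = 10, hours = 50, wages = 500,
  f_week = 1, l_week = 5
)

weekly <- pay_data %>%
  mutate(
    n_weeks = l_week - f_week + 1,       # number of weeks the pay period touches
    week    = map2(f_week, l_week, seq)  # list-column: one week number per covered week
  ) %>%
  unnest(week) %>%
  mutate(hours = hours / n_weeks,        # assumption: even split across weeks
         wages = wages / n_weeks) %>%
  select(worker, week, rate, hours, wages)
```

Computing n_weeks before unnest keeps the divisor tied to the original monthly row, so the weekly amounts still sum to the monthly totals.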

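A base R alternative repeats each row once per week it covers and then numbers the weeks:

#Code
# Number of weeks covered by each row
n_weeks <- pay_data$l_week - pay_data$f_week + 1
weekly <- pay_data[rep(seq_len(nrow(pay_data)), n_weeks),
         c('worker','rate','hours','wages')]
# Week numbers from f_week to l_week for each original row
weekly$week <- unlist(Map(seq, pay_data$f_week, pay_data$l_week))
weekly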

Output:

# A tibble: 5 x 5
  worker  rate hours wages  week
  <chr>  <dbl> <dbl> <dbl> <int>
1 Jim       10    50   500     1
2 Jim       10    50   500     2
3 Jim       10    50   500     3
4 Jim       10    50   500     4
5 Jim       10    50   500     5
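
All of the approaches above count from f_week up to l_week, so they rely on a pay period staying within one calendar year. If a period runs from late December into January, l_week is smaller than f_week and seq counts downwards. A hedged sketch of one way around this, assuming lubridate's week() numbering as in the question, is to expand by calendar day and then collapse to distinct year-week pairs. The worker "Ann", her dates, and the names pay_xyear and weekly_xyear are made up for illustration:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical pay period that crosses a year boundary (not from the question)
pay_xyear <- tibble(
  worker = "Ann",
  start  = ymd("2020-12-20"),
  end    = ymd("2021-01-16"),
  wages  = 400
)

weekly_xyear <- pay_xyear %>%
  rowwise() %>%
  mutate(day = list(seq(start, end, by = "day"))) %>%  # every calendar day in the period
  ungroup() %>%
  unnest(day) %>%
  mutate(year = year(day), week = week(day)) %>%       # label each day with its year and week
  distinct(worker, year, week, wages)                  # collapse to one row per worker-week
```

Keeping year alongside week also stops week 1 of one year from being merged with week 1 of another when the panel spans several years.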
