Start from a specific column and traverse up to a specific number of columns based on conditions in R

Question

I have the data below in wide format, where each row represents a showroom. Quarter is the quarter in which the showroom started selling, and StartingYear is the financial year in which it started.

Code  Quarter StartingYear Quarter1_Num.FY16-17 Quarter2_Num.FY16-17 Quarter3_Num.FY16-17 Quarter4_Num.FY16-17 Quarter1_Num.FY17-18 Quarter2_Num.FY17-18 Quarter3_Num.FY17-18 Quarter4_Num.FY17-18
S2249       2      FY16-17                    0                   23                    0                    0                    2                    0                    6                    0
S463        3      FY17-18                    0                    0                    4                    0                    0                    4                   90                    8

For each showroom, I have to start from the column given by Quarter and StartingYear (Quarter2_Num.FY16-17 for row 1) and cover a period of one year, which in this case means going up to Quarter2_Num.FY17-18. As can be seen, the column names are built from the quarter and the financial year.

Output I am trying to get:

Code  Quarter1_Starting_Num Quarter2_Starting_Num Quarter3_Starting_Num Quarter4_Starting_Num Quarter5_Starting_Num
S2249                    23                     0                     0                     2                     0
S463                      4                     0                     0                     4                    90

The columns capture data for a year across the quarters after the showroom started.

I know that using gsub I can get the columns containing FY16-17 or FY17-18. But I am not sure how to specify the starting column for each row and then traverse the next N columns.

Can anyone please help me with this?

Answer 1

First, we transform the data set from wide to long format, then do our calculations and filtering, and finally transform it back to wide format.

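library(dplyr)
library(tidyr)
df %>%
  # wide -> long: one row per showroom and quarter column
  gather(k, val, -c(Code, Quarter, StartingYear)) %>%
  # pull the quarter number and the financial year out of each column name
  mutate(Quar = gsub('Quarter(\\d)_.*', '\\1', k),
         year = gsub('Quarter\\d_Num\\.(.*)\\.(.*)', '\\1-\\2', k)) %>%
  arrange(Code) %>% group_by(Code) %>%
  # flag is 0 before the starting column, then 1, 2, 3, ... from it onwards
  mutate(flag = cumsum(cumsum(Quarter == Quar & StartingYear == year)),
         Quarter1 = paste0('Quarter', flag, '_Starting_Num')) %>%
  # keep the five quarters from the start, then long -> wide
  filter(between(flag, 1, 5)) %>% select(Code, Quarter1, val) %>% spread(Quarter1, val)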

# A tibble: 2 x 6
# Groups:   Code [2]
   Code  Quarter1_Starting_Num Quarter2_Starting_Num Quarter3_Starting_Num Quarter4_Starting_Num Quarter5_Starting_Num
  <fct>                 <int>                 <int>                 <int>                 <int>                 <int>
1 S2249                    23                     0                     0                     2                     0
2 S463                      4                     0                     0                     4                    90
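
The flag column in the pipeline depends on applying cumsum twice. A small base R sketch of that step on its own, for a single showroom whose starting column is the second of eight (the vector is_start is made up for illustration):

```r
# TRUE only at the starting column of one showroom (here the 2nd of 8 columns)
is_start <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

once  <- cumsum(is_start)  # 0 before the start, 1 from the start onwards
twice <- cumsum(once)      # 0 before the start, then 1, 2, 3, ...

print(once)
print(twice)

# positions that filter(between(flag, 1, 5)) would keep
kept <- which(twice >= 1 & twice <= 5)
print(kept)
```

The first cumsum marks everything from the starting column onwards, and the second turns that marker into a running count, which is what lets the answer name the columns Quarter1 to Quarter5.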

Data

df <- structure(list(Code = structure(1:2, .Label = c("S2249", "S463"
), class = "factor"), Quarter = 2:3, StartingYear = structure(c(1L, 
1L), .Label = "FY16-17", class = "factor"), Quarter1_Num.FY16.17 = c(0L, 
0L), Quarter2_Num.FY16.17 = c(23L, 0L), Quarter3_Num.FY16.17 = c(0L, 
4L), Quarter4_Num.FY16.17 = c(0L, 0L), Quarter1_Num.FY17.18 = c(2L, 
0L), Quarter2_Num.FY17.18 = c(0L, 4L), Quarter3_Num.FY17.18 = c(6L, 
90L), Quarter4_Num.FY17.18 = c(0L, 8L)), class = "data.frame", row.names = c(NA, 
-2L))

PS: I changed S463 3 FY17-18 to S463 3 FY16-17 to match the expected output. You can keep S463 3 FY17-18, but then you will get NAs for Q3 to Q5, because only two quarter columns remain from that starting point.
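
The question also asks how to pick a starting column per row and then walk a fixed number of columns. A minimal base R sketch of that idea without reshaping is given here. It assumes the quarter columns are already in chronological order and uses the same modified data as above. The names n_quarters, value_cols, start_name and start_pos are introduced for illustration only.

```r
# Same values as the Data section (S463 set to FY16-17)
df <- data.frame(
  Code = c("S2249", "S463"),
  Quarter = c(2L, 3L),
  StartingYear = c("FY16-17", "FY16-17"),
  Quarter1_Num.FY16.17 = c(0L, 0L),
  Quarter2_Num.FY16.17 = c(23L, 0L),
  Quarter3_Num.FY16.17 = c(0L, 4L),
  Quarter4_Num.FY16.17 = c(0L, 0L),
  Quarter1_Num.FY17.18 = c(2L, 0L),
  Quarter2_Num.FY17.18 = c(0L, 4L),
  Quarter3_Num.FY17.18 = c(6L, 90L),
  Quarter4_Num.FY17.18 = c(0L, 8L),
  stringsAsFactors = FALSE
)

n_quarters <- 5  # how many quarters to keep, counting the starting one

# Quarter columns, assumed to be in chronological order
value_cols <- grep("^Quarter\\d_Num\\.", names(df), value = TRUE)

# Name of each row's starting column, e.g. "Quarter2_Num.FY16.17"
start_name <- paste0("Quarter", df$Quarter, "_Num.",
                     gsub("-", ".", df$StartingYear, fixed = TRUE))
start_pos <- match(start_name, value_cols)

vals <- as.matrix(df[value_cols])

# For each row, take n_quarters consecutive columns from start_pos;
# indexing past the last column yields NA
out <- t(sapply(seq_len(nrow(df)), function(i) {
  idx <- start_pos[i] + seq_len(n_quarters) - 1
  unname(vals[i, ][idx])
}))
colnames(out) <- paste0("Quarter", seq_len(n_quarters), "_Starting_Num")

result <- data.frame(Code = df$Code, out)
print(result)
```

With StartingYear left at FY17-18 for S463, the same code should pad the missing quarters with NA rather than fail, which mirrors the behaviour described in the PS.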

gsub('Quarter(\\d)_.*','\\1',c('Quarter1_Num.FY16.17','Quarter4_Num.FY17.18'))
[1] "1" "4"

'Quarter(\\d)_.*' captures the single digit (\\d matches 0-9) after Quarter and before _, and \\1 returns that group.

gsub('Quarter\\d_Num\\.(.*)\\.(.*)','\\1-\\2',c('Quarter1_Num.FY16.17','Quarter4_Num.FY17.18'))
[1] "FY16-17" "FY17-18"

\\. matches the literal dot after Quarter, a digit and _Num. In a regular expression, special characters such as . are escaped with a backslash, which is written \\ inside an R string.

(.*) captures everything between that dot and the next dot, i.e. FY16 and FY17. gsub treats this as group 1.

\\. matches the next literal dot.

(.*) captures everything after that dot, i.e. 17 and 18. gsub treats this as group 2.

\\1-\\2 returns group 1 and group 2 separated by -, i.e. FY16-17.
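
As an alternative to two separate gsub calls, the same capture groups can be read in one pass with base R's regexec and regmatches. This is only a sketch of that variation, not part of the original answer. The names cols, pattern and parts are illustrative.

```r
cols <- c("Quarter1_Num.FY16.17", "Quarter4_Num.FY17.18")

# One pattern with three capture groups: quarter digit, "FY16", "17"
pattern <- "^Quarter(\\d)_Num\\.(.*)\\.(.*)$"

# regexec() finds the match positions and regmatches() extracts the strings:
# element 1 is the full match, elements 2 to 4 are the capture groups
parts <- regmatches(cols, regexec(pattern, cols))

quarter <- sapply(parts, `[`, 2)
year    <- sapply(parts, function(p) paste(p[3], p[4], sep = "-"))

print(quarter)
print(year)
```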


1 answer

solution1
1 ACCEPTED 2019-03-12 09:49:11
