
Start from a specific column and traverse up to a specific number of columns based on conditions in R

I have the data below in wide format. Each row represents a showroom, Quarter is the quarter in which the showroom started selling, and StartingYear is the financial year in which it started.

Code    Quarter StartingYear Quarter1_Num.FY16-17 Quarter2_Num.FY16-17 Quarter3_Num.FY16-17 Quarter4_Num.FY16-17 Quarter1_Num.FY17-18 Quarter2_Num.FY17-18 Quarter3_Num.FY17-18 Quarter4_Num.FY17-18 
S2249       2   FY16-17         0                       23                  0                   0                   2                       0                   6                   0
S463        3   FY17-18         0                       0                   4                   0                   0                       4                   90                  8                                                                               

For each showroom, I have to start from the column determined by Quarter and StartingYear (Quarter2_Num.FY16-17 for row 1) and cover a period of one year, which for row 1 means ending at Quarter2_Num.FY17-18. As can be seen, the column names are built from the quarter and the financial year.

Output I am trying to get:

Code    Quarter1_Starting_Num Quarter2_Starting_Num Quarter3_Starting_Num Quarter4_Starting_Num Quarter5_Starting_Num
S2249       23                  0                       0                   2                       0
S463        4                   0                       0                   4                       90  

The columns capture data for a year across the quarters after the showroom started.

I know that I can use gsub to get the columns containing FY16-17 or FY17-18, but I am not sure how to specify the starting column for each row and then traverse the next N columns.

Can anyone please help me with this?

First, we reshape the data set from wide to long, then do the calculations and filtering, and finally transform it back to wide format.

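library(dplyr)
library(tidyr)
# wide to long, then pull the quarter digit and the financial year out of each column name
gather(df, k, val, -c(Code, Quarter, StartingYear)) %>%
  mutate(Quar = gsub('Quarter(\\d)_.*', '\\1', k),
         year = gsub('Quarter\\d_Num\\.(.*)\\.(.*)', '\\1-\\2', k)) %>%
  arrange(Code) %>% group_by(Code) %>%
  # flag is 0 before the starting column, then 1, 2, 3, ... from it onwards
  mutate(flag = cumsum(cumsum(Quarter == Quar & StartingYear == year)),
         Quarter1 = paste0('Quarter', flag, '_Starting_Num')) %>%
  # keep five quarters from the start, then reshape back to wide
  filter(between(flag, 1, 5)) %>% select(Code, Quarter1, val) %>% spread(Quarter1, val)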

# A tibble: 2 x 6
# Groups:   Code [2]
   Code  Quarter1_Starting_Num Quarter2_Starting_Num Quarter3_Starting_Num Quarter4_Starting_Num Quarter5_Starting_Num
  <fct>                 <int>                 <int>                 <int>                 <int>                 <int>
1 S2249                    23                     0                     0                     2                     0
2 S463                      4                     0                     0                     4                    90
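For comparison, the same "start at a per-row column, then take the next N columns" idea can be sketched in base R without reshaping. This is only an illustration under the assumptions of the sample data: `window_from_start` is a hypothetical helper name, the value columns are assumed to be in chronological order, and the data frame is rebuilt inline (with the dotted column names R produces) so the block runs on its own.

```r
# Sample data, with S463 starting in FY16-17 as in the Data section
df <- data.frame(
  Code = c("S2249", "S463"),
  Quarter = c(2L, 3L),
  StartingYear = c("FY16-17", "FY16-17"),
  Quarter1_Num.FY16.17 = c(0L, 0L),
  Quarter2_Num.FY16.17 = c(23L, 0L),
  Quarter3_Num.FY16.17 = c(0L, 4L),
  Quarter4_Num.FY16.17 = c(0L, 0L),
  Quarter1_Num.FY17.18 = c(2L, 0L),
  Quarter2_Num.FY17.18 = c(0L, 4L),
  Quarter3_Num.FY17.18 = c(6L, 90L),
  Quarter4_Num.FY17.18 = c(0L, 8L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: take n consecutive quarter columns per row,
# starting at the column named by Quarter and StartingYear.
window_from_start <- function(df, n = 5) {
  value_cols <- grep("^Quarter\\d_Num\\.", names(df), value = TRUE)
  # Name of each row's starting column, e.g. "Quarter2_Num.FY16.17"
  start_col <- paste0("Quarter", df$Quarter, "_Num.",
                      sub("-", ".", as.character(df$StartingYear), fixed = TRUE))
  start_idx <- match(start_col, value_cols)
  vals <- as.matrix(df[value_cols])
  # One row of column positions per showroom: start, start + 1, ..., start + n - 1
  col_idx <- outer(start_idx, seq_len(n) - 1L, "+")
  # Positions past the last column become NA
  col_idx[which(col_idx > ncol(vals))] <- NA
  # Matrix indexing with (row, column) pairs; NA positions yield NA values
  out <- matrix(vals[cbind(c(row(col_idx)), c(col_idx))], nrow = nrow(df))
  colnames(out) <- paste0("Quarter", seq_len(n), "_Starting_Num")
  data.frame(Code = df$Code, out, check.names = FALSE)
}

res <- window_from_start(df)
print(res)
```

Because the window is selected by position, this relies on the quarter columns already being in time order; the long-format approach has the same dependency through the order in which gather stacks the columns.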

Data

df <- structure(list(Code = structure(1:2, .Label = c("S2249", "S463"
), class = "factor"), Quarter = 2:3, StartingYear = structure(c(1L, 
1L), .Label = "FY16-17", class = "factor"), Quarter1_Num.FY16.17 = c(0L, 
0L), Quarter2_Num.FY16.17 = c(23L, 0L), Quarter3_Num.FY16.17 = c(0L, 
4L), Quarter4_Num.FY16.17 = c(0L, 0L), Quarter1_Num.FY17.18 = c(2L, 
0L), Quarter2_Num.FY17.18 = c(0L, 4L), Quarter3_Num.FY17.18 = c(6L, 
90L), Quarter4_Num.FY17.18 = c(0L, 8L)), class = "data.frame", row.names = c(NA, 
-2L))

PS: I changed S463 3 FY17-18 to S463 3 FY16-17 to match the expected output. You can keep S463 3 FY17-18, but you will get NAs for Q3 to Q5.
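A small sketch of why the original S463 row runs out of data: starting at Quarter3_Num.FY17.18, only two of the five wanted columns exist. The variable names here are illustrative and not part of the answer's code.

```r
# The eight value columns, in the order they appear in the data
value_cols <- c(paste0("Quarter", 1:4, "_Num.FY16.17"),
                paste0("Quarter", 1:4, "_Num.FY17.18"))

# Position of the starting column for Quarter = 3, StartingYear = FY17-18
start <- match("Quarter3_Num.FY17.18", value_cols)

# Positions of the five quarters wanted from that start
wanted <- start + 0:4

# Which of those positions actually exist in the data
available <- wanted <= length(value_cols)
print(available)
```

Only the first two positions fall inside the data, so the third to fifth starting quarters have nothing to fill them.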

gsub('Quarter(\\d)_.*','\\1',c('Quarter1_Num.FY16.17','Quarter4_Num.FY17.18'))
[1] "1" "4"

  • 'Quarter(\\d)_.*' captures the single digit (0-9) between Quarter and _ in a group, and \\1 returns that group

gsub('Quarter\\d_Num\\.(.*)\\.(.*)','\\1-\\2',c('Quarter1_Num.FY16.17','Quarter4_Num.FY17.18'))
[1] "FY16-17" "FY17-18"

  • \\. matches the literal dot that follows Quarter, a digit and _Num. In a regular expression, special characters such as . are escaped with \\
  • (.*) captures everything after that dot and before the next dot, i.e. FY16 and FY17; gsub treats this as group 1
  • \\. matches the next literal dot
  • (.*) captures everything after that dot, i.e. 17 and 18; gsub treats this as group 2
  • \\1-\\2 returns group 1 and group 2 separated by -, i.e. FY16-17
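The two extractions can also be sketched with a single, stricter pattern. This is an assumption-laden variant rather than the answer's code: it anchors the match with ^ and $ and assumes every year part looks like FY followed by digits, a dot, and more digits.

```r
cols <- c("Quarter1_Num.FY16.17", "Quarter4_Num.FY17.18")

# Group 1: quarter digit, group 2: "FY16", group 3: "17"
pattern <- "^Quarter(\\d)_Num\\.(FY\\d+)\\.(\\d+)$"

# sub() replaces the whole match with the chosen groups
quarter <- as.integer(sub(pattern, "\\1", cols))
year <- sub(pattern, "\\2-\\3", cols)

print(quarter)
print(year)
```

Note that sub() returns its input unchanged when the pattern does not match, so a column name outside this format would pass through as-is instead of raising an error.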

