I have the data below in wide format, where each row represents a showroom. Quarter is the quarter in which the showroom started selling, and StartingYear is the financial year in which it started.
Code Quarter StartingYear Quarter1_Num.FY16-17 Quarter2_Num.FY16-17 Quarter3_Num.FY16-17 Quarter4_Num.FY16-17 Quarter1_Num.FY17-18 Quarter2_Num.FY17-18 Quarter3_Num.FY17-18 Quarter4_Num.FY17-18
S2249 2 FY16-17 0 23 0 0 2 0 6 0
S463 3 FY17-18 0 0 4 0 0 4 90 8
For each showroom, I have to start from the column determined by Quarter and StartingYear (Quarter2_Num.FY16-17 for row 1) and cover a period of one year, which in this case means going up to Quarter2_Num.FY17-18. As can be seen, the column names are based on the quarter and the starting year.
Output I am trying to get:
Code Quarter1_Starting_Num Quarter2_Starting_Num Quarter3_Starting_Num Quarter4_Starting_Num Quarter5_Starting_Num
S2249 23 0 0 2 0
S463 4 0 0 4 90
The columns capture data for a year across the quarters after the showroom started.
I know that using gsub I can get the columns containing FY16-17 or FY17-18. But I am not sure how to specify the starting column for each row and then traverse the next N columns from there.
Can anyone please help me with this?
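First, we reshape the data set from wide to long, then do the calculations and filtering, and finally reshape it back to wide format.
library(dplyr)
library(tidyr)
# Reshape to long: one row per Code and quarter column
gather(df, k, val, -c(Code, Quarter, StartingYear)) %>%
  # Extract the quarter number and rebuild the year label (FY16.17 -> FY16-17) from the column name
  mutate(Quar = gsub('Quarter(\\d)_.*', '\\1', k), year = gsub('Quarter\\d_Num\\.(.*)\\.(.*)', '\\1-\\2', k)) %>%
  arrange(Code) %>% group_by(Code) %>%
  # flag is 0 before the starting column, then counts 1, 2, 3, ... from it onwards
  mutate(flag = cumsum(cumsum(Quarter == Quar & StartingYear == year)), Quarter1 = paste0('Quarter', flag, '_Starting_Num')) %>%
  # Keep the first five quarters from the start and reshape back to wide
  filter(between(flag, 1, 5)) %>% select(Code, Quarter1, val) %>% spread(Quarter1, val)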
# A tibble: 2 x 6
# Groups: Code [2]
Code Quarter1_Starting_Num Quarter2_Starting_Num Quarter3_Starting_Num Quarter4_Starting_Num Quarter5_Starting_Num
<fct> <int> <int> <int> <int> <int>
1 S2249 23 0 0 2 0
2 S463 4 0 0 4 90
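The double cumsum in the flag step may be easier to follow on a plain vector. Below is a minimal base R sketch for a single showroom; is_start, inner, keep and vals are hypothetical names used only for illustration, with values taken from the S2249 row.

```r
# Marks the starting column for one showroom (start is the 2nd of 8 quarters)
is_start <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

# First cumsum: 0 before the start, 1 from the start onwards
inner <- cumsum(is_start)

# Second cumsum: 0 before the start, then 1, 2, 3, ... counting quarters since the start
flag <- cumsum(inner)

# Keep only the first five quarters counted from the start
keep <- flag >= 1 & flag <= 5

# Values of the S2249 row in column order
vals <- c(0, 23, 0, 0, 2, 0, 6, 0)
vals[keep]
```

The first cumsum turns the single TRUE into a run of ones, and the second turns that run into a running quarter counter, which is what the between(flag, 1, 5) filter relies on.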
Data
df <- structure(list(Code = structure(1:2, .Label = c("S2249", "S463"
), class = "factor"), Quarter = 2:3, StartingYear = structure(c(1L,
1L), .Label = "FY16-17", class = "factor"), Quarter1_Num.FY16.17 = c(0L,
0L), Quarter2_Num.FY16.17 = c(23L, 0L), Quarter3_Num.FY16.17 = c(0L,
4L), Quarter4_Num.FY16.17 = c(0L, 0L), Quarter1_Num.FY17.18 = c(2L,
0L), Quarter2_Num.FY17.18 = c(0L, 4L), Quarter3_Num.FY17.18 = c(6L,
90L), Quarter4_Num.FY17.18 = c(0L, 8L)), class = "data.frame", row.names = c(NA,
-2L))
PS: I changed S463 3 FY17-18 to S463 3 FY16-17 to match the expected output. You can keep S463 3 FY17-18, but then you will get NAs for Q3 to Q5.
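If you would rather index the columns directly instead of reshaping, something along these lines might also work. This is only a base R sketch, not the approach used above. It assumes the quarter columns are stored in chronological order with four quarters per financial year; num_cols, years, start, m, res and out are names made up for the example.

```r
# Hypothetical re-creation of the example data (column names as R stores them)
df <- data.frame(
  Code = c("S2249", "S463"),
  Quarter = c(2L, 3L),
  StartingYear = c("FY16-17", "FY16-17"),
  Quarter1_Num.FY16.17 = c(0L, 0L),
  Quarter2_Num.FY16.17 = c(23L, 0L),
  Quarter3_Num.FY16.17 = c(0L, 4L),
  Quarter4_Num.FY16.17 = c(0L, 0L),
  Quarter1_Num.FY17.18 = c(2L, 0L),
  Quarter2_Num.FY17.18 = c(0L, 4L),
  Quarter3_Num.FY17.18 = c(6L, 90L),
  Quarter4_Num.FY17.18 = c(0L, 8L),
  stringsAsFactors = FALSE
)

# Quarter columns, assumed to be in chronological order
num_cols <- grep("^Quarter\\d_Num\\.", names(df), value = TRUE)

# Assumed chronological order of the financial years
years <- c("FY16-17", "FY17-18")

# Position of each row's starting column among the quarter columns
start <- (match(df$StartingYear, years) - 1L) * 4L + df$Quarter

m <- as.matrix(df[num_cols])

# Take five consecutive values per row; positions past the last column give NA
res <- t(sapply(seq_len(nrow(df)), function(i) unname(m[i, ])[start[i] + 0:4]))
colnames(res) <- paste0("Quarter", 1:5, "_Starting_Num")

out <- data.frame(Code = df$Code, res)
out
```

With StartingYear set to FY17-18 for S463, start would be 7, so only two real values remain and the last three positions come back as NA, consistent with the note above.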
gsub('Quarter(\\d)_.*','\\1',c('Quarter1_Num.FY16.17','Quarter4_Num.FY17.18'))
[1] "1" "4"
'Quarter(\\d)_.*' captures the single digit (0-9) after Quarter and before _ as a group, and \\1 returns that group.
gsub('Quarter\\d_Num\\.(.*)\\.(.*)','\\1-\\2',c('Quarter1_Num.FY16.17','Quarter4_Num.FY17.18'))
[1] "FY16-17" "FY17-18"
\\. matches the literal dot that follows Quarter, a digit and _Num. In a regular expression, special characters such as . are escaped with \\.
(.*) captures everything after that dot and before the next dot as one group, ie FY16 and FY17. gsub treats this as group 1.
\\. matches the second literal dot.
(.*) captures everything after that dot as one group, ie 17 and 18. gsub treats this as group 2.
\\1-\\2 returns group 1 and group 2 separated by -, ie FY16-17.
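The reason the year has to be stitched back together from two groups is probably that R replaced the hyphen in the original column names with a dot when the data was read in (my assumption about how df was created; make.names does the same conversion). A small sketch, with nm, quarter and year as illustrative names:

```r
# make.names() replaces characters that are invalid in R names, such as "-", with "."
nm <- make.names("Quarter2_Num.FY16-17")
nm

# Quarter number taken from the column name
quarter <- as.integer(sub("Quarter(\\d)_.*", "\\1", nm))

# Year label rebuilt with the hyphen so it can be compared with StartingYear
year <- sub("Quarter\\d_Num\\.(.*)\\.(.*)", "\\1-\\2", nm)

quarter
year
```

Because StartingYear still contains the hyphen (FY16-17), the rebuilt label can be compared with it directly in the flag step.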