
Group data into a new column value based on a condition

I have data like below:

Caller  Date    Duration    Status
304 2/1/2016    756 ANSWERED
304 2/1/2016    61  ANSWERED
304 2/4/2016    60  ANSWERED
304 2/10/2016   61  ANSWERED
304 2/17/2016   60  ANSWERED
304 2/19/2016   30  ANSWERED
304 2/24/2016   27  ANSWERED
304 2/28/2016   55  ANSWERED
304 2/28/2016   63  ANSWERED

I want to group the data in R by week: if the date lies between 2/1/2016 and 2/7/2016, I add a new column called "Week" and set its value to Week 1 for those rows, and similarly for all other weeks in the month.

The output would look like this:

Caller  Date    Duration    Status Week
304 2/1/2016    756 ANSWERED   Week 1
304 2/1/2016    61  ANSWERED   Week 1
304 2/4/2016    60  ANSWERED   Week 1
304 2/10/2016   61  ANSWERED   Week 2
304 2/17/2016   60  ANSWERED   Week 3
304 2/19/2016   30  ANSWERED   Week 3
304 2/24/2016   27  ANSWERED   Week 4
304 2/28/2016   55  ANSWERED   Week 4
304 2/28/2016   63  ANSWERED   Week 4

Please suggest a method in R. Thanks.

One way to do this would be to use lubridate and dplyr.

Suppose your data is in a data frame called dat:

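library(lubridate)
library(dplyr)
dat$Date <- mdy(dat$Date)  # parse the "m/d/Y" strings into Date objects
t0 <- dat$Date[1]          # the first date in the data marks the start of week 1
# Whole weeks elapsed since t0, plus 1 so that counting starts at "Week 1"
dat %>% mutate(Week = paste('Week', as.integer(Date - t0) %/% 7 + 1))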

Result:

Caller       Date Duration   Status   Week
1    304 2016-02-01      756 ANSWERED Week 1
2    304 2016-02-01       61 ANSWERED Week 1
3    304 2016-02-04       60 ANSWERED Week 1
4    304 2016-02-10       61 ANSWERED Week 2
5    304 2016-02-17       60 ANSWERED Week 3
6    304 2016-02-19       30 ANSWERED Week 3
7    304 2016-02-24       27 ANSWERED Week 4
8    304 2016-02-28       55 ANSWERED Week 4
9    304 2016-02-28       63 ANSWERED Week 4
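
For anyone who wants to run this end to end without installing packages, here is a minimal base R sketch of the same idea. The data frame is rebuilt by hand from the question's sample, and anchoring week 1 on the earliest date (rather than on the first row) is an assumption about what is wanted.

```r
# Rebuild the sample data from the question (dates as m/d/Y strings).
dat <- data.frame(
  Caller   = 304,
  Date     = c("2/1/2016", "2/1/2016", "2/4/2016", "2/10/2016", "2/17/2016",
               "2/19/2016", "2/24/2016", "2/28/2016", "2/28/2016"),
  Duration = c(756, 61, 60, 61, 60, 30, 27, 55, 63),
  Status   = "ANSWERED",
  stringsAsFactors = FALSE
)

# Parse the strings into Date objects.
dat$Date <- as.Date(dat$Date, format = "%m/%d/%Y")

# Assumption: week 1 starts on the earliest date in the data.
t0 <- min(dat$Date)

# Whole days since t0, integer-divided by 7, shifted so the first week is 1.
dat$Week <- paste("Week", as.integer(dat$Date - t0) %/% 7 + 1)
print(dat)
```

Because the weeks are counted from t0 rather than from the calendar, the labels keep increasing if the data runs past the end of the month.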

You can pull the week of the year directly with:

format(as.Date("2016-07-01"), format = "Week %U")

See the help for strptime (?strptime) for more details on the format codes. Note that %U only gives the week of the year, so 2017-01-01 would sort before nearly everything in 2016. You could write a wrapper, similar to @ManishGoel's answer, that sets your starting point as week 1.
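
A wrapper of that kind might look like the sketch below. The name week_from_start and its arguments are made up for illustration; the only assumption is that weeks are consecutive 7-day blocks counted from a chosen start date.

```r
# Hypothetical helper: number weeks from a chosen start date instead of Jan 1.
week_from_start <- function(dates, start = min(dates)) {
  dates <- as.Date(dates)
  start <- as.Date(start)
  # Days elapsed since `start`, grouped into consecutive 7-day blocks.
  paste("Week", as.integer(dates - start) %/% 7 + 1)
}

# Dates that straddle a year boundary keep increasing week numbers.
week_from_start(as.Date(c("2016-12-25", "2017-01-01")), start = "2016-12-25")
```

Dates earlier than the start would come out as "Week 0" or lower, so the start should be no later than the first date you care about.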

A more generic solution is to use cut:

mycuts <- seq(as.Date("2016-01-01"), as.Date("2017-12-30"), 7 )
cut(as.Date("2016-07-01"), mycuts, labels = 1:(length(mycuts)-1))

That may be easier to scale for your needs, and applies more broadly to other classes of problems. If you really need the "Week" in there, you can do that directly too:

cut(as.Date("2016-07-01"), mycuts, labels = paste("Week", 1:(length(mycuts)-1)))
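
Applied to dates like those in the question, the cut approach could be sketched as follows. Starting the breaks on 2016-02-01 and using five 7-day bins are assumptions chosen to match the sample data; the variable names are only illustrative.

```r
# A few of the dates from the question, parsed from m/d/Y strings.
dates <- as.Date(c("2/1/2016", "2/4/2016", "2/10/2016", "2/17/2016", "2/28/2016"),
                 format = "%m/%d/%Y")

# Assumption: bins start on 2016-02-01; six break points give five 7-day bins.
breaks <- seq(as.Date("2016-02-01"), by = 7, length.out = 6)

# Each bin includes its start date and excludes the next break.
week <- cut(dates, breaks, labels = paste("Week", 1:5))
as.character(week)
```

Any date that falls outside the range covered by the breaks comes back as NA, so the break sequence needs to span the whole data set.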

You can extract the day using strsplit and then calculate the week from the date.

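Week <- sapply(as.character(df$Date), FUN = function(x) {
  # Date strings are "month/day/year", so the day is the second field
  day <- as.numeric(strsplit(x, "/")[[1]][2])
  # Days 1-7 -> week 1, days 8-14 -> week 2, and so on
  return(as.integer((day - 1) / 7) + 1)
}, USE.NAMES = FALSE)
df$Week <- Week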

However, you need to give more information about how the dates are distributed, because the calculation of the week number depends on that.
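
To illustrate why the distribution matters, here is a small sketch comparing two possible definitions on dates that cross a month boundary. The dates and variable names are made up for the example; neither definition is necessarily the one the asker wants.

```r
# Example dates spanning the end of February and the start of March 2016.
dates <- as.Date(c("2016-02-07", "2016-02-28", "2016-03-01", "2016-03-08"))

# Definition 1: week within the calendar month (days 1-7 -> 1, 8-14 -> 2, ...).
day <- as.integer(format(dates, "%d"))
week_of_month <- (day - 1) %/% 7 + 1

# Definition 2: week counted continuously from the earliest date in the data.
week_since_start <- as.integer(dates - min(dates)) %/% 7 + 1

data.frame(dates, week_of_month, week_since_start)
```

The first definition restarts at 1 every month, while the second keeps counting, so the two only agree when all the data sits in one month and the first date is the 1st.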

