
Merge/join data frames or tables based on inequality criteria (> or <)

I have a data frame with weekly data by Section. Each Section has approximately 104 weeks' worth of data, and there are 83 sections in total.

I have a second data frame with the Start and End week by Section that I want to filter the main data frame on.

In both tables the Week is a combination of year and week number, e.g. 201501, and the week number always runs from 1 to 52.

So in the example below I want to filter Section A by weeks 201401 to 201404, Section B by weeks 201551 to 201603.

I initially thought I could add a column to my Weeks_Filter data frame holding a sequential week number running from the start week to the end week of each section (duplicating each row once per week), then merge the two tables and keep all the rows from the Weeks_Filter table (all.y = TRUE). This worked on a small sample, but I don't know how to generate the sequential weeks since they can span different years.

Week <- c("201401","201402","201403","201404","201405", "201451", "201552", "201601", "201602", "201603")
Section <- c(rep("A",5),rep("B",5))
df <- data.frame(cbind(Week, Section))

Section <- c("A", "B")
Start <- c("201401","201551")
End <- c("201404","201603")
Weeks_Filter <- data.frame(cbind(Section, Start, End))

data.table supports non-equi joins (added in version 1.9.8; in older versions you can use foverlaps):

setDT(df) # convert to data.table in place
setDT(Weeks_Filter)

# fix the column types - you have factors currently, converting to integer
df[, Week := as.integer(as.character(Week))]
Weeks_Filter[, `:=`(Start = as.integer(as.character(Start)),
                    End   = as.integer(as.character(End)))]

# the actual magic
df[df[Weeks_Filter, on = .(Section, Week >= Start, Week <= End), which = TRUE]]
#     Week Section
#1: 201401       A
#2: 201402       A
#3: 201403       A
#4: 201404       A
#5: 201552       B
#6: 201601       B
#7: 201602       B
#8: 201603       B
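
For data.table versions without non-equi joins, the foverlaps route mentioned above might look roughly like the sketch below. It rebuilds the example data with integer weeks so that it runs on its own; the helper column `Week_end` is an assumption introduced here, because foverlaps() compares intervals and each week has to be presented as a zero-width interval.

```r
library(data.table)

# Example data from the question, built directly with integer weeks
df <- data.table(
  Week = c(201401L, 201402L, 201403L, 201404L, 201405L,
           201451L, 201552L, 201601L, 201602L, 201603L),
  Section = rep(c("A", "B"), each = 5)
)
Weeks_Filter <- data.table(
  Section = c("A", "B"),
  Start   = c(201401L, 201551L),
  End     = c(201404L, 201603L)
)

# Week_end is a hypothetical helper column: each week becomes the
# zero-width interval [Week, Week_end]
df[, Week_end := Week]

# The second table must be keyed, with the interval columns last in the key
setkey(Weeks_Filter, Section, Start, End)

# Keep only rows of df whose interval lies within [Start, End] of the same Section
res <- foverlaps(df, Weeks_Filter,
                 by.x = c("Section", "Week", "Week_end"),
                 type = "within", nomatch = 0L)
res[, .(Week, Section)]
```

The result also carries the Start and End columns, which can be dropped as in the last line if only the filtered rows are needed.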

Using dplyr you can:

  • combine your data frames
  • group by Section
  • filter based on the Start and End columns

One problem is that your 'weeks' are characters and become factors the way you've encoded them. I took the shortcut and just made them numeric, but I'd recommend using lubridate to make these proper Date class vectors.

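library(dplyr)
# join on Section explicitly so every weekly row carries its Start and End
tempdf <- full_join(df, Weeks_Filter, by = "Section")
# convert factor/character columns to numeric so they can be compared
tempdf$Week <- as.numeric(as.character(tempdf$Week))
tempdf$Start <- as.numeric(as.character(tempdf$Start))
tempdf$End <- as.numeric(as.character(tempdf$End))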


tempdf_filt <- tempdf %>%
  group_by(Section) %>%
  filter(Week >= Start,
         Week <= End)

It looks like there's a problem in your data ("201451" should probably be "201551"), but otherwise this returns what you want:

> tempdf_filt
Source: local data frame [8 x 4]
Groups: Section [2]

    Week Section  Start    End
   (dbl)  (fctr)  (dbl)  (dbl)
1 201401       A 201401 201404
2 201402       A 201401 201404
3 201403       A 201401 201404
4 201404       A 201401 201404
5 201552       B 201551 201603
6 201601       B 201551 201603
7 201602       B 201551 201603
8 201603       B 201551 201603
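
The suggestion above to use proper Date vectors could be sketched as follows. This is only an illustration in base R rather than lubridate, and `yearweek_to_date` is a hypothetical helper, not part of any package. It assumes week 1 starts on January 1 of each year, which is not the ISO 8601 week definition, so adjust it if your weeks are defined differently.

```r
# Hypothetical helper: map a YYYYWW code to the Date on which that week starts.
# ASSUMPTION: week 1 begins on January 1 (not the ISO 8601 definition).
yearweek_to_date <- function(yw) {
  yw   <- as.integer(as.character(yw))  # also handles factor or character input
  year <- yw %/% 100L                   # first four digits
  week <- yw %% 100L                    # last two digits
  as.Date(sprintf("%d-01-01", year)) + 7L * (week - 1L)
}

weeks <- yearweek_to_date(c("201401", "201552", "201601"))
weeks
# [1] "2014-01-01" "2015-12-24" "2016-01-01"
```

With Dates, differences between weeks in different years come out as real time spans, whereas the numeric codes jump from 201552 to 201601.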

Perhaps creating a vector of all desired weeks would work for the filter. Here is a rough example using base R:

# zero-padded week numbers "01" to "52"
weekNums <- sprintf("%02d", 1:52)
# all year-week combinations; 2016 is included because the filter reaches 201603
allWeeks <- paste0(rep(2014:2016, each = 52), weekNums)

# filter vector to select desired weeks
keepWeeks <- allWeeks[grep("^201(40[1-4]|55[12]|60[123])$", allWeeks)]

# note: this keeps the listed weeks regardless of Section
dfKeeper <- df[df$Week %in% keepWeeks, ]

I tried to construct a regular expression that would capture the periods that you want, but you may have to adjust it a bit.
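
As an alternative to hand-writing a regular expression, the week sequence for each Section could be generated from its Start and End values and then merged with the data, which is the approach the question describes. The sketch below is one possible way to do that in base R. `week_seq` is a hypothetical helper, and it relies on the question's statement that every year has exactly 52 weeks.

```r
# Hypothetical helper: all YYYYWW codes from `start` to `end`, inclusive.
# ASSUMPTION: every year has exactly 52 weeks, as stated in the question.
week_seq <- function(start, end) {
  # map YYYYWW to a running week index so that year boundaries are contiguous
  to_index <- function(yw) (yw %/% 100) * 52 + (yw %% 100 - 1)
  idx <- seq(to_index(start), to_index(end))
  (idx %/% 52) * 100 + idx %% 52 + 1
}

# Example data from the question, with weeks stored as numbers
df <- data.frame(
  Week = c(201401, 201402, 201403, 201404, 201405,
           201451, 201552, 201601, 201602, 201603),
  Section = rep(c("A", "B"), each = 5)
)
Weeks_Filter <- data.frame(
  Section = c("A", "B"),
  Start   = c(201401, 201551),
  End     = c(201404, 201603)
)

# One row per Section and week inside that Section's range
expanded <- do.call(rbind, lapply(seq_len(nrow(Weeks_Filter)), function(i) {
  data.frame(Section = Weeks_Filter$Section[i],
             Week    = week_seq(Weeks_Filter$Start[i], Weeks_Filter$End[i]))
}))

# Inner join: keeps only rows of df whose (Section, Week) pair is in range
result <- merge(df, expanded, by = c("Section", "Week"))
result
```

Unlike the regular expression, this filters each Section by its own range and needs no manual adjustment when the ranges change.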

library(data.table)

df <- merge(df, Weeks_Filter)
df[, -1] <- apply(df[, -1], 2, function(x) as.numeric(as.character(x)))
df <- data.table(df)

df[Week >= Start & Week <= End, .SD, by = Section]

The output is:

   Section  Start    End   Week
1:       A 201401 201404 201401
2:       A 201401 201404 201402
3:       A 201401 201404 201403
4:       A 201401 201404 201404
5:       B 201551 201603 201552
6:       B 201551 201603 201601
7:       B 201551 201603 201602
8:       B 201551 201603 201603
