
Set values to NA in data frame if Date is outside of a given interval

I have two data frames, df1 and df2.

df1 contains values for different products (X1, X2, and so on) at different times. df2 contains the true start and end dates for some of the products. I want to replace every value in df1 that falls outside the date interval given in df2 with NA, as shown in the final table df3.

Create df1 and df2:

df1=data.frame(matrix(NA,10,6))
df1[,1]=(c(seq(as.Date("2012-01-01"),as.Date("2012-10-01"),by="1 month")))
df1[,2]=c(1:10); df1[,3]=c(12:21); df1[,4]=c(0.5:10); df1[,5]=c(5:14); df1[,6]=c(10:19)
colnames(df1)=c("Date","X1","X2","X3","X4","X5")
df2=data.frame(matrix(data=c("X1","X2","X4","2012-02-01","2012-04-01","2012-06-01","2012-09-01","2012-06-01","2012-10-01"),3,3))
colnames(df2)=c("Name","Start","End")

Output:

> df1
         Date X1 X2  X3 X4 X5
1  2012-01-01  1 12 0.5  5 10
2  2012-02-01  2 13 1.5  6 11
3  2012-03-01  3 14 2.5  7 12
4  2012-04-01  4 15 3.5  8 13
5  2012-05-01  5 16 4.5  9 14
6  2012-06-01  6 17 5.5 10 15
7  2012-07-01  7 18 6.5 11 16
8  2012-08-01  8 19 7.5 12 17
9  2012-09-01  9 20 8.5 13 18
10 2012-10-01 10 21 9.5 14 19
> df2
  Name      Start        End
1   X1 2012-02-01 2012-09-01
2   X2 2012-04-01 2012-06-01
3   X4 2012-06-01 2012-10-01

The final output should look like this:

> df3
         Date X1 X2  X3 X4 X5
1  2012-01-01 NA NA 0.5 NA 10
2  2012-02-01  2 NA 1.5 NA 11
3  2012-03-01  3 NA 2.5 NA 12
4  2012-04-01  4 15 3.5 NA 13
5  2012-05-01  5 16 4.5 NA 14
6  2012-06-01  6 17 5.5 10 15
7  2012-07-01  7 NA 6.5 11 16
8  2012-08-01  8 NA 7.5 12 17
9  2012-09-01  9 NA 8.5 13 18
10 2012-10-01 NA NA 9.5 14 19

I am sure there is a more elegant way, but you could create a matrix of indices that meet your criterion, with an element set to 1 if the date is within the interval for that product and NA if it isn't. Assuming you are dealing with numerical values, you can then multiply your data frame by that index matrix:

Example:

library(dplyr)
## Convert your dates to Date-objects:
df2 <- df2 %>% dplyr::mutate(Start = as.Date(Start), End = as.Date(End))

## Create a matrix of indices (TRUE/FALSE):
indMx <- lapply(names(df1)[-1], function(product){
            (df1$Date >= df2$Start[df2$Name == product]) & 
                    (df1$Date <= df2$End[df2$Name == product]) 
        }) %>% do.call('cbind',.)

## Multiply with NA^(!indMx). NA^0 is 1 and NA^1 is NA, so negating the
## index gives 1 in place of TRUE (inside the interval) and NA in place of
## FALSE. This assumes every product column in df1 has a row in df2.
df1[,-1] <- df1[,-1]*NA^(!indMx)

df1
#          Date X1 X2  X3
# 1  2012-01-01 NA NA  NA
# 2  2012-02-01  2 NA  NA
# 3  2012-03-01  3 NA  NA
# 4  2012-04-01  4 15  NA
# 5  2012-05-01  5 16  NA
# 6  2012-06-01  6 17 5.5
# 7  2012-07-01  7 NA 6.5
# 8  2012-08-01  8 NA 7.5
# 9  2012-09-01  9 NA 8.5
# 10 2012-10-01 NA NA 9.5
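
The index-matrix approach relies on every product column having a matching row in df2, which is not the case for the five-column data in the question (X3 and X5 have no interval). A minimal base R sketch that avoids the multiplication and simply skips products without an interval might look like the following. The data frames are rebuilt here from the question's printed tables, and the names outside and df3 are only illustrative.

```r
# Rebuild the question's data (five product columns, three intervals).
df1 <- data.frame(
  Date = seq(as.Date("2012-01-01"), as.Date("2012-10-01"), by = "1 month"),
  X1 = 1:10, X2 = 12:21, X3 = seq(0.5, 9.5, by = 1), X4 = 5:14, X5 = 10:19
)
df2 <- data.frame(
  Name  = c("X1", "X2", "X4"),
  Start = as.Date(c("2012-02-01", "2012-04-01", "2012-06-01")),
  End   = as.Date(c("2012-09-01", "2012-06-01", "2012-10-01")),
  stringsAsFactors = FALSE
)

# Work on a copy so df1 stays untouched.
df3 <- df1
for (i in seq_len(nrow(df2))) {
  # Rows whose date lies outside [Start, End] for the i-th product
  outside <- df3$Date < df2$Start[i] | df3$Date > df2$End[i]
  # Blank out only that product's column; other columns are not touched
  df3[outside, df2$Name[i]] <- NA
}
```

Because the loop runs over the rows of df2 rather than the columns of df1, columns such as X3 and X5 are left exactly as they were.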

Here is one solution with data.table. There might be a more elegant method using non-equi joins.

for(i in seq_len(nrow(df2))) df1[!(Date %between% df2[i,.(Start, End)]), df2[i, Name] := NA]

Here, you run through each row of df2, subset df1 to the dates outside the start and end dates in the current row of df2, and then assign NA to the variable named in that row.

This returns

df1
          Date X1 X2  X3
 1: 2012-01-01 NA NA  NA
 2: 2012-02-01  2 NA  NA
 3: 2012-03-01  3 NA  NA
 4: 2012-04-01  4 15  NA
 5: 2012-05-01  5 16  NA
 6: 2012-06-01  6 17 5.5
 7: 2012-07-01  7 NA 6.5
 8: 2012-08-01  8 NA 7.5
 9: 2012-09-01  9 NA 8.5
10: 2012-10-01 NA NA 9.5
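
For the non-equi join mentioned above, one possible sketch is to reshape df1 to long form, flag the rows that fall inside an interval with an update join, and reshape back. This is only one way to do it, not necessarily what the answer had in mind. The column names value and inside are illustrative, and the product columns are created as numeric so that melt does not have to reconcile integer and double columns.

```r
library(data.table)

# Three-product data as in the data section of this answer
df1 <- data.table(
  Date = seq(as.Date("2012-01-01"), as.Date("2012-10-01"), by = "1 month"),
  X1 = as.numeric(1:10), X2 = as.numeric(12:21), X3 = seq(0.5, 9.5, by = 1)
)
df2 <- data.table(
  Name  = c("X1", "X2", "X3"),
  Start = as.Date(c("2012-02-01", "2012-04-01", "2012-06-01")),
  End   = as.Date(c("2012-09-01", "2012-06-01", "2012-10-01"))
)

# Long form: one row per (Date, product) pair
long <- melt(df1, id.vars = "Date", variable.name = "Name",
             value.name = "value", variable.factor = FALSE)

# Non-equi update join: mark rows whose Date lies within [Start, End]
long[, inside := FALSE]
long[df2, on = .(Name, Date >= Start, Date <= End), inside := TRUE]

# Only products that have an interval in df2 lose out-of-interval values
long[Name %in% df2$Name & !inside, value := NA]

# Back to one column per product
df3 <- dcast(long, Date ~ Name, value.var = "value")
```

The Name %in% df2$Name condition means that a product with no row in df2 would keep all of its values.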

Update

If the data are constructed as in the updated question, first run the lines below to convert both data frames to data.tables and the Name variable in df2 from a factor to a character vector. The code above will then work on the new dataset.

# convert data.frames to data.tables
setDT(df1)
setDT(df2)

# convert factor to character
df2[, Name := as.character(Name)]

Data

library(data.table)
# read in data
df1 <- fread("Date X1 X2  X3
2012-01-01  1 12 0.5
2012-02-01  2 13 1.5
2012-03-01  3 14 2.5
2012-04-01  4 15 3.5
2012-05-01  5 16 4.5
2012-06-01  6 17 5.5
2012-07-01  7 18 6.5
2012-08-01  8 19 7.5
2012-09-01  9 20 8.5
2012-10-01 10 21 9.5")

df2 <- fread("  Name      Start        End
X1 2012-02-01 2012-09-01
X2 2012-04-01 2012-06-01
X3 2012-06-01 2012-10-01")

# convert to date type
df1[, Date := as.Date(Date)]
df2[, c("Start", "End")  := .(as.Date(Start), as.Date(End))]

Using dplyr and tidyr:

library(tidyr)
library(dplyr)

df3 <- df1 %>% gather(key=Name,value=value,-Date) %>% #convert to long form
  left_join(df2) %>% #merge in date limits
  mutate(ind=(as.Date(Date)>=as.Date(Start) & as.Date(Date)<=as.Date(End))) %>% #check valid 
  mutate(value=replace(value,!ind,NA)) %>% #replace invalid with NA
  select(Date,Name,value) %>% #remove unnecessary variables
  spread(key=Name,value=value) #convert back to rectangular form

df3
         Date X1 X2  X3 X4 X5
1  2012-01-01 NA NA 0.5 NA 10
2  2012-02-01  2 NA 1.5 NA 11
3  2012-03-01  3 NA 2.5 NA 12
4  2012-04-01  4 15 3.5 NA 13
5  2012-05-01  5 16 4.5 NA 14
6  2012-06-01  6 17 5.5 10 15
7  2012-07-01  7 NA 6.5 11 16
8  2012-08-01  8 NA 7.5 12 17
9  2012-09-01  9 NA 8.5 13 18
10 2012-10-01 NA NA 9.5 14 19
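
gather() and spread() still work, but the tidyr documentation marks them as superseded by pivot_longer() and pivot_wider(). A sketch of the same pipeline with the newer functions, assuming tidyr 1.0.0 or later and the question's data rebuilt with proper Date columns, could be:

```r
library(dplyr)
library(tidyr)

# Rebuild the question's data (five product columns, three intervals).
df1 <- data.frame(
  Date = seq(as.Date("2012-01-01"), as.Date("2012-10-01"), by = "1 month"),
  X1 = 1:10, X2 = 12:21, X3 = seq(0.5, 9.5, by = 1), X4 = 5:14, X5 = 10:19
)
df2 <- data.frame(
  Name  = c("X1", "X2", "X4"),
  Start = as.Date(c("2012-02-01", "2012-04-01", "2012-06-01")),
  End   = as.Date(c("2012-09-01", "2012-06-01", "2012-10-01")),
  stringsAsFactors = FALSE
)

df3 <- df1 %>%
  # long form: one row per (Date, product)
  pivot_longer(-Date, names_to = "Name", values_to = "value") %>%
  # attach Start/End; products without an interval get NA in both
  left_join(df2, by = "Name") %>%
  # keep a value if its product has no interval or the date is inside it
  mutate(value = if_else(is.na(Start) | (Date >= Start & Date <= End),
                         value, NA_real_)) %>%
  select(Date, Name, value) %>%
  # back to one column per product
  pivot_wider(names_from = Name, values_from = value)
```

The result is a tibble rather than a plain data frame, and the product columns come back as doubles because the integer and double columns are combined in the long form.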
