
Calculate the difference between dates in two data frames

Below is the first data frame:

Account  reference number  Amount   Date
A        1                 1583.51  16/05/2016
B        2                 4038.18  27/09/2016
C        3                 1161.36  20/05/2016
C        4                 732.39   24/10/2016
C        5                 747.69   24/11/2016

Below is the second data frame:

Account  reference number  Amount   Date
A        6                 3062.88  03/05/2016
A        7                 2619.09  03/05/2016
A        8                 4743.22  09/05/2016
B        9                 115.28   03/05/2016
B        10                993.14   03/05/2016
B        11                879.05   03/05/2016
C        12                50.93    03/05/2016
C        13                21.83    03/05/2016
C        14                14.55    03/05/2016

I want to find the date difference for each account by comparing the two data frames. For example, comparing the dates for account 'A' should give -13 days, since the start date is 16/05/2016 and the stop date is 03/05/2016.

I want each date in the first data frame to be compared with every date in the second data frame for the same account. For example, 16/05/2016 should be compared with both 03/05/2016 and 09/05/2016.

I created my own sample data, since yours is hard to copy. A solution based on dplyr:

df1 = data.frame(account=c(1,2,3,4),date=seq(Sys.Date(),Sys.Date()+3,by=1),value = c(1,1,1,1))
df2 = data.frame(account=c(1,2,3,4),date=seq(Sys.Date()+2,Sys.Date()+5,by=1), value = c(2,2,2,2))

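library(dplyr)

# Keep only the join key and the date from df2, renamed to avoid a name clash
df2 = df2 %>% select(account,df2.date=date)
# Join on account explicitly, then take the difference in days
df1 = df1 %>% left_join(df2, by = "account") %>% mutate(diff = as.numeric(date-df2.date))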

INPUT

> df1
  account       date value
1       1 2017-07-17     1
2       2 2017-07-18     1
3       3 2017-07-19     1
4       4 2017-07-20     1
> df2
  account       date value
1       1 2017-07-19     2
2       2 2017-07-20     2
3       3 2017-07-21     2
4       4 2017-07-22     2

OUTPUT

> df1
  account       date value   df2.date diff
1       1 2017-07-17     1 2017-07-19   -2
2       2 2017-07-18     1 2017-07-20   -2
3       3 2017-07-19     1 2017-07-21   -2
4       4 2017-07-20     1 2017-07-22   -2

Hope this helps!

For simplicity, I assume that the first data frame is called a and the second b. I've created them in an abbreviated form:

a <- data.frame(Account = c("A","B"), reference_number = c(1,2), Amount = c(1583.51,4038.18),  Date = c("16/05/2016","27/09/2016"))
b <- data.frame(Account = c("A","A"), reference_number = c(6,7), Amount = c(3062.88,2619.09),  Date = c("03/05/2016","03/05/2016"))

You can find the difference between two dates in this way:

#days
difftime(strptime(b$Date[1], format = "%d/%m/%Y"),
     strptime(a$Date[1], format = "%d/%m/%Y"),units="days")

#weeks
difftime(strptime(b$Date[1], format = "%d/%m/%Y"),
     strptime(a$Date[1], format = "%d/%m/%Y"),units="weeks")
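
A minimal sketch of the same calculation applied to whole columns rather than a single pair, assuming the dates are day/month/year strings as in the question. It uses as.Date instead of strptime so that only calendar days are compared; the names start_date, stop_date and day_diff are made up for illustration.

```r
# Parse day/month/year strings into Date objects (calendar days, no time of day)
start_date <- as.Date(c("16/05/2016", "27/09/2016"), format = "%d/%m/%Y")
stop_date  <- as.Date(c("03/05/2016", "03/05/2016"), format = "%d/%m/%Y")

# Element-wise difference (stop minus start) as a plain number of days
day_diff <- as.numeric(difftime(stop_date, start_date, units = "days"))
print(day_diff)
```

A negative value means the stop date falls before the start date, which matches the sign convention used in the question.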

Using sample data based on Florian's answer:

df1 = data.frame(account=c("A","A","B","B"),date=seq(Sys.Date(),Sys.Date()+3,by=1),value = c(1,1,1,1))
df2 = data.frame(account=c("A","A","A","B"),date=seq(Sys.Date()+2,Sys.Date()+5,by=1),value = c(2,2,2,2))

I added several instances of each account to each data frame. This is important for obtaining correct output for your own data:

library(dplyr)

# Pair every df1 row with every df2 row of the same account, then subtract the dates
full_join(df1,df2,by="account") %>%
  mutate(diff=date.x-date.y)

  account     date.x value.x     date.y value.y    diff
1       A 2017-07-17       1 2017-07-19       2 -2 days
2       A 2017-07-17       1 2017-07-20       2 -3 days
3       A 2017-07-17       1 2017-07-21       2 -4 days
4       A 2017-07-18       1 2017-07-19       2 -1 days
5       A 2017-07-18       1 2017-07-20       2 -2 days
6       A 2017-07-18       1 2017-07-21       2 -3 days
7       B 2017-07-19       1 2017-07-22       2 -3 days
8       B 2017-07-20       1 2017-07-22       2 -2 days 
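
To apply the same idea to the data from the question, the date strings have to be converted to Date first. The following is a sketch in base R, assuming the columns are named Account and Date; the object names first, second and all_pairs are illustrative only. merge() with by = "Account" pairs each row of the first data frame with every row of the second one for the same account, much like the full join above.

```r
# Question data, reduced to the two columns needed here
first <- data.frame(
  Account = c("A", "B", "C", "C", "C"),
  Date = c("16/05/2016", "27/09/2016", "20/05/2016", "24/10/2016", "24/11/2016"),
  stringsAsFactors = FALSE
)
second <- data.frame(
  Account = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Date = c("03/05/2016", "03/05/2016", "09/05/2016", rep("03/05/2016", 6)),
  stringsAsFactors = FALSE
)

# Convert the day/month/year strings to Date objects
first$Date  <- as.Date(first$Date,  format = "%d/%m/%Y")
second$Date <- as.Date(second$Date, format = "%d/%m/%Y")

# One row per (first, second) combination within each account
all_pairs <- merge(first, second, by = "Account", suffixes = c(".first", ".second"))

# Signed difference in days: date in the second frame minus date in the first
all_pairs$diff_days <- as.numeric(all_pairs$Date.second - all_pairs$Date.first)

print(all_pairs[all_pairs$Account == "A", ])
```

For account A this pairs 16/05/2016 with 03/05/2016 twice and with 09/05/2016 once, so the differences are -13, -13 and -7 days.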

You can use the plyr and dplyr packages to get your desired output. The code first sorts the combined data frame and then, within each account, calculates the time difference between the earliest date and the date in each row. After that it keeps the row with the largest difference in each group and finally drops the helper column.

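library(plyr)   # load plyr before dplyr so that dplyr's verbs are not masked
library(dplyr)

df <- rbind(df1, df2)
df$Date <- as.Date(df$Date, "%d/%m/%Y")

# Sort so that the earliest date of each account comes first
df <- df %>%
         dplyr::arrange(Account, Date)

# Per account: absolute gap between each date and the earliest date,
# then keep the row with the largest gap and drop the helper column
plyr::ddply(df, .(Account), transform,
      Date_1 = Date[1],
      change = abs(Date - Date[1])) %>%
             dplyr::group_by(Account) %>%
             dplyr::slice(which.max(change)) %>%
             dplyr::select(-Date_1)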


# Source: local data frame [3 x 5] 
# Groups: Account [3] 
#  
# # A tibble: 3 x 5 
#   Account reference.number  Amount       Date   change 
#    <fctr>            <int>   <dbl>     <date>   <time> 
# 1       A                1 1583.51 2016-05-16  13 days 
# 2       B                2 4038.18 2016-09-27 147 days 
# 3       C                5  747.69 2016-11-24 205 days
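
The per-account span computed above can also be sketched in base R without either package, which sidesteps the plyr/dplyr loading order. This is only an illustration under the assumption that the combined data has an Account column and a Date column; the names combined and span_days are hypothetical. Like the code above, it reports the absolute gap between the earliest and latest date per account, not the signed difference asked for in the question.

```r
# Accounts and dates from both data frames in the question, stacked together
combined <- data.frame(
  Account = c("A", "B", "C", "C", "C",
              "A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Date = as.Date(
    c("16/05/2016", "27/09/2016", "20/05/2016", "24/10/2016", "24/11/2016",
      "03/05/2016", "03/05/2016", "09/05/2016", rep("03/05/2016", 6)),
    format = "%d/%m/%Y"
  ),
  stringsAsFactors = FALSE
)

# Split the dates by account and take latest minus earliest, in days
span_days <- sapply(
  split(combined$Date, combined$Account),
  function(d) as.numeric(max(d) - min(d))
)
print(span_days)
```

The values agree with the change column shown above (13, 147 and 205 days for accounts A, B and C).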

Data

df1 <- structure(list(Account = structure(c(1L, 2L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor"), reference.number = 1:5, Amount = c(1583.51, 
4038.18, 1161.36, 732.39, 747.69), Date = structure(c(1L, 5L, 
2L, 3L, 4L), .Label = c("16/05/2016", "20/05/2016", "24/10/2016", 
"24/11/2016", "27/09/2016"), class = "factor")), .Names = c("Account", 
"reference.number", "Amount", "Date"), class = "data.frame", row.names = c(NA,-5L))

df2 <- structure(list(Account = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), reference.number = 6:14, 
Amount = c(3062.88, 2619.09, 4743.22, 115.28, 993.14, 879.05, 
50.93, 21.83, 14.55), Date = structure(c(1L, 1L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("03/05/2016", "09/05/2016"
), class = "factor")), .Names = c("Account", "reference.number", 
"Amount", "Date"), class = "data.frame", row.names = c(NA, -9L))
