Calculate the difference between dates in two data frames
The first data frame:
Account  reference number  Amount   Date
A        1                 1583.51  16/05/2016
B        2                 4038.18  27/09/2016
C        3                 1161.36  20/05/2016
C        4                  732.39  24/10/2016
C        5                  747.69  24/11/2016
The second data frame:
Account  reference number  Amount   Date
A        6                 3062.88  03/05/2016
A        7                 2619.09  03/05/2016
A        8                 4743.22  09/05/2016
B        9                  115.28  03/05/2016
B        10                 993.14  03/05/2016
B        11                 879.05  03/05/2016
C        12                  50.93  03/05/2016
C        13                  21.83  03/05/2016
C        14                  14.55  03/05/2016
I want to find the date difference for each account by comparing the two data frames. For example, for account 'A' the difference should be -13 days, since the start date is 16/05/2016 and the stop date is 03/05/2016.
I want each date in the first data frame to be checked against every date in the second data frame for that account. For example, 16/05/2016 should be checked against both 03/05/2016 and 09/05/2016.
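The arithmetic behind the -13 days can be checked directly in base R. This is a minimal sketch that assumes the dates are dd/mm/yyyy strings, as in the tables above. It compares account A's single date from the first data frame with each of its dates from the second data frame, computing stop minus start:

```r
# Account A: one date in the first data frame, three in the second (dd/mm/yyyy strings)
start_date <- as.Date("16/05/2016", format = "%d/%m/%Y")
stop_dates <- as.Date(c("03/05/2016", "03/05/2016", "09/05/2016"), format = "%d/%m/%Y")

# Subtracting Date objects gives a difftime; as.numeric(..., units = "days") turns it into plain day counts
diff_days <- as.numeric(stop_dates - start_date, units = "days")
print(diff_days)  # -13 -13 -7
```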
I created my own sample data, since yours is hard to copy. A solution based on dplyr:
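df1 = data.frame(account=c(1,2,3,4),date=seq(Sys.Date(),Sys.Date()+3,by=1),value = c(1,1,1,1))
df2 = data.frame(account=c(1,2,3,4),date=seq(Sys.Date()+2,Sys.Date()+5,by=1), value = c(2,2,2,2))
library(dplyr)
# keep only the key and the date from df2, renamed so it does not clash with df1$date
df2_dates = df2 %>% select(account, df2.date = date)
# join on account, then take the difference in days (df1 date minus df2 date)
df1 = df1 %>% left_join(df2_dates, by = "account") %>% mutate(diff = as.numeric(date - df2.date))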
INPUT
> df1
account date value
1 1 2017-07-17 1
2 2 2017-07-18 1
3 3 2017-07-19 1
4 4 2017-07-20 1
> df2
account date value
1 1 2017-07-19 2
2 2 2017-07-20 2
3 3 2017-07-21 2
4 4 2017-07-22 2
OUTPUT
> df1
account date value df2.date diff
1 1 2017-07-17 1 2017-07-19 -2
2 2 2017-07-18 1 2017-07-20 -2
3 3 2017-07-19 1 2017-07-21 -2
4 4 2017-07-20 1 2017-07-22 -2
Hope this helps!
For simplicity, I assume the first data frame is called a and the second b. I've created them in abbreviated form:
a <- data.frame(Account = c("A","B"), reference_number = c(1,2), Amount = c(1583.51,4038.18), Date = c("16/05/2016","27/09/2016"))
b <- data.frame(Account = c("A","A"), reference_number = c(6,7), Amount = c(3062.88,2619.09), Date = c("03/05/2016","03/05/2016"))
You can find the difference between two dates like this:
#days
difftime(strptime(b$Date[1], format = "%d/%m/%Y"),
strptime(a$Date[1], format = "%d/%m/%Y"),units="days")
#weeks
difftime(strptime(b$Date[1], format = "%d/%m/%Y"),
strptime(a$Date[1], format = "%d/%m/%Y"),units="weeks")
Using sample data based on Florian's answer:
df1 = data.frame(account=c("A","A","B","B"),date=seq(Sys.Date(),Sys.Date()+3,by=1),value = c(1,1,1,1))
df2 = data.frame(account=c("A","A","A","B"),date=seq(Sys.Date()+2,Sys.Date()+5,by=1),value = c(2,2,2,2))
I added several instances of each account to each data frame. This is important for obtaining correct output on your own data:
library(dplyr)
# pair every df1 row with every df2 row that has the same account,
# then subtract the dates (df1 date minus df2 date)
full_join(df1,df2,by="account") %>%
  mutate(diff=date.x-date.y)
account date.x value.x date.y value.y diff
1 A 2017-07-17 1 2017-07-19 2 -2 days
2 A 2017-07-17 1 2017-07-20 2 -3 days
3 A 2017-07-17 1 2017-07-21 2 -4 days
4 A 2017-07-18 1 2017-07-19 2 -1 days
5 A 2017-07-18 1 2017-07-20 2 -2 days
6 A 2017-07-18 1 2017-07-21 2 -3 days
7 B 2017-07-19 1 2017-07-22 2 -3 days
8 B 2017-07-20 1 2017-07-22 2 -2 days
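For comparison, here is a base R sketch of the same all-pairs idea applied to the question's own data, without dplyr. It assumes only the Account and Date columns matter (the other columns are omitted). It computes stop minus start, so account A yields the -13 days from the question. Swapping to Date.x - Date.y would reproduce the sign convention of the diff column above.

```r
# The question's Account and Date columns (dates as dd/mm/yyyy strings)
df1 <- data.frame(
  Account = c("A", "B", "C", "C", "C"),
  Date    = c("16/05/2016", "27/09/2016", "20/05/2016", "24/10/2016", "24/11/2016")
)
df2 <- data.frame(
  Account = rep(c("A", "B", "C"), each = 3),
  Date    = c("03/05/2016", "03/05/2016", "09/05/2016",
              "03/05/2016", "03/05/2016", "03/05/2016",
              "03/05/2016", "03/05/2016", "03/05/2016")
)

# Convert the strings to Date so that subtraction works
df1$Date <- as.Date(df1$Date, format = "%d/%m/%Y")
df2$Date <- as.Date(df2$Date, format = "%d/%m/%Y")

# merge() pairs every df1 row with every df2 row of the same Account;
# the Date column present in both frames gets the suffixes .x (df1) and .y (df2)
pairs <- merge(df1, df2, by = "Account")

# stop date (df2) minus start date (df1), as plain day counts
pairs$diff <- as.numeric(pairs$Date.y - pairs$Date.x, units = "days")
print(pairs)
```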
You can use the plyr and dplyr packages to get your desired output. The code first combines and sorts the data frames, then, within each account, calculates the time difference between each row's date and the account's first (earliest) date. After that it keeps the row with the maximum difference for each account and finally drops the helper column.
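# stack both data frames and parse the dd/mm/yyyy strings
df <- rbind(df1,df2)
df$Date <- as.Date(df$Date, "%d/%m/%Y")
library(plyr)   # load plyr before dplyr to avoid masking problems
library(dplyr)
# sort by account, then date, so Date[1] is each account's earliest date
df <- df %>%
  arrange(Account, Date)
plyr::ddply(df, .(Account), transform,
            Date_1 = Date[1],
            change = abs(Date - Date[1])) %>%
  dplyr::group_by(Account) %>%
  dplyr::slice(which.max(change)) %>%  # keep the row farthest from the earliest date
  dplyr::select(-Date_1)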
# Source: local data frame [3 x 5]
# Groups: Account [3]
#
# # A tibble: 3 x 5
# Account reference.number Amount Date change
# <fctr> <int> <dbl> <date> <time>
# 1 A 1 1583.51 2016-05-16 13 days
# 2 B 2 4038.18 2016-09-27 147 days
# 3 C 5 747.69 2016-11-24 205 days
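Per account, the plyr/dplyr pipeline above boils down to the number of days between the earliest and the latest date found in either data frame. The base R sketch below computes only that day count, assuming the same dd/mm/yyyy dates. Amounts and reference numbers are omitted, so the matching rows are not returned, and the names frame1 and frame2 are illustrative.

```r
# Account and Date columns of the question's two data frames (illustrative names)
frame1 <- data.frame(Account = c("A", "B", "C", "C", "C"),
                     Date = c("16/05/2016", "27/09/2016", "20/05/2016", "24/10/2016", "24/11/2016"))
frame2 <- data.frame(Account = rep(c("A", "B", "C"), each = 3),
                     Date = c("03/05/2016", "03/05/2016", "09/05/2016", rep("03/05/2016", 6)))

# Stack both frames and parse the dates
all_dates <- rbind(frame1, frame2)
all_dates$Date <- as.Date(all_dates$Date, format = "%d/%m/%Y")

# For each account: days from the earliest date to the latest date
span_days <- tapply(all_dates$Date, all_dates$Account,
                    function(d) as.numeric(max(d) - min(d), units = "days"))
print(span_days)
```

These are the same 13, 147 and 205 days as in the change column of the output above.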
Data
df1 <- structure(list(Account = structure(c(1L, 2L, 3L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), reference.number = 1:5, Amount = c(1583.51,
4038.18, 1161.36, 732.39, 747.69), Date = structure(c(1L, 5L,
2L, 3L, 4L), .Label = c("16/05/2016", "20/05/2016", "24/10/2016",
"24/11/2016", "27/09/2016"), class = "factor")), .Names = c("Account",
"reference.number", "Amount", "Date"), class = "data.frame", row.names = c(NA,-5L))
df2 <- structure(list(Account = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), reference.number = 6:14,
Amount = c(3062.88, 2619.09, 4743.22, 115.28, 993.14, 879.05,
50.93, 21.83, 14.55), Date = structure(c(1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("03/05/2016", "09/05/2016"
), class = "factor")), .Names = c("Account", "reference.number",
"Amount", "Date"), class = "data.frame", row.names = c(NA, -9L))