
Using two dataframes and their columns in mutate or other dplyr functions

I am trying to compute the difference between two dates that are stored in separate data frames in R. This is the first data frame, d1:

id      date        value        
2222    11/1/12     22.65     
2222    11/2/12     23.11     
20100   10/30/12    35.21       
20100   11/2/12     38.97     
20103   10/30/12    57.98     
20103   10/31/12    60.83     

This is the second data frame, d2:

id      date        value
2222    10/30/12    21.01      
2222    10/31/12    22.04                 
20100   10/31/12    37.07      
20100   11/1/12     38.17           
20103   10/29/12    57.98      
20103   10/16/12    60.83 

My expected output would be

   Datediff
    2 day
    2 day          
    -1 day
    1 day     
    1 day
    15 day 

I tried calling mutate on d1 and referring to the date column of d2 directly:

data_RN<-d1 %>% group_by(id) %>% mutate(datediff= d1$date-d2$date)

I am also getting the error:

Error: Column datediff must be length 201 (the group size) or one, not 1000
In addition: Warning message:
In Ops.factor(Call_date, df2$date) : '-' not meaningful for factors

Edit:

I would also like to know how to find the difference between two date-times in minutes.

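I think the problem lies in group_by(id). Inside a grouped mutate, each result must be as long as the current group (or of length one), but d1$date - d2$date always returns the full column, which explains the length error. The warning points to a second problem: the date columns are factors, so they have to be converted to Date before they can be subtracted. Drop group_by(id), convert the dates, and you get what you want: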

library(tidyverse)

df1<-tribble(~id     ,~ date   ,~     value ,       
         2222  ,  "11/1/12"   ,  22.65  ,   
         2222   , "11/2/12"  ,   23.11  ,   
         20100  , "10/30/12" ,   35.21    ,   
         20100 ,  "11/2/12"   ,  38.97  ,   
         20103 ,  "10/30/12"  ,  57.98  ,   
         20103 ,  "10/31/12" ,   60.83    )

df2<-tribble(~id   ,~   date     ,~   value,
         2222 ,   "10/30/12"  ,  21.01  ,    
         2222 ,   "10/31/12" ,   22.04 ,                
         20100  , "10/31/12"  ,  37.07  ,    
         20100,   "11/1/12"  ,   38.17 ,          
         20103 ,  "10/29/12"   , 57.98 ,     
         20103 ,  "10/16/12" ,   60.83    )

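# Convert the character dates (month/day/two-digit year) to class Date
df1 <- df1 %>% mutate(date = as.Date(date, format = "%m/%d/%y"))
df2 <- df2 %>% mutate(date = as.Date(date, format = "%m/%d/%y"))

# Row-wise difference; this relies on df1 and df2 having their rows in matching order
data_RN <- df1 %>% mutate(datediff = date - df2$date)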

Output:

# A tibble: 6 x 4
     id date       value datediff
  <dbl> <date>     <dbl> <drtn>  
1  2222 2012-11-01  22.6  2 days 
2  2222 2012-11-02  23.1  2 days 
3 20100 2012-10-30  35.2 -1 days 
4 20100 2012-11-02  39.0  1 days 
5 20103 2012-10-30  58.0  1 days 
6 20103 2012-10-31  60.8 15 days 
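The subtraction above only works because the rows of the two data frames happen to line up. If that cannot be guaranteed, one option is to number the rows within each id and join on that number. The following is only a sketch under that assumption: the helper add_row_index and the column row_in_id are names made up for illustration, and it pairs the nth row of an id in d1 with the nth row of the same id in d2.

```r
library(dplyr)

d1 <- tibble(
  id   = c(2222, 2222, 20100, 20100, 20103, 20103),
  date = c("11/1/12", "11/2/12", "10/30/12", "11/2/12", "10/30/12", "10/31/12")
)
d2 <- tibble(
  id   = c(2222, 2222, 20100, 20100, 20103, 20103),
  date = c("10/30/12", "10/31/12", "10/31/12", "11/1/12", "10/29/12", "10/16/12")
)

# Hypothetical helper: parse the dates and number the rows within each id
add_row_index <- function(df) {
  df %>%
    mutate(date = as.Date(as.character(date), format = "%m/%d/%y")) %>%
    group_by(id) %>%
    mutate(row_in_id = row_number()) %>%
    ungroup()
}

# Pair the rows explicitly by id and position, then subtract the two date columns
paired <- inner_join(
  add_row_index(d1), add_row_index(d2),
  by = c("id", "row_in_id"), suffix = c("_d1", "_d2")
) %>%
  mutate(datediff = date_d1 - date_d2)

paired
```

Because the pairing is explicit, the result does not depend on how either data frame is sorted across ids, and rows without a partner are dropped by inner_join instead of silently misaligning.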

If you have date-time values, convert the column to class POSIXct using the format that matches your data (see ?strptime). Order both data frames by id so that the rows line up, then call difftime with units = "mins" to get the difference in minutes.

d1 <- transform(d1, date = as.POSIXct(date, format = "%m/%d/%y"))
d11 <- d1[order(d1$id), ]

d2 <- transform(d2, date = as.POSIXct(date, format = "%m/%d/%y"))
d22 <- d2[order(d2$id), ]

difftime(d11$date, d22$date, units = "mins")
#Time differences in mins
#[1]  2880  2880 -1440  1440  1440 21600
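The same idea can be written with dplyr when the values carry a time of day. This is a minimal sketch with made-up timestamps (they are not from the question), and it assumes the format "%m/%d/%y %H:%M" and that the rows of the two data frames are already aligned. Parsing in UTC is a choice made here so that daylight-saving changes in the local time zone do not shift the result by an hour.

```r
library(dplyr)

# Hypothetical data for illustration only
d1 <- tibble(id = c(1, 2), datetime = c("11/1/12 10:30", "11/2/12 08:00"))
d2 <- tibble(id = c(1, 2), datetime = c("11/1/12 09:45", "11/1/12 20:00"))

# Hypothetical helper: parse month/day/year hour:minute strings as UTC date-times
parse_dt <- function(x) as.POSIXct(x, format = "%m/%d/%y %H:%M", tz = "UTC")

result <- d1 %>%
  mutate(
    datetime  = parse_dt(datetime),
    # difftime returns a difftime object; as.numeric keeps just the minutes
    diff_mins = as.numeric(difftime(datetime, parse_dt(d2$datetime), units = "mins"))
  )

result$diff_mins
```

Dropping as.numeric keeps diff_mins as a difftime column, which prints with its unit.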
