
Calculate mean days in R using dplyr, filter, group_by and summarise?

I want to create a table that shows the mean number of days by submitted_via (see consumer_complaints.csv). The days come from date_diff, the difference between date_sent and date_received, and the data should be filtered to keep only date_diff values greater than 0. All of this has to be done with dplyr: %>%, filter, group_by and summarise_at, plus knitr::kable().

I have tried this in R:

date_received <- as.Date(mydata$date_received, "%m/%d/%Y")
date_sent <- as.Date(mydata$date_sent_to_company, "%m/%d/%Y")
date_diff <- (date_sent) - (date_received)

mydata %>%                  
 filter(date_diff > 0) %>%    
 group_by(date_received, date_sent_to_company) %>%   
 summarise(
    a = mean(date_diff))

Expected output:

Email         11.973214 days
Fax           7.057072 days
Phone         6.290040 days
Postal mail   9.627809 days
Referral      6.761684 days
Web           10.695773 days

Any suggestions please?

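This might be closer to what you want. It computes date_diff as a column inside the pipeline with mutate(), and groups by submitted_via rather than by the two date columns: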

library(dplyr)

mydata %>%
  mutate_at(vars(starts_with("date_")), as.Date, format = "%m/%d/%Y") %>%
  mutate(date_diff = date_received - date_sent) %>%
  filter(date_diff > 0) %>%    
  group_by(submitted_via) %>%   
  summarise(a = mean(date_diff))

Output

# A tibble: 3 x 2
  submitted_via a      
  <fct>         <drtn> 
1 phone         22 days
2 Referral      27 days
3 web            4 days

Data

mydata <- read.table(
  text =
    "date_received      date_sent   submitted_via
  9/30/2015          9/3/2015      Referral
  9/3/2015           8/30/2015     web
  9/25/2015          9/3/2015      phone
  9/18/2015          9/18/2015     Referral", header = TRUE
)
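
The question also asks for summarise_at and knitr::kable(), which the pipeline above does not use. Here is a minimal sketch of how those two pieces might fit in, reusing the toy data from this answer (so the column is date_sent, not date_sent_to_company, and the subtraction follows the toy data's direction). Converting date_diff to a plain number of days is my own assumption, made so that kable prints a bare numeric column; the column labels passed to kable are illustrative only.

```r
library(dplyr)

# Same toy data as in this answer; submitted_via is kept as character
mydata <- read.table(
  text = "date_received date_sent submitted_via
9/30/2015 9/3/2015 Referral
9/3/2015 8/30/2015 web
9/25/2015 9/3/2015 phone
9/18/2015 9/18/2015 Referral",
  header = TRUE, stringsAsFactors = FALSE
)

result <- mydata %>%
  # Parse both date columns from month/day/year strings
  mutate_at(vars(starts_with("date_")), as.Date, format = "%m/%d/%Y") %>%
  # Assumption: store the difference as a plain number of days
  mutate(date_diff = as.numeric(date_received - date_sent, units = "days")) %>%
  filter(date_diff > 0) %>%
  group_by(submitted_via) %>%
  # summarise_at applies mean to the selected column within each group
  summarise_at(vars(date_diff), mean)

# Render the summary as a table; the column labels are illustrative
knitr::kable(result, digits = 2, col.names = c("Submitted via", "Mean days"))
```

On the real data, the same pipeline should only need the column names and the direction of the subtraction adjusted.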

In base R, we can do it as follows:

#select the date columns
cols <- c("date_received", "date_sent_to_company")
#Change the columns to date class
consumer_complaints[cols] <- lapply(consumer_complaints[cols],as.Date,"%m/%d/%Y")

#Subtract date_received from date_sent_to_company
#Select rows where date_diff is greater than 0 and take the mean for each submitted_via
aggregate(date_diff~submitted_via, subset(transform(consumer_complaints, 
          date_diff = date_sent_to_company - date_received), date_diff > 0), mean)

#  submitted_via date_diff
#1         Email    11.97 
#2           Fax     7.06 
#3         Phone     6.29 
#4   Postal mail     9.63 
#5      Referral     6.76 
#6           Web    10.70 
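
The consumer_complaints data frame itself is not shown in the thread, so the code above cannot be run as posted. The sketch below applies the same base R steps to a small hypothetical data frame (the four rows are made up for illustration) and splits the nested call into separate steps. Storing date_diff as a numeric day count is an assumption on my part, not something the original answer does.

```r
# Hypothetical stand-in for the real consumer_complaints data
consumer_complaints <- data.frame(
  date_received        = c("9/3/2015", "8/30/2015", "9/3/2015", "9/18/2015"),
  date_sent_to_company = c("9/30/2015", "9/3/2015", "9/25/2015", "9/18/2015"),
  submitted_via        = c("Referral", "Web", "Phone", "Referral"),
  stringsAsFactors = FALSE
)

# Convert both date columns to Date class
cols <- c("date_received", "date_sent_to_company")
consumer_complaints[cols] <- lapply(consumer_complaints[cols], as.Date,
                                    format = "%m/%d/%Y")

# Days between receipt and forwarding, as a plain number (assumption)
consumer_complaints$date_diff <- as.numeric(
  consumer_complaints$date_sent_to_company - consumer_complaints$date_received,
  units = "days"
)

# Keep positive differences only, then average per submission channel
positive <- subset(consumer_complaints, date_diff > 0)
result <- aggregate(date_diff ~ submitted_via, data = positive, FUN = mean)
print(result)
```

The Referral row with identical dates has a date_diff of 0 and is dropped by the filter, so only one Referral row contributes to that group's mean.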
