
Calculate the difference in time between two dates and add them to a new column

I have a dataset with over 1 million rows. It has a start_date and an end_date in the format "yyyy-mm-dd hh:mm:ss". I want to add a new column to the dataset with the duration between the start and end date for each row.

So far, I'm able to get the time difference using the difftime function:

difftime("2020-11-01 13:45:40", "2020-11-01 13:36:00", units = "mins")

This gets me the following output: Time difference of 9.666667 mins, which I would like to replicate for the entire dataset.

For my test I'm working with a small tibble. I tried using the mutate function with rowwise and list. My code is as follows:

df %>%
  rowwise() %>% 
  mutate(trip_duration = list((difftime(as.Date(df$`end time`), as.Date(df$`start time`), units = "mins"))))

This provides the following output:

# A tibble: 3 × 3
# Rowwise: 
  `start time`        `end time`          trip_duration
  <chr>               <chr>               <list>       
1 2020-11-01 13:36:00 2020-11-01 13:45:40 <drtn [3]>   
2 2020-11-01 13:36:00 2020-11-01 13:45:40 <drtn [3]>   
3 2020-11-01 13:36:00 2020-11-01 13:45:40 <drtn [3]>  

The new column doesn't show what I'm looking for. It just shows the number 3 for each row, whether I ask for the result in mins, secs, or even hours, and now I'm stuck trying to figure out how to do the calculation.

Thanks in advance to anyone able to help, cheers!

You are wrapping it in a list, and df$ passes the whole column on every row.

This code:

df %>%
  rowwise() %>% 
  mutate(trip_duration = list((difftime(as.Date(df$`end time`), as.Date(df$`start time`), units = "mins"))))

should be:

df %>% 
  mutate(
    trip_duration = as.double( # as.double() on a difftime is described in ?difftime
      difftime(
        as.POSIXct(`end time`),   # POSIXct keeps the time of day; as.Date() would drop it
        as.POSIXct(`start time`), 
        units = "mins"
      )
    )
  )
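To see where the `<drtn [3]>` in the question comes from, here is a minimal base R sketch. The three-row data frame is a hypothetical stand-in for the asker's tibble, built only from the values shown in the question.

```r
# Hypothetical stand-in for the question's three-row tibble
df <- data.frame(
  `start time` = rep("2020-11-01 13:36:00", 3),
  `end time`   = rep("2020-11-01 13:45:40", 3),
  check.names = FALSE, stringsAsFactors = FALSE
)

# df$... always refers to the whole column, so this has length 3
# regardless of which row rowwise() is currently looking at
whole_column <- difftime(as.Date(df$`end time`), as.Date(df$`start time`), units = "mins")
print(length(whole_column))

# as.Date() drops the time of day, so same-day trips come out as 0 minutes
print(as.numeric(whole_column))

# list() then stores all three values as a single element per row
wrapped <- list(whole_column)
print(length(wrapped[[1]]))
```

Changing units only changes the values inside that element, not its length, which is why the 3 never moved.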

If you are going to be working with dates and times, I strongly recommend lubridate (https://lubridate.tidyverse.org/index.html). It simplifies a lot of common problems, this one included.

You need to make some changes to your code.

First and foremost, don't use $ inside dplyr pipes. Verbs such as mutate let you refer to columns by name, so you don't need to write df$column_name every time you use a variable from the data frame.

Secondly, difftime is vectorised, so there is no need for rowwise here.

Finally, if you want the time difference in minutes, you should convert the values to POSIXct rather than Date, because as.Date discards the time of day. Try the following:

library(dplyr)

df <- df %>%
  mutate(trip_duration = difftime(as.POSIXct(`end time`), 
                                  as.POSIXct(`start time`), units = "mins"))
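The same vectorised idea can be checked without dplyr. This is a small sketch on made-up rows (only the first row comes from the question), and tz = "UTC" is an assumption added here so that daylight-saving changes cannot affect the result.

```r
# Hypothetical sample: the first row is from the question, the others are invented
df <- data.frame(
  `start time` = c("2020-11-01 13:36:00", "2020-11-01 23:50:00", "2020-11-02 08:00:00"),
  `end time`   = c("2020-11-01 13:45:40", "2020-11-02 00:20:00", "2020-11-02 09:30:00"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Convert whole columns at once; tz = "UTC" is an assumption, not from the question
start <- as.POSIXct(df[["start time"]], tz = "UTC")
end   <- as.POSIXct(df[["end time"]], tz = "UTC")

# difftime() works on the full vectors, so no rowwise() or loop is needed
df$trip_duration <- as.numeric(difftime(end, start, units = "mins"))
print(df$trip_duration)
```

Because the conversion and subtraction each run once over the full columns, this approach should scale to a dataset with over a million rows far better than a row-by-row calculation.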

I know this has been answered, but I wanted to add a lubridate approach.

First, load the package and make sure your columns are in the correct format:

library(lubridate)
df$`end time`<- lubridate::as_datetime(df$`end time`)
df$`start time` <- lubridate::as_datetime(df$`start time`)

Then simply add the column (make sure you put the later date first, or you'll get a negative number):

df$trip_duration <- time_length(df$`end time` - df$`start time`, unit = "minute")  # minutes, as asked in the question
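Put together, the lubridate steps might look like the sketch below. The two-row data frame is hypothetical (only the first row comes from the question), and it assumes the lubridate package is installed.

```r
library(lubridate)

# Hypothetical sample: the first row is from the question, the second is invented
df <- data.frame(
  `start time` = c("2020-11-01 13:36:00", "2020-11-01 23:50:00"),
  `end time`   = c("2020-11-01 13:45:40", "2020-11-02 00:20:00"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Parse the character columns into date-times (as_datetime() defaults to UTC)
df$`start time` <- as_datetime(df$`start time`)
df$`end time`   <- as_datetime(df$`end time`)

# Later time first; time_length() converts the difference to plain minutes
df$trip_duration <- time_length(df$`end time` - df$`start time`, unit = "minute")
print(df$trip_duration)
```

time_length returns an ordinary numeric vector, so the new column can be summarised or plotted without further conversion.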
