繁体   English   中英

R:如何使用运输智能卡数据将旅程段组合到旅程中

[英]R: how to combine trip segments into a journey with Transport smart card data

当前正在使用有趣的运输智能卡数据集。 当前数据中的每一行代表一个行程(例如,从A到B的巴士行程)。 60分钟内的任何旅行都需要分组。

当前表:

CustomerID  SegmentID   OriginLocation  Destination Start Time  End Time    Fare    TypeOfTransport ..  ..  ..
A001        101         LocationA   LocationB       7:30am      7:45am      1.5        Bus          
A001        102         LocationB   LocationC       7:50am      8:30am      3.5        Train            
A001        103         LocationC   LocationB       17:10pm     18:00pm     3.5        Train            
A001        104         LocationB   LocationA       18:10pm     18:30pm     1.5        Bus          
A002        105         LocationK   LocationY       11:30am     12:30pm     3          Train            
A003        106         LocationP   LocationO       10:23am     11:13am     4          Ferrie           

并隐瞒了……

CustomerID  JourneyID   OriginLocation  Destination Start Time  End Time    Fare    TypeOfTransport    NumOfTrips
A001        1           LocationA       LocationC   7:30am      8:30am      5       Intermodal        2
A001        2           LocationC       LocationA   17:10pm     18:30pm     5       Intermodal        2
A002        6           LocationK       LocationY   11:30am     12:30pm     3       Train             1
A003        8           LocationP       LocationO   10:23am     11:13am     4       Ferrie            1

我是R的新手,也不知道如何开始,所以任何指导都将不胜感激。

这是示例数据的入门。 您的真实数据可能会使事情变得更加复杂...

library(dplyr)

## prepare data
df<- read.table(header=T, stringsAsFactors = FALSE, text="
CustomerID  SegmentID   OriginLocation  Destination StartTime  EndTime    Fare    TypeOfTransport
A001        101         LocationA   LocationB       7:30am      7:45am      1.5        Bus
A001        102         LocationB   LocationC       7:50am      8:30am      3.5        Train
A001        103         LocationC   LocationB       17:10pm     18:00pm     3.5        Train
A001        104         LocationB   LocationA       18:10pm     18:30pm     1.5        Bus
A002        105         LocationK   LocationY       11:30am     12:30pm     3          Train
A003        106         LocationP   LocationO       10:23am     11:13am     4          Ferrie")
df$StartTime <- as.POSIXct(df$StartTime, format="%H:%M%p")
df$EndTime <- as.POSIXct(df$EndTime, format="%H:%M%p")


## Get group for each observation according to start-/endtimes + gap (60secs+60mins)
getGrp <- function(EndTimes, StartTimes, maxgap = 60*60) {
  vbetween <- Vectorize(dplyr::between, c("left", "right"))
  none <- Negate(any)
  if (length(EndTimes) > 1) {
    mat <-   vbetween(EndTimes, StartTimes-maxgap, StartTimes)
    idx <- apply(mat, 1, none, na.rm = T)
    grp <- cumsum(c(TRUE, head(idx, -1)))
  } else {
    grp <- 1L
  }
  return(grp)
}

## Summarise
df %>% 
  group_by(CustomerID) %>% 
  mutate(grp = getGrp(EndTime, StartTime)) %>%
  group_by(grp, add = TRUE) %>%
  summarise(Fare = sum(Fare), 
            NumOfTrips = n(), 
            StartTime = format(min(StartTime), "%H:%M"), 
            EndTime = format(max(EndTime), "%H:%M"), 
            OriginLocation = OriginLocation[row_number(1)],
            Destination = tail(Destination, 1),
            TypeOfTransport = list(TypeOfTransport)) %>%
  select(-grp) 

## Result
# CustomerID  Fare NumOfTrips StartTime EndTime OriginLocation Destination TypeOfTransport
# (chr) (dbl)      (int)     (chr)   (chr)          (chr)       (chr)           (chr)
# 1       A001     5          2     07:30   08:30      LocationA   LocationC        <chr[2]>
# 2       A001     5          2     17:10   18:30      LocationC   LocationA        <chr[2]>
# 3       A002     3          1     11:30   12:30      LocationK   LocationY        <chr[1]>
# 4       A003     4          1     10:23   11:13      LocationP   LocationO        <chr[1]>

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM