簡體   English   中英

根據 R 中的列值聚合數據框

[英]Aggregate data frame based on column value in R

我有這個df:

   time      type  person   link          mode
1 13834 departure 1537047 335909 car_passenger
2 14516   arrival 1537047  79554 car_passenger
3 15380 departure 3716370 280959           car
4 16750 departure 6968274 562332 car_passenger
5 16777   arrival 3716370 327822           car
6 16819 departure 3863945 178860           car

我想要的是創建一個新的 df 來總結原始數據最終看起來像這樣(與實際數據使用的數字略有不同):

person   time_dep time_ar link_dep  link_ar  mode 
1537047  13834    14516   335909    79554    car_passenger
1537047  20000    20010   3245      623423   car_passenger
16750    35433    36762   13335     82991    car

這是旅行的匯總表。 一個person可以有多次旅行。 重要的是把正確的departurearrival結合在一起。 例如,這可以通過對同一person采取最小的time增量來完成。 數據集也按時間順序排列(但從第一個表中可以看出, person離開可能發生在下一次arrival ),這也可能有助於獲得所需的結果。

我真的不知道如何解決這個問題,很高興得到您的幫助。

一些數據:

structure(list(time = c(13834, 14516, 15380, 16750, 16777, 16819, 
16966, 17019, 17166, 17231, 17388, 17584, 17655, 17722, 17779, 
18011, 18017, 18054, 18055, 18244, 18279, 18565, 18624, 18671, 
18671, 18671, 18690, 18742, 18779, 18779, 18844, 18844, 19042, 
19051, 19152, 19167, 19200, 19232, 19293, 19293, 19347, 19365, 
19396, 19440, 19440, 19493, 19560, 19578, 19611, 19634, 19680, 
19680, 19706, 19747, 19747, 19785, 19785, 19800, 19851, 19920, 
19920, 19961, 19961, 20004, 20004, 20040, 20064, 20075, 20078, 
20079, 20079, 20085, 20085, 20100, 20100, 20117, 20125, 20143, 
20175, 20175, 20245, 20246, 20308, 20310, 20365, 20400, 20400, 
20400, 20408, 20446, 20446, 20457, 20510, 20511, 20520, 20520, 
20527, 20527, 20557, 20559, 20562, 20603, 20603, 20604, 20628, 
20644, 20654, 20672, 20681, 20684, 20700, 20723, 20730, 20786, 
20794, 20820, 20820, 20839, 20839, 20880, 20880, 20880, 20880, 
20880, 20880, 20896, 20896, 20898, 20919, 20919, 20951, 20981, 
20981, 20992, 20992, 20994, 21000, 21000, 21011, 21015, 21020, 
21042, 21057, 21057, 21078, 21097, 21116, 21128, 21128, 21128, 
21135, 21135, 21143, 21160, 21160, 21180, 21180, 21182, 21201, 
21201, 21205, 21209, 21251, 21262, 21269, 21269, 21274, 21294, 
21296, 21300, 21300, 21308, 21311, 21311, 21312, 21323, 21323, 
21337, 21337, 21360, 21360, 21360, 21360, 21360, 21367, 21369, 
21369, 21379, 21379, 21426, 21480, 21480, 21480, 21496, 21505, 
21505, 21507, 21515, 21515, 21519), type = c("departure", "arrival", 
"departure", "departure", "arrival", "departure", "departure", 
"departure", "arrival", "arrival", "arrival", "departure", "departure", 
"departure", "arrival", "departure", "departure", "departure", 
"arrival", "departure", "departure", "departure", "arrival", 
"departure", "arrival", "departure", "arrival", "departure", 
"arrival", "departure", "arrival", "departure", "arrival", "departure", 
"departure", "arrival", "departure", "departure", "arrival", 
"departure", "arrival", "departure", "departure", "arrival", 
"departure", "arrival", "departure", "departure", "departure", 
"arrival", "arrival", "departure", "departure", "arrival", "departure", 
"arrival", "departure", "departure", "arrival", "departure", 
"departure", "arrival", "departure", "arrival", "departure", 
"departure", "departure", "departure", "departure", "arrival", 
"departure", "arrival", "departure", "arrival", "departure", 
"departure", "arrival", "arrival", "arrival", "departure", "departure", 
"departure", "arrival", "departure", "departure", "departure", 
"arrival", "departure", "departure", "arrival", "departure", 
"departure", "arrival", "departure", "arrival", "departure", 
"arrival", "departure", "departure", "arrival", "arrival", "arrival", 
"departure", "departure", "arrival", "arrival", "departure", 
"arrival", "departure", "arrival", "departure", "arrival", "departure", 
"arrival", "departure", "arrival", "departure", "arrival", "departure", 
"arrival", "departure", "arrival", "departure", "arrival", "departure", 
"arrival", "departure", "departure", "arrival", "departure", 
"arrival", "arrival", "departure", "arrival", "departure", "arrival", 
"arrival", "departure", "arrival", "departure", "departure", 
"arrival", "arrival", "departure", "departure", "departure", 
"departure", "arrival", "departure", "arrival", "departure", 
"arrival", "departure", "arrival", "departure", "departure", 
"departure", "arrival", "arrival", "departure", "departure", 
"departure", "departure", "departure", "arrival", "departure", 
"departure", "departure", "arrival", "departure", "departure", 
"arrival", "arrival", "departure", "departure", "arrival", "departure", 
"arrival", "departure", "departure", "arrival", "departure", 
"arrival", "departure", "arrival", "arrival", "departure", "arrival", 
"departure", "departure", "departure", "arrival", "departure", 
"arrival", "arrival", "departure", "departure", "arrival", "departure", 
"arrival"), person = c(1537047L, 1537047L, 3716370L, 6968274L, 
3716370L, 3863945L, 212488L, 7301220L, 7301220L, 3863945L, 6968274L, 
5332232L, 1563169L, 8832180L, 5332232L, 53332L, 9363423L, 1582903L, 
8832180L, 1544050L, 10168750L, 8106267L, 1563169L, 1942964L, 
53332L, 53332L, 8106267L, 3525961L, 1942964L, 1942964L, 1582903L, 
1582903L, 9363423L, 273884L, 1551285L, 10168750L, 4620589L, 8672528L, 
4620589L, 4620589L, 3525961L, 120748L, 1582161L, 4620589L, 4620589L, 
273884L, 8538636L, 2839747L, 1115774L, 212488L, 1582903L, 1582903L, 
630772L, 1582903L, 1582903L, 120748L, 120748L, 10008400L, 8672528L, 
8871647L, 7873048L, 8871647L, 8871647L, 7873048L, 7873048L, 9588884L, 
132932L, 1125383L, 1100599L, 4620589L, 4620589L, 10008400L, 10008400L, 
7873048L, 7873048L, 1372344L, 2839747L, 8538636L, 1125383L, 1125383L, 
8607230L, 8538636L, 1100599L, 223889L, 10698643L, 6926932L, 1582903L, 
1582903L, 3184220L, 10008400L, 10008400L, 126661L, 9588884L, 
1936227L, 53332L, 53332L, 4620589L, 4620589L, 10475531L, 1582903L, 
8607230L, 7873048L, 7873048L, 7871417L, 3184220L, 10698643L, 
6944012L, 53332L, 9588884L, 7871417L, 7630153L, 630772L, 438663L, 
132932L, 156863L, 1125383L, 1125383L, 8538636L, 8538636L, 1942964L, 
1942964L, 120748L, 120748L, 4620589L, 4620589L, 120748L, 120748L, 
8798521L, 4620589L, 4620589L, 7630153L, 1936227L, 1936227L, 156863L, 
156863L, 1942964L, 8871647L, 8871647L, 8871647L, 1969950L, 7630153L, 
1582161L, 223889L, 223889L, 609713L, 120018L, 1635088L, 8798521L, 
8798521L, 8798521L, 8798521L, 1115774L, 9630659L, 9588884L, 9588884L, 
8059629L, 4615165L, 1372344L, 126661L, 126661L, 1460810L, 443004L, 
1637943L, 7715078L, 7873048L, 7873048L, 1806138L, 207326L, 9630659L, 
9384866L, 9081533L, 120018L, 1806138L, 1806138L, 8060841L, 9081533L, 
9081533L, 6926932L, 6926932L, 9190134L, 10008400L, 10008400L, 
120748L, 120748L, 8798521L, 7630153L, 7630153L, 9190134L, 9190134L, 
7871414L, 8893870L, 8538636L, 8538636L, 1125383L, 8538636L, 8538636L, 
1339818L, 443004L, 443004L, 6944012L), link = c("335909", "79554", 
"280959", "562332", "327822", "178860", "526806", "81312", "665422", 
"522184", "594823", "169832", "335758", "422633", "198305", "260837", 
"159398", "405212", "475143", "369046", "159398", "265683", "33956", 
"642211", "pt_StopPoint:59259", "pt_StopPoint:59259", "129827", 
"247172", "642211", "642211", "576537", "576537", "503838", "58400", 
"276736", "594475", "325560", "422633", "81089", "81089", "345004", 
"282667", "318006", "275330", "275330", "418351", "194581", "191644", 
"137033", "84619", "215020", "215020", "650155", "pt_StopPoint:59244", 
"pt_StopPoint:59244", "pt_StopPoint:59629", "pt_StopPoint:59629", 
"624296", "177396", "327762", "378263", "365797", "365797", "484026", 
"484026", "359014", "220031", "12809", "2994", "617213", "617213", 
"398700", "398700", "484027", "484027", "228252", "551734", "566383", 
"pt_StopPoint:59395", "pt_StopPoint:59395", "617034", "566383", 
"585524", "19281", "81312", "563052", "pt_StopPoint:59451", "pt_StopPoint:59451", 
"191644", "pt_StopPoint:59065", "pt_StopPoint:59065", "10503", 
"68240", "335379", "pt_StopPoint:59208", "pt_StopPoint:59208", 
"pt_StopPoint:59229", "pt_StopPoint:59229", "466140", "663449", 
"274734", "29120", "29120", "265680", "407585", "387534", "562332", 
"80180", "68240", "79657", "162788", "183902", "139393", "20123", 
"306973", "pt_StopPoint:59207", "pt_StopPoint:59207", "pt_StopPoint:59551", 
"pt_StopPoint:59551", "24063", "24063", "pt_StopPoint:59626", 
"pt_StopPoint:59626", "pt_StopPoint:59236", "pt_StopPoint:59236", 
"pt_StopPoint:59263", "pt_StopPoint:59263", "422633", "pt_StopPoint:59244", 
"pt_StopPoint:59244", "61517", "pt_StopPoint:59453", "pt_StopPoint:59453", 
"pt_StopPoint:59552", "pt_StopPoint:59552", "162788", "215020", 
"215020", "325560", "24787", "61517", "334163", "97224", "97224", 
"111647", "569972", "212093", "85887", "85887", "63872", "63872", 
"254857", "572649", "pt_StopPoint:59287", "pt_StopPoint:59287", 
"5013", "247172", "640698", "pt_StopPoint:59657", "pt_StopPoint:59657", 
"653916", "349081", "510321", "176129", "pt_StopPoint:59435", 
"pt_StopPoint:59435", "259494", "141747", "269548", "592784", 
"607746", "92978", "259494", "259494", "55828", "pt_StopPoint:59740", 
"pt_StopPoint:59740", "258060", "258060", "97219", "pt_StopPoint:59116", 
"pt_StopPoint:59116", "pt_StopPoint:59368", "pt_StopPoint:59368", 
"522909", "629561", "629561", "97219", "97219", "182579", "420842", 
"pt_StopPoint:59217", "pt_StopPoint:59217", "590832", "pt_StopPoint:59216", 
"pt_StopPoint:59216", "671123", "pt_StopPoint:59527", "pt_StopPoint:59527", 
"77378"), mode = c("car_passenger", "car_passenger", "car", "car_passenger", 
"car", "car", "walk", "car_passenger", "car_passenger", "car", 
"car_passenger", "car", "transit_walk", "car", "car", "access_walk", 
"car", "access_walk", "car", "transit_walk", "car", "car", "transit_walk", 
"access_walk", "access_walk", "pt", "car", "car", "access_walk", 
"pt", "access_walk", "pt", "car", "car", "walk", "car", "access_walk", 
"car", "access_walk", "pt", "car", "access_walk", "walk", "pt", 
"egress_walk", "car", "transit_walk", "car", "walk", "walk", 
"pt", "transit_walk", "walk", "transit_walk", "pt", "access_walk", 
"pt", "transit_walk", "car", "access_walk", "access_walk", "access_walk", 
"pt", "access_walk", "pt", "transit_walk", "car", "access_walk", 
"walk", "egress_walk", "access_walk", "transit_walk", "access_walk", 
"pt", "egress_walk", "transit_walk", "car", "transit_walk", "access_walk", 
"pt", "car", "access_walk", "walk", "access_walk", "car", "transit_walk", 
"pt", "egress_walk", "car", "access_walk", "pt", "access_walk", 
"transit_walk", "access_walk", "pt", "egress_walk", "access_walk", 
"pt", "car", "egress_walk", "car", "egress_walk", "access_walk", 
"car", "car", "car", "car", "egress_walk", "access_walk", "car", 
"transit_walk", "walk", "walk", "car", "access_walk", "pt", "egress_walk", 
"access_walk", "pt", "pt", "egress_walk", "pt", "transit_walk", 
"pt", "transit_walk", "transit_walk", "pt", "car", "transit_walk", 
"pt", "transit_walk", "access_walk", "pt", "access_walk", "pt", 
"egress_walk", "pt", "egress_walk", "egress_walk", "walk", "access_walk", 
"walk", "access_walk", "pt", "walk", "car", "access_walk", "car", 
"outside", "outside", "car", "walk", "car", "access_walk", "pt", 
"transit_walk", "car", "transit_walk", "access_walk", "pt", "car", 
"access_walk", "walk", "car", "access_walk", "pt", "access_walk", 
"access_walk", "car", "car", "access_walk", "car", "access_walk", 
"pt", "car", "access_walk", "pt", "transit_walk", "access_walk", 
"access_walk", "pt", "egress_walk", "pt", "egress_walk", "car", 
"access_walk", "pt", "access_walk", "pt", "car", "transit_walk", 
"pt", "transit_walk", "egress_walk", "transit_walk", "pt", "walk", 
"access_walk", "pt", "car")), row.names = c(NA, 200L), class = "data.frame")

您希望將數據轉換為寬格式。 使用library(data.table)你可以這樣做:

setDT(x) # convert to data.table
dcast(x, person+mode~type, value.var=c('time', 'link'), fun.aggregate=first, fill=NA)
#        person         mode time_arrival time_departure       link_arrival     link_departure
#   1:    53332  access_walk        18671          18011 pt_StopPoint:59259             260837
#   2:    53332  egress_walk        20672          20520              80180 pt_StopPoint:59208
#   3:    53332           pt        20520          18671 pt_StopPoint:59208 pt_StopPoint:59259
#   4:   120018          car        21308          21097              92978             569972
#   5:   120748  access_walk        19785          19365 pt_StopPoint:59629             282667
# ---                                                                                        
# 103: 10008400           pt        21360          20446 pt_StopPoint:59116 pt_StopPoint:59065
# 104: 10008400 transit_walk        20085          19800             398700             624296
# 105: 10168750          car        19167          18279             594475             159398
# 106: 10475531          car           NA          20557               <NA>             466140
# 107: 10698643          car        20644          20365             387534              81312

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM