[英]How to aggregate Character Data of a data frame based on different column in r
[英]Aggregate data frame based on column value in R
我有这个df:
time type person link mode
1 13834 departure 1537047 335909 car_passenger
2 14516 arrival 1537047 79554 car_passenger
3 15380 departure 3716370 280959 car
4 16750 departure 6968274 562332 car_passenger
5 16777 arrival 3716370 327822 car
6 16819 departure 3863945 178860 car
我想要的是创建一个新的 df 来总结原始数据最终看起来像这样(与实际数据使用的数字略有不同):
person time_dep time_ar link_dep link_ar mode
1537047 13834 14516 335909 79554 car_passenger
1537047 20000 20010 3245 623423 car_passenger
16750 35433 36762 13335 82991 car
这是旅行的汇总表。 一个person
可以有多次旅行。 重要的是把正确的departure
和arrival
结合在一起。 例如,这可以通过对同一person
采取最小的time
增量来完成。 数据集也按时间顺序排列(但从第一个表中可以看出, person
离开可能发生在下一次arrival
),这也可能有助于获得所需的结果。
我真的不知道如何解决这个问题,很高兴得到您的帮助。
一些数据:
structure(list(time = c(13834, 14516, 15380, 16750, 16777, 16819,
16966, 17019, 17166, 17231, 17388, 17584, 17655, 17722, 17779,
18011, 18017, 18054, 18055, 18244, 18279, 18565, 18624, 18671,
18671, 18671, 18690, 18742, 18779, 18779, 18844, 18844, 19042,
19051, 19152, 19167, 19200, 19232, 19293, 19293, 19347, 19365,
19396, 19440, 19440, 19493, 19560, 19578, 19611, 19634, 19680,
19680, 19706, 19747, 19747, 19785, 19785, 19800, 19851, 19920,
19920, 19961, 19961, 20004, 20004, 20040, 20064, 20075, 20078,
20079, 20079, 20085, 20085, 20100, 20100, 20117, 20125, 20143,
20175, 20175, 20245, 20246, 20308, 20310, 20365, 20400, 20400,
20400, 20408, 20446, 20446, 20457, 20510, 20511, 20520, 20520,
20527, 20527, 20557, 20559, 20562, 20603, 20603, 20604, 20628,
20644, 20654, 20672, 20681, 20684, 20700, 20723, 20730, 20786,
20794, 20820, 20820, 20839, 20839, 20880, 20880, 20880, 20880,
20880, 20880, 20896, 20896, 20898, 20919, 20919, 20951, 20981,
20981, 20992, 20992, 20994, 21000, 21000, 21011, 21015, 21020,
21042, 21057, 21057, 21078, 21097, 21116, 21128, 21128, 21128,
21135, 21135, 21143, 21160, 21160, 21180, 21180, 21182, 21201,
21201, 21205, 21209, 21251, 21262, 21269, 21269, 21274, 21294,
21296, 21300, 21300, 21308, 21311, 21311, 21312, 21323, 21323,
21337, 21337, 21360, 21360, 21360, 21360, 21360, 21367, 21369,
21369, 21379, 21379, 21426, 21480, 21480, 21480, 21496, 21505,
21505, 21507, 21515, 21515, 21519), type = c("departure", "arrival",
"departure", "departure", "arrival", "departure", "departure",
"departure", "arrival", "arrival", "arrival", "departure", "departure",
"departure", "arrival", "departure", "departure", "departure",
"arrival", "departure", "departure", "departure", "arrival",
"departure", "arrival", "departure", "arrival", "departure",
"arrival", "departure", "arrival", "departure", "arrival", "departure",
"departure", "arrival", "departure", "departure", "arrival",
"departure", "arrival", "departure", "departure", "arrival",
"departure", "arrival", "departure", "departure", "departure",
"arrival", "arrival", "departure", "departure", "arrival", "departure",
"arrival", "departure", "departure", "arrival", "departure",
"departure", "arrival", "departure", "arrival", "departure",
"departure", "departure", "departure", "departure", "arrival",
"departure", "arrival", "departure", "arrival", "departure",
"departure", "arrival", "arrival", "arrival", "departure", "departure",
"departure", "arrival", "departure", "departure", "departure",
"arrival", "departure", "departure", "arrival", "departure",
"departure", "arrival", "departure", "arrival", "departure",
"arrival", "departure", "departure", "arrival", "arrival", "arrival",
"departure", "departure", "arrival", "arrival", "departure",
"arrival", "departure", "arrival", "departure", "arrival", "departure",
"arrival", "departure", "arrival", "departure", "arrival", "departure",
"arrival", "departure", "arrival", "departure", "arrival", "departure",
"arrival", "departure", "departure", "arrival", "departure",
"arrival", "arrival", "departure", "arrival", "departure", "arrival",
"arrival", "departure", "arrival", "departure", "departure",
"arrival", "arrival", "departure", "departure", "departure",
"departure", "arrival", "departure", "arrival", "departure",
"arrival", "departure", "arrival", "departure", "departure",
"departure", "arrival", "arrival", "departure", "departure",
"departure", "departure", "departure", "arrival", "departure",
"departure", "departure", "arrival", "departure", "departure",
"arrival", "arrival", "departure", "departure", "arrival", "departure",
"arrival", "departure", "departure", "arrival", "departure",
"arrival", "departure", "arrival", "arrival", "departure", "arrival",
"departure", "departure", "departure", "arrival", "departure",
"arrival", "arrival", "departure", "departure", "arrival", "departure",
"arrival"), person = c(1537047L, 1537047L, 3716370L, 6968274L,
3716370L, 3863945L, 212488L, 7301220L, 7301220L, 3863945L, 6968274L,
5332232L, 1563169L, 8832180L, 5332232L, 53332L, 9363423L, 1582903L,
8832180L, 1544050L, 10168750L, 8106267L, 1563169L, 1942964L,
53332L, 53332L, 8106267L, 3525961L, 1942964L, 1942964L, 1582903L,
1582903L, 9363423L, 273884L, 1551285L, 10168750L, 4620589L, 8672528L,
4620589L, 4620589L, 3525961L, 120748L, 1582161L, 4620589L, 4620589L,
273884L, 8538636L, 2839747L, 1115774L, 212488L, 1582903L, 1582903L,
630772L, 1582903L, 1582903L, 120748L, 120748L, 10008400L, 8672528L,
8871647L, 7873048L, 8871647L, 8871647L, 7873048L, 7873048L, 9588884L,
132932L, 1125383L, 1100599L, 4620589L, 4620589L, 10008400L, 10008400L,
7873048L, 7873048L, 1372344L, 2839747L, 8538636L, 1125383L, 1125383L,
8607230L, 8538636L, 1100599L, 223889L, 10698643L, 6926932L, 1582903L,
1582903L, 3184220L, 10008400L, 10008400L, 126661L, 9588884L,
1936227L, 53332L, 53332L, 4620589L, 4620589L, 10475531L, 1582903L,
8607230L, 7873048L, 7873048L, 7871417L, 3184220L, 10698643L,
6944012L, 53332L, 9588884L, 7871417L, 7630153L, 630772L, 438663L,
132932L, 156863L, 1125383L, 1125383L, 8538636L, 8538636L, 1942964L,
1942964L, 120748L, 120748L, 4620589L, 4620589L, 120748L, 120748L,
8798521L, 4620589L, 4620589L, 7630153L, 1936227L, 1936227L, 156863L,
156863L, 1942964L, 8871647L, 8871647L, 8871647L, 1969950L, 7630153L,
1582161L, 223889L, 223889L, 609713L, 120018L, 1635088L, 8798521L,
8798521L, 8798521L, 8798521L, 1115774L, 9630659L, 9588884L, 9588884L,
8059629L, 4615165L, 1372344L, 126661L, 126661L, 1460810L, 443004L,
1637943L, 7715078L, 7873048L, 7873048L, 1806138L, 207326L, 9630659L,
9384866L, 9081533L, 120018L, 1806138L, 1806138L, 8060841L, 9081533L,
9081533L, 6926932L, 6926932L, 9190134L, 10008400L, 10008400L,
120748L, 120748L, 8798521L, 7630153L, 7630153L, 9190134L, 9190134L,
7871414L, 8893870L, 8538636L, 8538636L, 1125383L, 8538636L, 8538636L,
1339818L, 443004L, 443004L, 6944012L), link = c("335909", "79554",
"280959", "562332", "327822", "178860", "526806", "81312", "665422",
"522184", "594823", "169832", "335758", "422633", "198305", "260837",
"159398", "405212", "475143", "369046", "159398", "265683", "33956",
"642211", "pt_StopPoint:59259", "pt_StopPoint:59259", "129827",
"247172", "642211", "642211", "576537", "576537", "503838", "58400",
"276736", "594475", "325560", "422633", "81089", "81089", "345004",
"282667", "318006", "275330", "275330", "418351", "194581", "191644",
"137033", "84619", "215020", "215020", "650155", "pt_StopPoint:59244",
"pt_StopPoint:59244", "pt_StopPoint:59629", "pt_StopPoint:59629",
"624296", "177396", "327762", "378263", "365797", "365797", "484026",
"484026", "359014", "220031", "12809", "2994", "617213", "617213",
"398700", "398700", "484027", "484027", "228252", "551734", "566383",
"pt_StopPoint:59395", "pt_StopPoint:59395", "617034", "566383",
"585524", "19281", "81312", "563052", "pt_StopPoint:59451", "pt_StopPoint:59451",
"191644", "pt_StopPoint:59065", "pt_StopPoint:59065", "10503",
"68240", "335379", "pt_StopPoint:59208", "pt_StopPoint:59208",
"pt_StopPoint:59229", "pt_StopPoint:59229", "466140", "663449",
"274734", "29120", "29120", "265680", "407585", "387534", "562332",
"80180", "68240", "79657", "162788", "183902", "139393", "20123",
"306973", "pt_StopPoint:59207", "pt_StopPoint:59207", "pt_StopPoint:59551",
"pt_StopPoint:59551", "24063", "24063", "pt_StopPoint:59626",
"pt_StopPoint:59626", "pt_StopPoint:59236", "pt_StopPoint:59236",
"pt_StopPoint:59263", "pt_StopPoint:59263", "422633", "pt_StopPoint:59244",
"pt_StopPoint:59244", "61517", "pt_StopPoint:59453", "pt_StopPoint:59453",
"pt_StopPoint:59552", "pt_StopPoint:59552", "162788", "215020",
"215020", "325560", "24787", "61517", "334163", "97224", "97224",
"111647", "569972", "212093", "85887", "85887", "63872", "63872",
"254857", "572649", "pt_StopPoint:59287", "pt_StopPoint:59287",
"5013", "247172", "640698", "pt_StopPoint:59657", "pt_StopPoint:59657",
"653916", "349081", "510321", "176129", "pt_StopPoint:59435",
"pt_StopPoint:59435", "259494", "141747", "269548", "592784",
"607746", "92978", "259494", "259494", "55828", "pt_StopPoint:59740",
"pt_StopPoint:59740", "258060", "258060", "97219", "pt_StopPoint:59116",
"pt_StopPoint:59116", "pt_StopPoint:59368", "pt_StopPoint:59368",
"522909", "629561", "629561", "97219", "97219", "182579", "420842",
"pt_StopPoint:59217", "pt_StopPoint:59217", "590832", "pt_StopPoint:59216",
"pt_StopPoint:59216", "671123", "pt_StopPoint:59527", "pt_StopPoint:59527",
"77378"), mode = c("car_passenger", "car_passenger", "car", "car_passenger",
"car", "car", "walk", "car_passenger", "car_passenger", "car",
"car_passenger", "car", "transit_walk", "car", "car", "access_walk",
"car", "access_walk", "car", "transit_walk", "car", "car", "transit_walk",
"access_walk", "access_walk", "pt", "car", "car", "access_walk",
"pt", "access_walk", "pt", "car", "car", "walk", "car", "access_walk",
"car", "access_walk", "pt", "car", "access_walk", "walk", "pt",
"egress_walk", "car", "transit_walk", "car", "walk", "walk",
"pt", "transit_walk", "walk", "transit_walk", "pt", "access_walk",
"pt", "transit_walk", "car", "access_walk", "access_walk", "access_walk",
"pt", "access_walk", "pt", "transit_walk", "car", "access_walk",
"walk", "egress_walk", "access_walk", "transit_walk", "access_walk",
"pt", "egress_walk", "transit_walk", "car", "transit_walk", "access_walk",
"pt", "car", "access_walk", "walk", "access_walk", "car", "transit_walk",
"pt", "egress_walk", "car", "access_walk", "pt", "access_walk",
"transit_walk", "access_walk", "pt", "egress_walk", "access_walk",
"pt", "car", "egress_walk", "car", "egress_walk", "access_walk",
"car", "car", "car", "car", "egress_walk", "access_walk", "car",
"transit_walk", "walk", "walk", "car", "access_walk", "pt", "egress_walk",
"access_walk", "pt", "pt", "egress_walk", "pt", "transit_walk",
"pt", "transit_walk", "transit_walk", "pt", "car", "transit_walk",
"pt", "transit_walk", "access_walk", "pt", "access_walk", "pt",
"egress_walk", "pt", "egress_walk", "egress_walk", "walk", "access_walk",
"walk", "access_walk", "pt", "walk", "car", "access_walk", "car",
"outside", "outside", "car", "walk", "car", "access_walk", "pt",
"transit_walk", "car", "transit_walk", "access_walk", "pt", "car",
"access_walk", "walk", "car", "access_walk", "pt", "access_walk",
"access_walk", "car", "car", "access_walk", "car", "access_walk",
"pt", "car", "access_walk", "pt", "transit_walk", "access_walk",
"access_walk", "pt", "egress_walk", "pt", "egress_walk", "car",
"access_walk", "pt", "access_walk", "pt", "car", "transit_walk",
"pt", "transit_walk", "egress_walk", "transit_walk", "pt", "walk",
"access_walk", "pt", "car")), row.names = c(NA, 200L), class = "data.frame")
您希望将数据转换为宽格式。 使用library(data.table)
你可以这样做:
setDT(x) # convert to data.table
dcast(x, person+mode~type, value.var=c('time', 'link'), fun.aggregate=first, fill=NA)
# person mode time_arrival time_departure link_arrival link_departure
# 1: 53332 access_walk 18671 18011 pt_StopPoint:59259 260837
# 2: 53332 egress_walk 20672 20520 80180 pt_StopPoint:59208
# 3: 53332 pt 20520 18671 pt_StopPoint:59208 pt_StopPoint:59259
# 4: 120018 car 21308 21097 92978 569972
# 5: 120748 access_walk 19785 19365 pt_StopPoint:59629 282667
# ---
# 103: 10008400 pt 21360 20446 pt_StopPoint:59116 pt_StopPoint:59065
# 104: 10008400 transit_walk 20085 19800 398700 624296
# 105: 10168750 car 19167 18279 594475 159398
# 106: 10475531 car NA 20557 <NA> 466140
# 107: 10698643 car 20644 20365 387534 81312
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.