How to match a value in one row with a different column in the immediately adjacent row in R
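I have data on customers buying airline tickets online, and I want to see how many of them booked a return ticket. Basically, for the same person and account, I want to match the origin city of one row with the destination city of the immediately adjacent row, and vice versa. That would give me their round-trip data, and then I want to calculate the number of days they traveled. I am trying to do this in R, but I cannot work out how to match the origin against the adjacent row's destination and vice versa.
I have sorted the data by customer account number to check by hand whether there are return trips, and there are many.
The data looks like this: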
Account number   origin city   Destination city   Date
1                London        chicago            7/22/2018
2                Milan         London             7/23/2018
2                London        Milan              7/28/2018
1                chicago       london             8/22/2018
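Another option is to join the data frame to itself, with the origin and destination fields swapped.
Edit: added "trip_num" to better handle repeat trips by the same person.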
library(dplyr)

# First, convert date field to Date type
df <- df %>%
  mutate(Date = lubridate::mdy(Date)) %>%
  # update with M-M's suggestion in comments:
  # upper-case the city names so "london" and "London" match
  mutate_at(.vars = vars(origin_city, Destination_city), .funs = toupper) %>%
  # EDIT: adding trip_num to protect against extraneous joins for repeat trips
  group_by(Account_number, origin_city, Destination_city) %>%
  mutate(trip_num = row_number()) %>%
  ungroup()

# Self-join: each leg is matched to the leg with origin and destination swapped
df2 <- df %>%
  left_join(df, by = c("Account_number", "trip_num",
                       "origin_city" = "Destination_city",
                       "Destination_city" = "origin_city")) %>%
  mutate(days = (Date.x - Date.y)/lubridate::ddays(1))
> df2
# A tibble: 6 x 7
  Account_number origin_city Destination_city Date.x     trip_num Date.y      days
           <int> <chr>       <chr>            <date>        <int> <date>     <dbl>
1              1 LONDON      CHICAGO          2018-07-22        1 2018-08-22   -31
2              2 MILAN       LONDON           2018-07-23        1 2018-07-28    -5
3              2 LONDON      MILAN            2018-07-28        1 2018-07-23     5
4              1 CHICAGO     LONDON           2018-08-22        1 2018-07-22    31
5              2 MILAN       LONDON           2018-08-23        2 2018-08-28    -5
6              2 LONDON      MILAN            2018-08-28        2 2018-08-23     5
Data: (with a repeat trip added for Account_number 2)
df <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE,
text = "Account_number origin_city Destination_city Date
1 London chicago 7/22/2018
2 Milan London 7/23/2018
2 London Milan 7/28/2018
1 chicago london 8/22/2018
2 Milan London 8/23/2018
2 London Milan 8/28/2018")
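The self-join returns every round trip twice, once from each leg's point of view, which is why `days` shows up as both -31 and 31. To answer the original question (how many return trips, and how long each one lasted), one possible follow-up is to keep only the rows where the matched leg comes later. This is a minimal sketch rather than part of the answer above: the names `legs`, `round_trips`, `trip_days`, `n_round_trips` and the `_out`/`_back` suffixes are made up for illustration, base `as.Date()` stands in for `lubridate::mdy()`, and it assumes the k-th outbound leg on a route pairs with the k-th leg in the opposite direction.

```r
library(dplyr)

# Same sample data as above (Account_number 2 has a repeat trip)
df <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE,
  text = "Account_number origin_city Destination_city Date
1 London chicago 7/22/2018
2 Milan London 7/23/2018
2 London Milan 7/28/2018
1 chicago london 8/22/2018
2 Milan London 8/23/2018
2 London Milan 8/28/2018")

# Normalise dates and city names, then number the legs per account and route.
# Sorting by date first makes trip_num follow chronological order.
legs <- df %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%Y"),
         origin_city = toupper(origin_city),
         Destination_city = toupper(Destination_city)) %>%
  arrange(Account_number, Date) %>%
  group_by(Account_number, origin_city, Destination_city) %>%
  mutate(trip_num = row_number()) %>%
  ungroup()

# Self-join with origin and destination swapped, keeping one row per round trip:
# the row whose matched (return) leg is later than the outbound leg.
round_trips <- legs %>%
  inner_join(legs,
             by = c("Account_number", "trip_num",
                    "origin_city" = "Destination_city",
                    "Destination_city" = "origin_city"),
             suffix = c("_out", "_back")) %>%
  filter(Date_back > Date_out) %>%
  mutate(trip_days = as.numeric(difftime(Date_back, Date_out, units = "days")))

# Number of round trips per account (hypothetical summary column name)
trips_per_account <- round_trips %>%
  count(Account_number, name = "n_round_trips")
```

With the sample data this should leave three round trips: one of 31 days for account 1 and two of 5 days each for account 2. An `inner_join` is used here instead of `left_join` so that one-way legs with no matching return drop out; switch back to `left_join` if you also want to see the unmatched legs.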