Extracting only those rows with a unique combination of column values in R
I have a flight database that gives route details. It looks like this:
Ori. Dest Carr. Pass Flights
JFK LAX Delta 15004 50
JFK LAX JetBl 17434 100
JFK BOS Delta 15344 89
ATL FLR AmerA 25054 90
OHD LAX Delta 19876 95
OHD LAX AmerA 12344 45
For the output, I only need the routes that are served by exactly one carrier. The output should look like this:
JFK BOS Delta 15344 89
ATL FLR AmerA 25054 90
How can I do this in R?
You can use dplyr:
library(dplyr)
df %>% group_by(Ori., Dest) %>% filter(n() == 1)
# Ori. Dest Carr. Pass Flights
# <chr> <chr> <chr> <int> <int>
#1 JFK BOS Delta 15344 89
#2 ATL FLR AmerA 25054 90
Using data.table:
library(data.table)
setDT(df)[, .SD[.N == 1], .(Ori., Dest)]
And in base R:
subset(df, ave(Flights, Ori., Dest, FUN = length) == 1)
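The base R one-liner relies on `ave()` returning, for every row, the result of `FUN` applied to that row's group. A minimal sketch of the intermediate step, using the sample data from this thread (the names `route_size` and `single_carrier` are only illustrative):

```r
# Sample data from the question, rebuilt so the snippet runs on its own.
df <- data.frame(
  Ori.    = c("JFK", "JFK", "JFK", "ATL", "OHD", "OHD"),
  Dest    = c("LAX", "LAX", "BOS", "FLR", "LAX", "LAX"),
  Carr.   = c("Delta", "JetBl", "Delta", "AmerA", "Delta", "AmerA"),
  Pass    = c(15004L, 17434L, 15344L, 25054L, 19876L, 12344L),
  Flights = c(50L, 100L, 89L, 90L, 95L, 45L)
)

# For each row, the number of rows that share its (Ori., Dest) pair.
# Flights is used only as a numeric carrier vector; its values are ignored by length().
route_size <- ave(df$Flights, df$Ori., df$Dest, FUN = length)
print(route_size)  # 2 2 1 1 2 2

# Keep the rows whose route occurs exactly once.
single_carrier <- df[route_size == 1, ]
print(single_carrier)
```

Note that this counts rows per route, not distinct carriers, so it assumes each carrier appears at most once per route, as in the sample data.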
Data
df <- structure(list(Ori. = c("JFK", "JFK", "JFK", "ATL", "OHD", "OHD"
), Dest = c("LAX", "LAX", "BOS", "FLR", "LAX", "LAX"), Carr. = c("Delta",
"JetBl", "Delta", "AmerA", "Delta", "AmerA"), Pass = c(15004L,
17434L, 15344L, 25054L, 19876L, 12344L), Flights = c(50L, 100L,
89L, 90L, 95L, 45L)), class = "data.frame", row.names = c(NA, -6L))
We can do this in base R without any grouping operation:
df[!(duplicated(df[1:2])|duplicated(df[1:2], fromLast = TRUE)),]
# Ori. Dest Carr. Pass Flights
#3 JFK BOS Delta 15344 89
#4 ATL FLR AmerA 25054 90
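The reason `duplicated()` is called twice can be seen by looking at the two logical vectors separately. The sketch below uses only the two route columns of the sample data; the names `routes`, `dup_forward`, `dup_backward` and `is_unique` are illustrative:

```r
# The two route columns from the sample data.
routes <- data.frame(
  Ori. = c("JFK", "JFK", "JFK", "ATL", "OHD", "OHD"),
  Dest = c("LAX", "LAX", "BOS", "FLR", "LAX", "LAX")
)

# Scanning top to bottom: every repeat after the first occurrence is TRUE.
dup_forward <- duplicated(routes)
# Scanning bottom to top: every repeat before the last occurrence is TRUE.
dup_backward <- duplicated(routes, fromLast = TRUE)

# A route is unique only if it is flagged in neither direction.
is_unique <- !(dup_forward | dup_backward)

print(dup_forward)   # FALSE  TRUE FALSE FALSE FALSE  TRUE
print(dup_backward)  #  TRUE FALSE FALSE FALSE  TRUE FALSE
print(is_unique)     # FALSE FALSE  TRUE  TRUE FALSE FALSE
```

A single `duplicated()` call would still keep the first row of each repeated route, which is why the two directions are combined with `|` before negating.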