
Extracting only those rows with a unique combination of column values in R

I have a flight database that gives route details. It looks like this:

Ori.  Dest  Carr. Pass Flights
JFK   LAX   Delta 15004 50
JFK   LAX   JetBl 17434 100
JFK   BOS   Delta 15344 89
ATL   FLR   AmerA 25054 90
OHD   LAX   Delta 19876 95
OHD   LAX   AmerA 12344 45

For the output, I need only the routes that are served by exactly one carrier. The output should look like this:

JFK   BOS   Delta 15344 89
ATL   FLR   AmerA 25054 90

How can I do this in R?

You can use:

library(dplyr)
df %>% group_by(Ori., Dest) %>% filter(n() == 1)

# Ori.  Dest  Carr.  Pass Flights
#  <chr> <chr> <chr> <int>   <int>
#1 JFK   BOS   Delta 15344      89
#2 ATL   FLR   AmerA 25054      90
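
Note that `n() == 1` counts rows per route, which matches "one carrier" only when each carrier appears at most once per route, as in the sample data. If the same carrier could appear on several rows of one route, a variant that counts distinct carriers may be closer to the stated goal. The sketch below assumes that situation; the extra JFK to BOS row and the names `df2` and `single_carrier` are invented for illustration and are not part of the original data.

```r
library(dplyr)

# Sample data from the question plus ONE HYPOTHETICAL extra row (the last one):
# a second Delta entry on the JFK -> BOS route, added only for illustration.
df2 <- data.frame(
  Ori.    = c("JFK", "JFK", "JFK", "ATL", "OHD", "OHD", "JFK"),
  Dest    = c("LAX", "LAX", "BOS", "FLR", "LAX", "LAX", "BOS"),
  Carr.   = c("Delta", "JetBl", "Delta", "AmerA", "Delta", "AmerA", "Delta"),
  Pass    = c(15004L, 17434L, 15344L, 25054L, 19876L, 12344L, 100L),
  Flights = c(50L, 100L, 89L, 90L, 95L, 45L, 1L)
)

# Keep every row of a route whose rows all name the same carrier.
single_carrier <- df2 %>%
  group_by(Ori., Dest) %>%
  filter(n_distinct(Carr.) == 1) %>%
  ungroup()

single_carrier
```

With this data, `n() == 1` would drop JFK to BOS because it now has two rows, while `n_distinct(Carr.) == 1` keeps both of them because only Delta flies the route.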

Using data.table:

library(data.table)
setDT(df)[, .SD[.N == 1], .(Ori., Dest)]

And in base R:

subset(df, ave(Flights, Ori., Dest, FUN = length) == 1)
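
To see why the `ave()` call works, it can help to look at the intermediate vector it produces. The following is a small sketch on the question's sample data; the variable names `route_size` and `result` are chosen here for illustration only.

```r
# Sample data from the question.
df <- data.frame(
  Ori.    = c("JFK", "JFK", "JFK", "ATL", "OHD", "OHD"),
  Dest    = c("LAX", "LAX", "BOS", "FLR", "LAX", "LAX"),
  Carr.   = c("Delta", "JetBl", "Delta", "AmerA", "Delta", "AmerA"),
  Pass    = c(15004L, 17434L, 15344L, 25054L, 19876L, 12344L),
  Flights = c(50L, 100L, 89L, 90L, 95L, 45L)
)

# ave() applies FUN within each (Ori., Dest) group and returns a vector
# as long as the input, so every row carries the size of its own group.
# The values in Flights are irrelevant here; only the group lengths matter.
route_size <- ave(df$Flights, df$Ori., df$Dest, FUN = length)
route_size
# 2 2 1 1 2 2

# Rows whose route occurs exactly once.
result <- df[route_size == 1, ]
result
```

Any column could stand in for `Flights`, since `length` ignores the values. A numeric column is a convenient choice because `ave()` writes the group results back into a vector of the same type as its first argument.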

Data

df <- structure(list(Ori. = c("JFK", "JFK", "JFK", "ATL", "OHD", "OHD"
), Dest = c("LAX", "LAX", "BOS", "FLR", "LAX", "LAX"), Carr. = c("Delta", 
"JetBl", "Delta", "AmerA", "Delta", "AmerA"), Pass = c(15004L, 
17434L, 15344L, 25054L, 19876L, 12344L), Flights = c(50L, 100L, 
89L, 90L, 95L, 45L)), class = "data.frame", row.names = c(NA, -6L))

We can do this in base R without any grouping operation:

df[!(duplicated(df[1:2])|duplicated(df[1:2], fromLast = TRUE)),]
#  Ori. Dest Carr.  Pass Flights
#3  JFK  BOS Delta 15344      89
#4  ATL  FLR AmerA 25054      90
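
The two `duplicated()` calls are both needed because each one flags only the later (or, with `fromLast = TRUE`, only the earlier) members of a repeated route. A step-by-step sketch on the same sample data follows; the helper names `keys`, `dup_forward`, `dup_backward`, and `unique_routes` are illustrative, and the key columns are selected by name rather than by position as a small robustness choice.

```r
# Sample data from the question.
df <- data.frame(
  Ori.    = c("JFK", "JFK", "JFK", "ATL", "OHD", "OHD"),
  Dest    = c("LAX", "LAX", "BOS", "FLR", "LAX", "LAX"),
  Carr.   = c("Delta", "JetBl", "Delta", "AmerA", "Delta", "AmerA"),
  Pass    = c(15004L, 17434L, 15344L, 25054L, 19876L, 12344L),
  Flights = c(50L, 100L, 89L, 90L, 95L, 45L)
)

# The columns that define a route.
keys <- df[c("Ori.", "Dest")]

# Scanning top to bottom flags the 2nd and later occurrences of a route.
dup_forward <- duplicated(keys)
# FALSE TRUE FALSE FALSE FALSE TRUE

# Scanning bottom to top flags every occurrence except the last one.
dup_backward <- duplicated(keys, fromLast = TRUE)
# TRUE FALSE FALSE FALSE TRUE FALSE

# A route is unique only if neither scan flags the row.
unique_routes <- df[!(dup_forward | dup_backward), ]
unique_routes
```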


