Replace inner_join with semi_join

The code below works as expected. Running it up to and including the head(1) line shows that JFK to LAX is the route with the most flights. I then use inner_join to filter the flights table down to the flights on this route, which gives me 11,252 rows.

library(nycflights13)
library(dplyr)

flights %>% 
  group_by(origin, dest) %>% 
  summarize(num_flights=n()) %>% 
  arrange(-num_flights) %>% 
  head(1) %>% # JFK to LAX has the most flights
  select(origin, dest) %>% 
  inner_join(flights, by=c("origin", "dest"))

How can I use semi_join instead to achieve the same goal? I want a single chained expression as above rather than a temp variable. For reference, the version with a temp variable looks like this, and it gives the same result:

  filterList <- flights %>% 
  group_by(origin, dest) %>% 
  summarize(num_flights=n()) %>% 
  arrange(-num_flights) %>% 
  head(1) %>% 
  select(origin, dest)

  semi_join(flights, filterList, by=c("origin", "dest") )

I'd like to keep the logic in the same order: first determine the filter, then apply it. What I seem to want is a right_semi_join function, but no such function exists.

Selecting the route with the most flights without using a join:

library(nycflights13)
library(dplyr)

df2 <- flights %>% 
  add_count(origin, dest) %>%
  top_n(1, n) # rank by n, the per-route count that add_count() appended

df2$n <- NULL

> setequal(df1, df2) # assuming original data.frame is stored in df1
TRUE
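
In recent dplyr releases top_n() is marked as superseded in favour of slice_max(). The following is a minimal sketch of the same no-join idea written that way, run on a made-up table; the trips data frame and its values are assumptions for illustration and are not part of nycflights13.

```r
library(dplyr)

# Hypothetical toy data standing in for `flights`
trips <- tibble(
  origin = c("JFK", "JFK", "JFK", "LGA", "EWR"),
  dest   = c("LAX", "LAX", "SFO", "ORD", "LAX"),
  delay  = c(5, 12, -3, 20, 0)
)

top_route <- trips %>%
  add_count(origin, dest) %>%  # adds column `n`: number of rows on each row's route
  slice_max(n, n = 1) %>%      # keeps every row tied for the largest `n`
  select(-n)                   # drop the helper column again

print(top_route)
```

Because slice_max() keeps ties by default, all rows of the busiest route survive, which is what the filter needs here.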

Use the . placeholder to pass the piped data as the second argument rather than the first.

flights %>% 
  group_by(origin, dest) %>% 
  summarize(num_flights=n()) %>% 
  arrange(-num_flights) %>% 
  head(1) %>% # JFK to LAX has the most flights
  select(origin, dest) %>% 
  semi_join(flights, ., by=c("origin", "dest"))
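
To see the placeholder at work on something small, here is a minimal sketch on a made-up table. The trips data frame and its values are assumptions for illustration, not part of nycflights13, and count() is used only as a shorthand for the group_by/summarize step.

```r
library(dplyr)

# Hypothetical toy data standing in for `flights`
trips <- tibble(
  origin = c("JFK", "JFK", "JFK", "LGA", "EWR"),
  dest   = c("LAX", "LAX", "SFO", "ORD", "LAX"),
  delay  = c(5, 12, -3, 20, 0)
)

busiest <- trips %>%
  count(origin, dest, name = "num_flights") %>%  # one row per route
  arrange(desc(num_flights)) %>%
  head(1) %>%                                    # the busiest route
  select(origin, dest) %>%
  semi_join(trips, ., by = c("origin", "dest"))  # `.` is the piped-in route table

print(busiest)
```

Since . appears as a direct argument, the pipe does not also insert the left-hand side as the first argument, so trips is the table being filtered and the one-row route table is the filter. The result keeps the columns of trips in their original order.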
