R - Compare values in two columns across different rows

Question

I have a data frame df, shown below, with two columns: a departure city and an arrival city. Each consecutive pair of rows stores an outbound flight and its return flight.

  Departure Arrival
1    A          B
2    B          A
3    F          G
4    G          F
5    U          V
6    V          U
7    K          L
8    K          L

The data contains some inconsistencies: in some pairs the same flight is repeated, as can be seen in the last two rows.

How can I compare, for every pair of rows, the departure city of the first row with the arrival city of the second row, and keep only the pairs where they are equal? The dataset is very large, so a for loop is not an option.

Thank you in advance.

Answer 1

Here is a method that compares pairs of rows, using head and tail to line them up.

# find Departures that match the Arrival in the next row
sames <- which(head(dat$Departure, -1) == tail(dat$Arrival, -1))
# keep pairs of rows that match, maintaining order with `sort`
dat[sort(unique(c(sames, (sames + 1)))),]
  Departure Arrival
1         A       B
2         B       A
3         F       G
4         G       F
5         U       V
6         V       U
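The head/tail comparison above checks every pair of adjacent rows, including a return row against the next outbound row. If the data is known to come strictly in pairs (an assumption taken from the question, not something the answer states), a minimal sketch that restricts the check to rows 1-2, 3-4, and so on could look like this. The names first, ok, and kept are illustrative.

```r
# Sample data from the question (character columns)
dat <- data.frame(
  Departure = c("A", "B", "F", "G", "U", "V", "K", "K"),
  Arrival   = c("B", "A", "G", "F", "V", "U", "L", "L"),
  stringsAsFactors = FALSE
)

# Index of the first row of each pair: 1, 3, 5, ...
# (assumes rows are stored strictly in outbound/return pairs)
first <- seq(1, nrow(dat) - 1, by = 2)

# A pair is consistent when the outbound departure equals the return arrival
ok <- dat$Departure[first] == dat$Arrival[first + 1]

# Keep both rows of every consistent pair, in the original order
kept <- dat[sort(c(first[ok], first[ok] + 1)), ]
kept
```

On the sample data this keeps rows 1 to 6, the same rows as the head/tail version.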

Note that the two variables have to be character vectors, not factors. You can coerce them with as.character if necessary.

Data
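As a small illustration of that coercion, the sketch below converts every factor column of a data frame to character. The data frame dat_f and the helper vector is_fac are hypothetical names used only for this example.

```r
# Hypothetical copy of the data whose columns are factors
dat_f <- data.frame(
  Departure = factor(c("A", "B", "K", "K")),
  Arrival   = factor(c("B", "A", "L", "L"))
)

# Find the factor columns and convert only those to character
is_fac <- vapply(dat_f, is.factor, logical(1))
dat_f[is_fac] <- lapply(dat_f[is_fac], as.character)

str(dat_f)
```

After this step the == comparison between Departure and Arrival works on plain strings.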

dat <-
structure(list(Departure = c("A", "B", "F", "G", "U", "V", "K", 
"K"), Arrival = c("B", "A", "G", "F", "V", "U", "L", "L")), .Names = c("Departure", 
"Arrival"), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8"))

Answer 2

So you just want unique flight paths? There are a number of ways to do this. I'd think the fastest would be with data.table, something like:

 library(data.table)
 df <- as.data.table(df)

 uniqueDf <- unique(df)

You can also use the duplicated function, something like

 df <- df[!duplicated(df), ]

should do nicely.
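It may be worth checking what de-duplication returns on the sample data before relying on it. The sketch below rebuilds the question's data frame (the name dedup is illustrative) and shows that duplicated drops only the second K to L row, so the result still contains one half of the inconsistent pair.

```r
# Sample data from the question
df <- data.frame(
  Departure = c("A", "B", "F", "G", "U", "V", "K", "K"),
  Arrival   = c("B", "A", "G", "F", "V", "U", "L", "L"),
  stringsAsFactors = FALSE
)

# Drop rows that exactly repeat an earlier row
dedup <- df[!duplicated(df), ]

nrow(dedup)  # 7: rows 1 to 7 remain, including the first K -> L row
```

This differs from the six-row result the question asks for, where both rows of an inconsistent pair are removed.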

Answer 3

You could also do it this way:

# c(TRUE, FALSE) is recycled to pick odd rows, c(FALSE, TRUE) to pick even rows
right = rep(df[c(TRUE,FALSE),"Arrival"]==df[c(FALSE,TRUE),"Departure"],each=2)
df[right,]

This returns:

   Departure Arrival
1          A       B
2          B       A
3          F       G
4          G       F
5          U       V
6          V       U
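The same idea can be written with explicit row indices, which makes the pairing easier to follow. This is a sketch under the assumption that the data frame has an even number of rows stored as outbound/return pairs. It also checks both directions of the round trip, which goes slightly beyond the one-line version. The names odd, even, pair_ok, and result are illustrative.

```r
# Sample data from the question
df <- data.frame(
  Departure = c("A", "B", "F", "G", "U", "V", "K", "K"),
  Arrival   = c("B", "A", "G", "F", "V", "U", "L", "L"),
  stringsAsFactors = FALSE
)

odd  <- seq(1, nrow(df), by = 2)  # outbound rows: 1, 3, 5, 7
even <- seq(2, nrow(df), by = 2)  # return rows:   2, 4, 6, 8

# A pair is a valid round trip when the cities are swapped between its rows
pair_ok <- df$Arrival[odd] == df$Departure[even] &
           df$Departure[odd] == df$Arrival[even]

# Repeat each pair-level flag twice so it lines up with the rows
result <- df[rep(pair_ok, each = 2), ]
result
```

On the sample data the last pair fails the check, so rows 7 and 8 are dropped.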

Answer 4

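If it suits you, try the following solution:

# the "|" separator avoids accidental matches such as "AB"+"C" vs "A"+"BC"
df[!duplicated(paste(df$Departure, df$Arrival, sep = "|")), ]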

Answer 5

This answer doesn't look for unique records; it specifically checks whether a row is a duplicate of the row before it.

Adding a new column with a 1 if the row repeats the previous one:

 df$test <- 0  # create the column before filling it; row 1 has no previous row
 for (i in seq_len(nrow(df))[-1]) {
   df$test[i] <- ifelse(df$Departure[i] == df$Departure[i-1] & df$Arrival[i] == df$Arrival[i-1], 1, 0)
 }

Loops can be slow though:

library(data.table)

df$test2 = ifelse(df$Departure == shift(df$Departure) & df$Arrival == shift(df$Arrival), 1,0)
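If data.table is not available, the same previous-row comparison can be sketched in base R by comparing each column with a copy of itself offset by one row. The column name test3 and the vector same_as_prev are illustrative, and the sketch assumes the data frame has at least one row.

```r
# Sample data from the question
df <- data.frame(
  Departure = c("A", "B", "F", "G", "U", "V", "K", "K"),
  Arrival   = c("B", "A", "G", "F", "V", "U", "L", "L"),
  stringsAsFactors = FALSE
)

n <- nrow(df)

# Compare rows 2..n with rows 1..(n-1); the first row has no predecessor
same_as_prev <- c(
  FALSE,
  df$Departure[-1] == df$Departure[-n] & df$Arrival[-1] == df$Arrival[-n]
)

# 1 if the row repeats the previous row, 0 otherwise
df$test3 <- as.integer(same_as_prev)
df
```

Rows flagged with 1 can then be dropped with df[!same_as_prev, ], or used to locate and remove the whole inconsistent pair.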

5 answers

Answer 1: score 3, accepted, 2017-08-08 14:48:44
Answer 2: score 1, 2017-08-08 14:42:28
Answer 3: score 1, 2017-08-08 15:12:22
Answer 4: score 0, 2017-08-08 14:39:45
Answer 5: score 0, 2017-08-08 14:54:01
