
R - Compare values in two columns in different rows

I have a data frame df, shown below, with two features: a departure city and an arrival city. Each pair of consecutive rows stores an outbound flight and its return flight.

  Departure Arrival
1    A          B
2    B          A
3    F          G
4    G          F
5    U          V
6    V          U
7    K          L
8    K          L

There is some inconsistency in the data: in places the same flight is repeated instead of being followed by its return, as can be seen in the last two rows.

How can I compare, for every two rows, the departure city of the first row with the arrival city of the second row, and keep the pairs where they are equal? The dataset is very large, so a for-loop is not an option.

Thank you in advance.

Here is a method that compares the pairs of rows, using head and tail to line them up.

# find Departures that match the Arrival in the next row
sames <- which(head(dat$Departure, -1) == tail(dat$Arrival, -1))
# keep pairs of rows that match, maintaining order with `sort`
dat[sort(unique(c(sames, (sames + 1)))),]
  Departure Arrival
1         A       B
2         B       A
3         F       G
4         G       F
5         U       V
6         V       U

Note that the two variables have to be character vectors, not factors: comparing two factors with == raises an error when their level sets differ. You can coerce them to character using as.character if necessary.
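A minimal sketch of that coercion, assuming the columns arrived as factors (for example from a read with stringsAsFactors = TRUE). The names dat_f and kept are hypothetical and used only for illustration:

```r
# Hypothetical copy of the sample data with factor columns
dat_f <- data.frame(
  Departure = c("A", "B", "F", "G", "U", "V", "K", "K"),
  Arrival   = c("B", "A", "G", "F", "V", "U", "L", "L"),
  stringsAsFactors = TRUE
)

# Coerce every column to character so that == compares the labels
dat_f[] <- lapply(dat_f, as.character)

# Same head/tail comparison as in the answer above
sames <- which(head(dat_f$Departure, -1) == tail(dat_f$Arrival, -1))
kept <- dat_f[sort(unique(c(sames, sames + 1))), ]
```

On the sample data, kept holds rows 1 to 6.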

Data

dat <-
structure(list(Departure = c("A", "B", "F", "G", "U", "V", "K", 
"K"), Arrival = c("B", "A", "G", "F", "V", "U", "L", "L")), .Names = c("Departure", 
"Arrival"), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8"))

So you just want unique flight paths? There are a number of ways to do this; I'd think the fastest would be with data.table, something like:

 library(data.table)
 df <- as.data.table(df)

 uniqueDf <- unique(df)

You can also use the duplicated function, something like

 df <- df[!duplicated(df), ]

should do nicely.
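To see what de-duplication does on the sample data, here is a small sketch (the data frame is rebuilt so the snippet runs on its own; dedup is a hypothetical name). It drops only the repeated row 8 and keeps row 7, so a flight without a return leg survives. Whether that is acceptable depends on what you are after.

```r
df <- data.frame(
  Departure = c("A", "B", "F", "G", "U", "V", "K", "K"),
  Arrival   = c("B", "A", "G", "F", "V", "U", "L", "L"),
  stringsAsFactors = FALSE
)

# duplicated() flags a row only if an identical row appeared earlier
dedup <- df[!duplicated(df), ]

# Row names of the rows that remain
rownames(dedup)
```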

You could also do it this way:

right <- rep(df[c(TRUE, FALSE), "Arrival"] == df[c(FALSE, TRUE), "Departure"], each = 2)
df[right,]

This returns:

   Departure Arrival
1          A       B
2          B       A
3          F       G
4          G       F
5          U       V
6          V       U
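The comparison above checks one direction only (arrival of the first row against departure of the second). A sketch that checks both legs of each pair, assuming the data frame has an even number of rows and every pair starts on an odd row; the names odd, even, ok and round_trips are hypothetical:

```r
df <- data.frame(
  Departure = c("A", "B", "F", "G", "U", "V", "K", "K"),
  Arrival   = c("B", "A", "G", "F", "V", "U", "L", "L"),
  stringsAsFactors = FALSE
)

odd  <- seq(1, nrow(df), by = 2)  # first row of each pair
even <- odd + 1                   # second row of each pair

# A pair counts as a round trip when both legs mirror each other
ok <- df$Departure[odd] == df$Arrival[even] &
      df$Arrival[odd]   == df$Departure[even]

# Repeat each flag twice so it covers both rows of its pair
round_trips <- df[rep(ok, each = 2), ]
```

On the sample data this again keeps rows 1 to 6 and drops the repeated K to L pair.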

Try this solution, if it works for you:

# a separator keeps e.g. "AB" + "C" from colliding with "A" + "BC"
df[!duplicated(paste(df$Departure, df$Arrival, sep = "|")), ]

This answer doesn't look for unique records; it specifically checks whether a row is a duplicate of the row before it.

Adding a new column with a 1 if the row repeats the previous one:

 df$test <- 0  # create the column first; row 1 has no previous row
 for(i in 2:nrow(df)){df$test[i] <- ifelse(df$Departure[i] == df$Departure[i-1] & df$Arrival[i] == df$Arrival[i-1], 1, 0)}

Loops can be slow, though, so here is a vectorised version using shift from data.table:

library(data.table)

df$test2 = ifelse(df$Departure == shift(df$Departure) & df$Arrival == shift(df$Arrival), 1,0)
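If data.table is not available, the same lag can be sketched in base R. Here lag1 is a hypothetical helper, not a library function; it shifts a vector down by one and pads with NA, so the first row has nothing to compare against and is treated as "not a repeat":

```r
df <- data.frame(
  Departure = c("A", "B", "F", "G", "U", "V", "K", "K"),
  Arrival   = c("B", "A", "G", "F", "V", "U", "L", "L"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: shift a vector down by one, padding with NA
lag1 <- function(x) c(NA, head(x, -1))

same_as_prev <- df$Departure == lag1(df$Departure) &
                df$Arrival   == lag1(df$Arrival)

# The first row has no predecessor; count its NA as "not a repeat"
same_as_prev[is.na(same_as_prev)] <- FALSE

df$test2 <- as.integer(same_as_prev)
```

On the sample data only row 8 is flagged, since it is the only row identical to the one before it.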
