Filter rows based on all previous rows, keeping groups of data (R for loop)

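I have a simple data frame, and I want to filter rows based on the rows that come before them, while keeping the data grouped by a given variable.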

orig_df<-structure(list(New_ID = c("a", "a", "a", "b", "b", "b", "c", 
"c", "c", "d", "d", "d"), New_ID.1 = c("a", "b", "c", "b", "c", 
"d", "c", "d", "e", "d", "e", "f")), class = "data.frame", row.names = c(NA, 
-12L))

Below is my current code, which is close to what I think I need.

orig_df <- as.data.frame(orig_df)
included_rows <- rep(FALSE, nrow(orig_df))
seen_ids <- c()
for(i in 1:nrow(orig_df)){
  # Skip row if we have seen either ID already
  #if(orig_df[i, 'New_ID']   %in% seen_ids) next
  if(orig_df[i, 'New_ID.1'] %in% seen_ids) next
  # If both ids are new, we save them as seen and include the entry
  seen_ids <- c(seen_ids, orig_df[i, 'New_ID'] , orig_df[i, 'New_ID.1'] )
  included_rows[i] <-  TRUE
}
filtered_df <- orig_df[included_rows,]

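I need the code to filter out groups "b" and "c", because they first appear in New_ID.1 within group "a". Order matters here, and my table is already sorted. Since "d" in New_ID does not appear in New_ID.1 for group "a", and "b" and "c" have already been filtered out, group "d" should be kept. The final table should look like this: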

structure(list(New_ID = c("a", "a", "a", "d", "d", "d"), New_ID.1 = c("a", 
"b", "c", "d", "e", "f")), class = "data.frame", row.names = c(NA, 
-6L))

a|a
a|b
a|c
d|d
d|e
d|f

Hope this makes sense!

Thanks!

I'm pretty sure this is what you want:

included_rows <- rep(FALSE, nrow(orig_df))
seen_ids <- c()
for (i in seq_len(nrow(orig_df))) {
    # Skip the row if its New_ID was already seen as a New_ID.1 in a kept group
    if (orig_df[i, "New_ID"] %in% seen_ids) next
    # Record New_ID.1 as seen when it differs from the row's New_ID
    if (orig_df[i, "New_ID"] != orig_df[i, "New_ID.1"]) {
        seen_ids <- append(seen_ids, orig_df[i, "New_ID.1"])
    }
    included_rows[i] <- TRUE
}
filtered_df <- orig_df[included_rows,]

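It may not be the most efficient way to do it, but this filters out the rows whose New_ID first occurred as a New_ID.1 in a row where New_ID and New_ID.1 were different. That only counts if the earlier row had not itself already been filtered out.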
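To check this logic against the expected table from the question, here is a minimal sketch that wraps the same loop in a function. The names `filter_seen_groups`, `keep`, `seen`, and `expected` are hypothetical and introduced only for illustration; the filtering rule itself is unchanged.

```r
# Example data from the question
orig_df <- data.frame(
  New_ID   = rep(c("a", "b", "c", "d"), each = 3),
  New_ID.1 = c("a", "b", "c", "b", "c", "d", "c", "d", "e", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: same rule as the loop, wrapped in a function
filter_seen_groups <- function(df, id = "New_ID", id1 = "New_ID.1") {
  keep <- logical(nrow(df))   # all FALSE to start
  seen <- character(0)        # New_ID.1 values collected from kept rows
  for (i in seq_len(nrow(df))) {
    # Drop the row if its group ID already appeared under a kept group
    if (df[i, id] %in% seen) next
    # Remember the partner ID when it differs from the group ID
    if (df[i, id] != df[i, id1]) seen <- c(seen, df[i, id1])
    keep[i] <- TRUE
  }
  df[keep, , drop = FALSE]
}

filtered_df <- filter_seen_groups(orig_df)

# Expected table from the question
expected <- data.frame(
  New_ID   = rep(c("a", "d"), each = 3),
  New_ID.1 = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Row names differ after subsetting, so reset them before comparing
rownames(filtered_df) <- NULL
all.equal(filtered_df, expected)
```

Because `seen` grows inside the loop, the cost rises with the number of rows; for a table of this size that should not matter.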

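You mentioned that order matters and that the table is already sorted. Is it sorted alphabetically, as in your example? If so, you could use the group_by function and find the minimum New_ID for each New_ID.1.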

library(tidyverse)
orig_df %>%
  group_by(New_ID.1) %>%
  summarise(New_ID.2 = min(New_ID))

You may need to change the column names.
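To inspect what this grouping idea returns without loading the tidyverse, here is a rough base R sketch of the same computation on the question's example data. The names `min_by_group` and `summary_df` are hypothetical; `split` and `vapply` stand in for `group_by` and `summarise`.

```r
# Example data from the question
orig_df <- data.frame(
  New_ID   = rep(c("a", "b", "c", "d"), each = 3),
  New_ID.1 = c("a", "b", "c", "b", "c", "d", "c", "d", "e", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Alphabetically smallest New_ID within each New_ID.1 group
min_by_group <- vapply(
  split(orig_df$New_ID, orig_df$New_ID.1),
  min,
  character(1)
)

# One row per New_ID.1 value, mirroring the summarise() output
summary_df <- data.frame(
  New_ID.1 = names(min_by_group),
  New_ID.2 = unname(min_by_group),
  stringsAsFactors = FALSE
)
summary_df
```

This gives one row per New_ID.1 value rather than the filtered rows themselves. On the example data, "d" is paired with "b", while the expected table in the question pairs "d" with "d", so the summary would likely need further work to reproduce that table.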

