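Filter rows based on all previous rows while keeping groups of data (R for loop)
I have a simple data frame, and I want to filter rows based on the rows that came before them, while keeping the data grouped by a given variable.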
orig_df<-structure(list(New_ID = c("a", "a", "a", "b", "b", "b", "c",
"c", "c", "d", "d", "d"), New_ID.1 = c("a", "b", "c", "b", "c",
"d", "c", "d", "e", "d", "e", "f")), class = "data.frame", row.names = c(NA,
-12L))
Below is my current code, which is close to what I think I need.
orig_df <- as.data.frame(orig_df)
included_rows <- rep(FALSE, nrow(orig_df))
seen_ids <- c()
for(i in 1:nrow(orig_df)){
  # Skip row if we have seen either ID already
  #if(orig_df[i, 'New_ID'] %in% seen_ids) next
  if(orig_df[i, 'New_ID.1'] %in% seen_ids) next
  # If both ids are new, we save them as seen and include the entry
  seen_ids <- c(seen_ids, orig_df[i, 'New_ID'] , orig_df[i, 'New_ID.1'] )
  included_rows[i] <- TRUE
}
filtered_df <- orig_df[included_rows,]
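I need the code to filter out the "b" and "c" groups, because those values first appear in New_ID.1 within the "a" group. Order matters here, and my table is already sorted. Since "d" in New_ID does not appear in New_ID.1 of the "a" group, and the "b" and "c" groups have already been filtered out, the "d" group should be kept. The final table should look like this: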
structure(list(New_ID = c("a", "a", "a", "d", "d", "d"), New_ID.1 = c("a",
"b", "c", "d", "e", "f")), class = "data.frame", row.names = c(NA,
-6L))
a|a
a|b
a|c
d|d
d|e
d|f
Hope this makes sense!
Thanks!
I'm pretty sure this is what you want:
included_rows <- rep(FALSE, nrow(orig_df))
seen_ids <- c()
for (i in seq_len(nrow(orig_df))) {
  # Skip the row if its New_ID was first seen as a member of another kept group
  if (orig_df[i, "New_ID"] %in% seen_ids) next
  # Record New_ID.1 as seen when it is grouped under a different letter
  if (orig_df[i, "New_ID"] != orig_df[i, "New_ID.1"]) {
    seen_ids <- append(seen_ids, orig_df[i, "New_ID.1"])
  }
  included_rows[i] <- TRUE
}
filtered_df <- orig_df[included_rows, ]
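It's probably not the most efficient approach, but it filters out rows whose New_ID first occurred as a New_ID.1 value in a row where New_ID and New_ID.1 differ. A value is only recorded this way if the row it appeared in had not itself already been filtered out.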
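To make that rule easier to reuse and check, here is a minimal sketch that wraps the same loop logic in a function and runs it on the sample data from the question. The function name `filter_first_seen` and its arguments `group_col` and `member_col` are hypothetical and not part of the original answer; the sample data frame is rebuilt with `data.frame()` so the block runs on its own.

```r
# Hypothetical helper (not from the original answer): same logic as the loop above.
filter_first_seen <- function(df, group_col = "New_ID", member_col = "New_ID.1") {
  keep <- logical(nrow(df))   # all FALSE to start with
  seen <- character(0)        # members recorded from rows that were kept
  for (i in seq_len(nrow(df))) {
    group  <- df[[group_col]][i]
    member <- df[[member_col]][i]
    # Drop the row if its group already appeared as a member of a kept group
    if (group %in% seen) next
    # Record the member only when it differs from its own group label
    if (group != member) seen <- c(seen, member)
    keep[i] <- TRUE
  }
  df[keep, , drop = FALSE]
}

# Sample data from the question, rebuilt so the example is self-contained
orig_df <- data.frame(
  New_ID   = rep(c("a", "b", "c", "d"), each = 3),
  New_ID.1 = c("a", "b", "c", "b", "c", "d", "c", "d", "e", "d", "e", "f"),
  stringsAsFactors = FALSE
)

filtered_df <- filter_first_seen(orig_df)
rownames(filtered_df) <- NULL
print(filtered_df)
```

On the sample data this keeps only the "a" and "d" groups, matching the table requested in the question. Because the result depends on row order, it assumes the data frame is already sorted the way the question describes.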
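You mentioned that order matters and that the table is already sorted. Is it sorted alphabetically, as in your example? If so, you could use the group_by function and find the minimum New_ID for each New_ID.1.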
library(tidyverse)
orig_df %>%
group_by(New_ID.1) %>%
summarise(New_ID.2 = min(New_ID))
You may need to change the column names.
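As a quick check of what the per-group minimum produces, here is a base-R sketch run on the sample data from the question. It is an assumption-laden stand-in: `tapply()` is swapped in for the dplyr pipeline so the block runs without extra packages, and `min_group` and `summary_df` are hypothetical names.

```r
# Sample data from the question, rebuilt so the example is self-contained
orig_df <- data.frame(
  New_ID   = rep(c("a", "b", "c", "d"), each = 3),
  New_ID.1 = c("a", "b", "c", "b", "c", "d", "c", "d", "e", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Base-R counterpart of group_by(New_ID.1) followed by summarise(min(New_ID)):
# for each New_ID.1 value, take the alphabetically smallest New_ID
min_group <- tapply(orig_df$New_ID, orig_df$New_ID.1, min)

summary_df <- data.frame(
  New_ID.1 = names(min_group),
  New_ID.2 = as.vector(min_group),
  stringsAsFactors = FALSE
)
print(summary_df)
```

On this sample the result has one row per New_ID.1 value and pairs "d" with "b", "e" with "c", and "f" with "d". That differs from the table requested in the question, where those three values stay under "d", so this approach would likely need adapting before it fits that case.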