Replacing values with 'NA' by ID in R

I have data that looks like this:

ID    v1    v2
1     1     0
2     0     1
3     1     0
3     0     1
4     0     1

I want to replace all values with NA if the ID occurs more than once in the data frame. The final result should look like this:

ID    v1    v2
1     1     0
2     0     1
3     NA    NA
3     NA    NA
4     0     1

I could do this by hand, but I want R to detect all the duplicate cases (here, the two rows with ID 3) and replace their values with NA.

Thanks for your help!

You could run duplicated() from both ends, combine the two results, and then replace.

idx <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)
df[idx, -1] <- NA

which gives

  ID v1 v2
1  1  1  0
2  2  0  1
3  3 NA NA
4  3 NA NA
5  4  0  1

This will also work if the duplicated IDs are not next to each other.
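
To illustrate why both passes are combined and why adjacency does not matter, here is a minimal sketch on a made-up data frame. The name df2 and its values are hypothetical, not from the question; the repeated ID 3 sits in rows 1 and 4.

```r
# Hypothetical data: the repeated ID (3) is in non-adjacent rows 1 and 4
df2 <- data.frame(ID = c(3L, 1L, 2L, 3L, 4L),
                  v1 = c(1L, 1L, 0L, 0L, 0L),
                  v2 = c(0L, 0L, 1L, 1L, 1L))

fwd <- duplicated(df2$ID)                   # flags later occurrences only
bwd <- duplicated(df2$ID, fromLast = TRUE)  # flags earlier occurrences only
idx <- fwd | bwd                            # flags every row of a repeated ID

df2[idx, -1] <- NA                          # blank out all non-ID columns
df2
```

Either pass alone would leave one of the two rows untouched, which is why the answer ORs them together.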

Data:

df <- structure(list(ID = c(1L, 2L, 3L, 3L, 4L), v1 = c(1L, 0L, 1L, 
0L, 0L), v2 = c(0L, 1L, 0L, 1L, 1L)), .Names = c("ID", "v1", 
"v2"), class = "data.frame", row.names = c(NA, -5L))

One more option:

df1[df1$ID %in% df1$ID[duplicated(df1$ID)], -1] <- NA
#> df1
#  ID v1 v2
#1  1  1  0
#2  2  0  1
#3  3 NA NA
#4  3 NA NA
#5  4  0  1
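
The one-liner can be unpacked into its intermediate steps, which may make it easier to follow. This is only a sketch: the names dup_ids and hit are illustrative and not part of the original answer, and the unique() call is an optional tidy-up.

```r
df1 <- data.frame(ID = c(1L, 2L, 3L, 3L, 4L),
                  v1 = c(1L, 0L, 1L, 0L, 0L),
                  v2 = c(0L, 1L, 0L, 1L, 1L))

dup_ids <- unique(df1$ID[duplicated(df1$ID)])  # IDs that occur more than once
hit <- df1$ID %in% dup_ids                     # TRUE for every row with such an ID
df1[hit, -1] <- NA                             # set all non-ID columns to NA
df1
```

Because %in% matches by value, every row carrying a repeated ID is selected, including the first occurrence that duplicated() on its own does not flag.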

Data:

df1 <- structure(list(ID = c(1L, 2L, 3L, 3L, 4L), v1 = c(1L, 0L, 1L, 
0L, 0L), v2 = c(0L, 1L, 0L, 1L, 1L)), .Names = c("ID", "v1", 
"v2"), class = "data.frame", row.names = c(NA, -5L))

Here is another base R method:

# get list of repeated IDs
repeats <- rle(df$ID)$values[rle(df$ID)$lengths > 1]

# set the corresponding variables to NA
df[, -1] <- sapply(df[, -1], function(i) {i[df$ID %in% repeats] <- NA; i})

In the first line, we use rle to extract the repeated IDs. In the second, we use sapply to loop over the non-ID variables and, in each one, set the values in rows with a repeated ID to NA.
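
One caveat worth knowing: sapply simplifies its result to a matrix, so non-ID columns of different types would be coerced to a common type. A possible variant, sketched here with lapply instead (a substitution, not the answer's own code), keeps each column's type. The label column is a hypothetical addition for illustration.

```r
df <- data.frame(ID = c(1L, 2L, 3L, 3L, 4L),
                 v1 = c(1L, 0L, 1L, 0L, 0L),
                 label = c("a", "b", "c", "d", "e"),  # hypothetical text column
                 stringsAsFactors = FALSE)

# repeated IDs via rle, as in the answer (data is sorted by ID)
r <- rle(df$ID)
repeats <- r$values[r$lengths > 1]

# lapply returns a list, so each column is replaced without a matrix detour
df[-1] <- lapply(df[-1], function(col) {
  col[df$ID %in% repeats] <- NA
  col
})

sapply(df, class)  # v1 stays integer, label stays character
```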

Note that this assumes the data set is sorted by ID, because rle only detects runs of consecutive equal values. Sorting can be done with the order function: df <- df[order(df$ID), ]
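
The sorting requirement can be seen on a small made-up vector (ids is hypothetical): a repeated ID whose occurrences are not adjacent produces only runs of length 1, so rle misses it until the values are sorted.

```r
ids <- c(3L, 1L, 2L, 3L, 4L)   # ID 3 repeats, but not consecutively

r <- rle(ids)
unsorted_repeats <- r$values[r$lengths > 1]  # empty: the repeat is missed

s <- rle(sort(ids))
sorted_repeats <- s$values[s$lengths > 1]    # 3: found once the values are sorted

unsorted_repeats
sorted_repeats
```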

If the data set is very large, you might split the first statement into two steps to avoid computing the rle twice:

dfRle <- rle(df$ID)
repeats <- dfRle$values[dfRle$lengths > 1]

Data:

df <- read.table(header = TRUE, text = "ID    v1    v2
1     1     0
2     0     1
3     1     0
3     0     1
4     0     1")
