Remove specific duplicated observations in data.frames from a list of data.frames

I have a list of data.frames that looks like this:

$`42`
    Val Replicate Index
  26.92        R2    42
  26.92        R3    42
  28.68        R1    42
  28.68        R4    42

$`43`
    Val Replicate Index
  28.92        R3    43
  29.28        R2    43
  30.11        R1    43
  30.11        R4    43

$`44`
    Val Replicate Index
  24.67        R3    44
  24.70        R2    44
  25.70        R1    44
  25.70        R4    44

$`45`
    Val Replicate Index
  30.57        R1    45
  30.57        R4    45
  32.39        R2    45
  32.81        R3    45

What I would like to do is the following: if a value in column "Val" is duplicated and one of the duplicates belongs to R4 in column "Replicate", both rows should be removed.

For example, in the data.frame named 45, since 30.57 (R1) is equal to 30.57 (R4), both will be removed, retaining only 32.39 (R2) and 32.81 (R3). So the desired output for data.frame 45 would be:

$`45`
    Val Replicate Index
  32.39        R2    45
  32.81        R3    45

I tried to use:

lapply(mydf, function(x) x[!duplicated(x[c("Val")]), ])    

but unfortunately it drops every repeated value in column "Val" (keeping only the first occurrence), regardless of whether the duplicate involves R4 in column "Replicate".

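If this is only related to "R4" values, then

df[!((duplicated(df$Val) | duplicated(df$Val, fromLast=TRUE)) &
     df$Val %in% df$Val[df$Replicate == "R4"]), ]

will keep the non-duplicated observations, as well as duplicated observations whose value does not match an R4 value, for some data.frame df. The `%in%` test matters here: combining the duplicate flags directly with `df$Val[df$Replicate == "R4"]` would coerce a non-zero number to TRUE and drop every duplicate. Then, to put this into lapply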

mynonDupeList <- lapply(myList,
                    function(i) i[!((duplicated(i$Val) | duplicated(i$Val, fromLast=TRUE))
                                  & i$Val %in% i$Val[i$Replicate == "R4"]), ])

should do the trick.
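To check this against the question's data, here is a minimal, self-contained sketch. The helper name `drop_r4_duplicates` is hypothetical (it is not part of the original answer), and only the data.frames 42 and 45 are rebuilt by hand. It assumes "Val" can be compared for exact equality, which holds for the two-decimal values shown.

```r
# Hypothetical helper (name is an assumption, not from the original answer):
# drop every row whose Val is duplicated AND equals the Val of an R4 row.
drop_r4_duplicates <- function(d) {
  r4_vals <- d$Val[d$Replicate == "R4"]
  # Flag all members of a duplicated group, not just the later occurrences.
  is_dup <- duplicated(d$Val) | duplicated(d$Val, fromLast = TRUE)
  d[!(is_dup & d$Val %in% r4_vals), , drop = FALSE]
}

# Two of the data.frames from the question, rebuilt by hand.
myList <- list(
  `42` = data.frame(Val = c(26.92, 26.92, 28.68, 28.68),
                    Replicate = c("R2", "R3", "R1", "R4"),
                    Index = 42),
  `45` = data.frame(Val = c(30.57, 30.57, 32.39, 32.81),
                    Replicate = c("R1", "R4", "R2", "R3"),
                    Index = 45)
)

mynonDupeList <- lapply(myList, drop_r4_duplicates)
mynonDupeList[["45"]]  # only the R2 and R3 rows remain
```

Note that in data.frame 42 the pair 26.92 (R2, R3) is kept, because that duplicate does not involve R4; only the 28.68 pair (R1, R4) is dropped.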
