Keep specific values in a dataframe and delete all the others

Starting from a dataframe like this:

  col1 <- c("Anne", "Emma", "Katy", "Albert", "Richard")
  col2 <- c("Albert", "Mark", "Mike", "Loren", "Anne")
  col3 <- c("Mark", "Emma", "Paul", "George", "Samuel" )
  
  df <- cbind(col1, col2, col3) 

I would like to keep only the values listed in this vector:

selected <- c("Emma", "Katy", "Mark")

and delete all the others, so that I get a new dataframe like this:

col1    col2    col3
NA      NA      "Mark"
"Emma"  "Mark"  "Emma"
"Katy"  NA      NA
NA      NA      NA
NA      NA      NA

I have tried the following code and it works:

df[df != "Emma" & df != "Katy" & df != "Mark"] <- NA

but I would like to find a way to use the vector selected in an if statement, instead of writing all the conditions manually. My actual dataframe and vector of values are bigger than the ones in this example.

Thanks in advance for your help!

The code in the question creates a matrix with cbind, not a data.frame. This matters because a data.frame is a list of vectors that all have the same length, whereas a matrix is a folded vector: a single vector with a dim attribute set.
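The difference can be checked directly in the console. The following is a minimal sketch; the objects x, y, m and d are made up for this illustration and do not appear in the question:

```r
# Hypothetical objects, used only for this illustration
x <- c("Anne", "Emma")
y <- c("Albert", "Mark")

m <- cbind(x, y)        # cbind() on atomic vectors returns a matrix
d <- data.frame(x, y)   # data.frame() returns a list of columns

is.matrix(m)            # TRUE
is.data.frame(d)        # TRUE

is.list(m)              # FALSE: a matrix is one atomic vector
is.list(d)              # TRUE: a data.frame is a list of columns

# The matrix stores its shape in a dim attribute; the data.frame does not
"dim" %in% names(attributes(m))   # TRUE
"dim" %in% names(attributes(d))   # FALSE
```

dim(d) still reports the number of rows and columns, but it is computed from the list of columns rather than read from a stored attribute.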

  • For data.frames, loop over the columns, applying the function %in% to each of them;
  • For matrices, there's no need for a loop.
col1 <- c("Anne", "Emma", "Katy", "Albert", "Richard")
col2 <- c("Albert", "Mark", "Mike", "Loren", "Anne")
col3 <- c("Mark", "Emma", "Paul", "George", "Samuel" )

mat <- cbind(col1, col2, col3) 
df <- data.frame(col1, col2, col3) 

selected <- c("Emma", "Katy", "Mark")

# %in% is applied to each column; sapply() collects the results in a logical matrix
is.na(df) <- !sapply(df, `%in%`, selected)
df
#>   col1 col2 col3
#> 1 <NA> <NA> Mark
#> 2 Emma Mark Emma
#> 3 Katy <NA> <NA>
#> 4 <NA> <NA> <NA>
#> 5 <NA> <NA> <NA>

# %in% returns a plain logical vector, which indexes the matrix element by element
is.na(mat) <- !mat %in% selected
mat
#>      col1   col2   col3  
#> [1,] NA     NA     "Mark"
#> [2,] "Emma" "Mark" "Emma"
#> [3,] "Katy" NA     NA    
#> [4,] NA     NA     NA    
#> [5,] NA     NA     NA

Created on 2022-03-20 by the reprex package (v2.0.1)
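If the original object should stay untouched, the same idea can be wrapped in a small function that returns a modified copy. This is only a sketch: keep_only() is a name invented here, not a base R function, and it assumes the data.frame columns are character vectors or factors.

```r
# keep_only() is a hypothetical helper, not part of base R.
# It returns a copy of x in which every value not found in `keep` is NA.
keep_only <- function(x, keep) {
  if (is.data.frame(x)) {
    # x[] <- ... replaces each column while keeping the data.frame structure
    x[] <- lapply(x, function(col) replace(col, !col %in% keep, NA))
  } else {
    # %in% returns a plain logical vector, which indexes the matrix
    # element by element in column-major order
    x[!x %in% keep] <- NA
  }
  x
}

col1 <- c("Anne", "Emma", "Katy", "Albert", "Richard")
col2 <- c("Albert", "Mark", "Mike", "Loren", "Anne")
col3 <- c("Mark", "Emma", "Paul", "George", "Samuel")
selected <- c("Emma", "Katy", "Mark")

mat <- cbind(col1, col2, col3)
df <- data.frame(col1, col2, col3)

new_mat <- keep_only(mat, selected)  # mat itself is not modified
new_df <- keep_only(df, selected)    # df itself is not modified
```

Because R copies on modification, the assignments inside the function only affect the local copy, so mat and df keep all their original values.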
