

Removing selected observations from a data frame in R

I'm looking to remove 7 rows from a large dataset (>400 rows) based on the values in a certain column. I'm having trouble with this seemingly simple task.

    ## Generate sample dataset
    Site.Num <- 1:20
    Year <- 1990:2009
    Day <- 10:29
    Final <- data.frame(Site.Num, Year, Day)


    ## I would like to remove 5 rows, based on 5 sites from the Site.Num column
    Final <- Final[which(Final$Site.Num != c(1,4,10,11,14)), ]


    ## I receive this warning message
    Warning message:
    In Final$Site.Num != c(1, 4, 10, 11, 14) :
      longer object length is not a multiple of shorter object length

The warning appears because you're using != to compare vectors of different lengths, so R recycles the shorter one. The warning is important here, because it means you're asking for something different from what you expect.
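To see the problem concretely, here is a small check on the 20-row sample data from the question. Because 20 is an exact multiple of 5, R recycles c(1, 4, 10, 11, 14) silently, so no warning appears at all, yet the filter still drops only row 1. The names keep and kept are just illustrative.

```r
# Rebuild the 20-row sample data frame from the question
Final <- data.frame(Site.Num = 1:20, Year = 1990:2009, Day = 10:29)

# Element-wise comparison: the length-5 vector is recycled 4 times,
# so each row is compared against only ONE of the five sites
keep <- Final$Site.Num != c(1, 4, 10, 11, 14)

kept <- Final[which(keep), ]
nrow(kept)                            # 19: only row 1 was removed
c(4, 10, 11, 14) %in% kept$Site.Num   # all TRUE: these sites survived
```

With the real dataset (>400 rows, presumably not a multiple of 5) the same mistake also triggers the warning, which is the only hint that something is off.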

For example (using == for clarity), if you want to see which values of c(1,2,2) are contained in c(1,2), consider this expression:

    > c(1,2,2) == c(1,2)
    [1]  TRUE  TRUE FALSE
    Warning message:
    In c(1, 2, 2) == c(1, 2) :
      longer object length is not a multiple of shorter object length

But 2 is clearly in both vectors. The FALSE appears because the vector on the right is recycled, so these are the values actually being compared:

    > c(1,2,2) == c(1,2,1)
    [1]  TRUE  TRUE FALSE
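The recycling can be made explicit with base R's rep_len(), which repeats a vector out to a given length. This sketch only reproduces the comparison above; lhs and rhs are illustrative names.

```r
# Recycle the shorter vector out to the length of the longer one
lhs <- c(1, 2, 2)
rhs <- rep_len(c(1, 2), length(lhs))

rhs          # 1 2 1
lhs == rhs   # TRUE TRUE FALSE, same as the implicitly recycled comparison
```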

In the first comparison, however, the vector on the right is not recycled a whole number of times (a length-2 vector cannot evenly fill length 3), which is what triggers the warning. That usually means the code is not doing what you intended. What you want is the %in% operator, which tests set membership:

    > c(1,2,2) %in% c(1,2)
    [1] TRUE TRUE TRUE
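For the curious: base R documents %in% (see ?match) as a thin wrapper around match(), namely match(x, table, nomatch = 0) > 0. A quick sketch confirming the two agree on the example above (x and set are illustrative names):

```r
x   <- c(1, 2, 2)
set <- c(1, 2)

via_in <- x %in% set
# match() returns the position of each element of x in set, or 0 when absent
via_match <- match(x, set, nomatch = 0) > 0

identical(via_in, via_match)   # TRUE
```

Because each element of x is looked up in set independently, the lengths of the two vectors never need to line up, so no recycling takes place.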

No warning, and the expected answer.没有警告,和预期的答案。

For your question, here is the command to keep the desired rows:

    Final <- Final[!(Final$Site.Num %in% c(1,4,10,11,14)), ]
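Applied to the 20-row sample data, this should leave 15 rows with none of the five sites. A minimal check; sites_to_drop is a hypothetical name introduced here for readability:

```r
# Sample data from the question
Final <- data.frame(Site.Num = 1:20, Year = 1990:2009, Day = 10:29)
sites_to_drop <- c(1, 4, 10, 11, 14)   # hypothetical helper variable

# Keep every row whose Site.Num is NOT in the drop list
Final <- Final[!(Final$Site.Num %in% sites_to_drop), ]

nrow(Final)                              # 15
any(Final$Site.Num %in% sites_to_drop)   # FALSE
```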

Note that which() neither helps nor hurts in this statement. It only becomes a trap in the negated form, Final[-which(...), ], which returns no rows at all when nothing matches.
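A quick demonstration of the empty-match case, using a hypothetical site number (99) that does not occur in the sample data. When which() finds nothing it returns integer(0), and negating that is still integer(0), so the negated form selects zero rows instead of keeping all of them:

```r
Final <- data.frame(Site.Num = 1:20, Year = 1990:2009, Day = 10:29)
no_match <- 99   # hypothetical: no row has Site.Num 99

# which() returns integer(0); -integer(0) is also integer(0),
# so this indexes ZERO rows rather than keeping all 20
bad <- Final[-which(Final$Site.Num %in% no_match), ]
nrow(bad)    # 0

# The logical form keeps every row, as intended
good <- Final[!(Final$Site.Num %in% no_match), ]
nrow(good)   # 20
```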

With the dplyr package, you can do something like this:

    library(dplyr)
    filter(Final, !Site.Num %in% c(1,4,10,11,14))
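If you would rather not add a package dependency, base R's subset() offers a similar style, evaluating the condition inside the data frame so the Final$ prefix can be dropped. Note that ?subset describes it as a convenience function intended for interactive use and recommends [ for programming. A sketch on the sample data (kept is an illustrative name):

```r
Final <- data.frame(Site.Num = 1:20, Year = 1990:2009, Day = 10:29)

# subset() looks up Site.Num inside Final, so no Final$ prefix is needed
kept <- subset(Final, !(Site.Num %in% c(1, 4, 10, 11, 14)))
nrow(kept)   # 15
```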

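Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.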

 