
Subset a dataframe based on a list of values in a list

I have a dataframe with participants' IDs and observations. I also have a list of the IDs of some participants that need to be removed from this data frame - I want to remove the whole row associated with each of these participant IDs. I have tried the following:

ListtoRemove <- as.list(ListtoRemove)
NewDataFrame <- subset(OldDataFrame, OldDataFrame$ParticipantsIDs != ListtoRemove)

This gives two warnings and does not remove the rows.

1: In `!=.default`(DemographicsALL$subject_label, AllSibs) :
longer object length is not a multiple of shorter object length
2: In is.na(e1) | is.na(e2) :
longer object length is not a multiple of shorter object length
> 

Example of the data:

structure(list(ParticipantsIDs = structure(c(2L, 1L, 3L, 4L, 
6L, 5L), .Label = c("B0002", "B001", "B003", "B004", "L004", 
"M003"), class = "factor"), Age = structure(c(3L, 1L, 4L, 2L, 
5L, 6L), .Label = c("15", "23", "45", "53", "65", "98"), class =      
"factor")), class = "data.frame", row.names = c(NA, 
-6L))

ListtoRemove <- as.list(B004,M003)
NewDataFrame[ !NewDataFrame[,1] %in% unlist(ListtoRemove), ]
#      ParticipantsIDs Age 
# [1,] "B001"          "45"
# [2,] "B0002"         "15"
# [3,] "B003"          "53"
# [4,] "L004"          "98"
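A note on why `!=` did not work in the original attempt, as a small sketch with made-up vectors (`ids` and `to_remove` are illustrative names, not objects from the question). `!=` compares position by position and recycles the shorter operand, so it never asks whether an ID appears anywhere in the removal set; `%in%` does exactly that. When the two lengths are not multiples of each other, R also emits the "longer object length" warning shown in the question.

```r
ids       <- c("B001", "B0002", "B003", "B004", "M003", "L004")
to_remove <- c("B004", "M003")

# Element-wise comparison: to_remove is recycled as
# B004, M003, B004, M003, B004, M003 and compared by position.
# No pair happens to match, so every element is TRUE and nothing is dropped.
elementwise <- ids != to_remove

# Membership test: TRUE for IDs that are NOT in the removal set.
membership <- !(ids %in% to_remove)

ids[elementwise]  # all six IDs are still there
ids[membership]   # "B001" "B0002" "B003" "L004"
```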

I think there might be some mistakes in the code you provided.

  1. You use subset in a way that suggests NewDataFrame is a data.frame, but you gave us a matrix. My code works either way, but your subset is going to fail (in a different way than you showed).
  2. as.list(B004, M003) is perhaps wrong on up to three points:

    • if those are names of variables, then we do not have them;
    • if those are strings, then they need quotes; as written, we see

      as.list(B004, M003) # Error in as.list(B004, M003): object 'B004' not found
    • as.list(1, 2, 3) only list-ifies the first argument; here 2 and 3 are ignored (so we would have only seen "B004", not "M003"). Perhaps you meant list("B004", "M003") or c("B004", "M003")?

Instead, I used

ListtoRemove <- list("B004","M003")
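To make the `as.list` point concrete, here is a short sketch (the variable names `only_first`, `ids_list` and `ids_vec` are mine, for illustration only). It shows that extra arguments to `as.list` are silently dropped, and that `list(...)` and `c(...)` both carry the two IDs, with `unlist` turning the former into the latter.

```r
# as.list() converts only its first argument; "M003" is silently ignored.
only_first <- as.list("B004", "M003")
length(only_first)   # 1

# Either of these holds both IDs.
ids_list <- list("B004", "M003")
ids_vec  <- c("B004", "M003")

# unlist() flattens the list into a plain character vector,
# which is what %in% is most naturally used with.
flat <- unlist(ids_list)
identical(flat, ids_vec)   # TRUE
```

If the IDs are all strings anyway, a plain character vector built with `c()` is probably the simplest choice, since it avoids the `unlist` step.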

If you're using a data frame, an easier-to-read way to do it would be:

# create data.frame
df <- data.frame(ParticipantsIDs = c("B001", "B0002", "B003", "B004", "M003", "L004"), 
                        Age = c("45", "15", "53", "23", "65", "98"))

# vector containing ids to remove
ids.remove <- c('B004','M003')

df

# subset df by rows where ParticipantsIDs are not found in ids.remove
subset(df, !(ParticipantsIDs %in% ids.remove))
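One thing that may matter with the data as posted in the question: there the ID column is a factor, not a character vector. The sketch below assumes that situation (the names `df_factor` and `kept` are illustrative). `%in%` still works, because factors are matched by their labels, but the removed IDs linger as unused levels until `droplevels` is called.

```r
# Same toy data, but with the ID column stored as a factor.
df_factor <- data.frame(
  ParticipantsIDs = factor(c("B001", "B0002", "B003", "B004", "M003", "L004")),
  Age = c("45", "15", "53", "23", "65", "98")
)
ids.remove <- c("B004", "M003")

# Row filtering works the same way on a factor column.
kept <- df_factor[!(df_factor$ParticipantsIDs %in% ids.remove), ]
nlevels(kept$ParticipantsIDs)   # 6: the removed IDs are still levels

# Drop levels that no longer occur in the data.
kept <- droplevels(kept)
nlevels(kept$ParticipantsIDs)   # 4
```

Whether the leftover levels matter depends on what happens next; they tend to show up as empty groups in `table()` or in plots.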

Using your data (ListtoRemove slightly edited - I hope this is correct):

data=structure(c("B001", "B0002", "B003", "B004", "M003", "L004", 
"45", "15", "53", "23", "65", "98"), .Dim = c(6L, 2L), .Dimnames = list(
NULL, c("ParticipantsIDs", "Age")))
ListtoRemove <- list("B004","M003")

What about:

data_subset=data[!data[,"ParticipantsIDs"] %in% unlist(ListtoRemove),]

Output:

> data_subset
     ParticipantsIDs Age 
[1,] "B001"          "45"
[2,] "B0002"         "15"
[3,] "B003"          "53"
[4,] "L004"          "98"

I ended up using:

data_subset = data[!data[, "ParticipantsIDs"] %in% unlist(ListtoRemove), ]

and it worked well.
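One caveat with the matrix version, sketched here on a made-up two-row matrix (`m`, `keep`, `as_vector` and `as_matrix` are illustrative names): if the filter leaves exactly one row, `[` drops the result to a plain vector unless `drop = FALSE` is given. This assumes the data really is a character matrix; a data.frame does not have this problem for row subsetting.

```r
# Two participants; the matrix is filled column by column.
m <- matrix(c("B001", "B004", "45", "23"), ncol = 2,
            dimnames = list(NULL, c("ParticipantsIDs", "Age")))

keep <- !m[, "ParticipantsIDs"] %in% c("B004")

as_vector <- m[keep, ]                 # one row left: collapses to a vector
as_matrix <- m[keep, , drop = FALSE]   # stays a 1 x 2 matrix

is.null(dim(as_vector))   # TRUE
dim(as_matrix)            # 1 2
```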
