I just started learning R and I really need some help with cleaning my data. I spent the last 2 days trying to find a solution but nothing seems to work.
I have a dataset called d.new. Here is an example of the relevant rows:
d.new <- data.frame(observation = c("abc","abc","abc","def","def","def"),
                    vis = c("yes",NA,NA,"no",NA,NA))
I extracted the codes for vis == "yes" like this:
library(dplyr)  # provides filter() and select()
idx_vis <- c(select(filter(d.new, vis == "yes"), c(observation)))
The output looks like this:
$observation
[1] "abc"
Now I'd like to find all rows in which the content of my "observation" column is one of the codes in my vector (let's assume it's not just abc but a few hundred codes) and delete them, without hard-coding anything. I'd like to use the script for other datasets with different codes, too.
So my desired output would be a dataframe that doesn't contain the rows with certain codes.
My attempt was to write a loop that goes through all the rows and finds and deletes those that contain one of the codes from idx_vis. I started like this (but I'm not even sure if this makes sense, I have never written a loop before):
for(i in 1:length(d.new$observation)){
i2 <- c([i]:length(idx_vis))
idx_dump <- as.character(which(d.new$observation == "idx_vis[i2]"))
# then delete the rows from idx_dump from d.new?
}
It would be great if someone could give me a hint! Thanks in advance!
Merle
Try this: d.new[d.new$vis == "yes", ]
This selects rows based on the value in the "vis" column.
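
Note that the question asks for the opposite: dropping every row whose observation code appears in idx_vis. Also, comparing with == returns NA wherever vis is NA, and indexing a data frame with NA yields rows filled with NA. A minimal base R sketch using %in%, which treats NA as "no match", might look like the following. It assumes d.new is a data frame (not the matrix that cbind() returns), and d.clean is just an illustrative name:

```r
# Example data from the question, built as a data frame (assumption: the real
# data is a data frame, since `$` does not work on a matrix).
d.new <- data.frame(
  observation = c("abc", "abc", "abc", "def", "def", "def"),
  vis = c("yes", NA, NA, "no", NA, NA)
)

# Codes that have at least one row with vis equal to "yes".
# %in% returns FALSE (not NA) for NA values, so no NA rows leak in.
idx_vis <- unique(d.new$observation[d.new$vis %in% "yes"])

# Keep only the rows whose observation is NOT one of those codes.
# No loop is needed: %in% is vectorised over all rows and all codes.
d.clean <- d.new[!(d.new$observation %in% idx_vis), ]
```

With the example data, only the three "def" rows remain in d.clean. Because nothing is hard-coded, the same lines should work for other datasets with different codes. If dplyr is already loaded, filter(d.new, !observation %in% idx_vis) would be an equivalent way to write the last step.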