
R: Remove rows from data frame based on values in several columns

I have the following dataframe (df) - there are more columns, but these are the relevant columns:

ID  Cost 
1    $100
1    $200
2    $50
2    $0
2    $40
3    $10
4    $100
5    $0
5    $50

I would like to subset this data frame so that if any cost for a particular ID is $0, all rows for that ID are removed.

Therefore, in this example, ID 2 and 5 contain a $0, so all of ID 2 and ID 5 rows should be deleted.

Here is the resulting df I would like:

    ID  Cost 
    1    $100
    1    $200
    3    $10
    4    $100

Could someone help with this? I tried some combinations of the subset function, but it didn't work.

On a similar note: I have another data frame that contains NAs. Could you also show how to solve the same problem when the values to look for are NAs instead of 0s?

Thanks in advance!!

Try this:

subset(df,!df$ID %in% df$ID[is.na(df$Cost) | df$Cost == "$0"])

This gives you:

  ID Cost
1  1 $100
2  1 $200
6  3  $10
7  4 $100
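To reproduce this result from scratch, the example data can be rebuilt first. This is only a sketch: it assumes Cost is stored as text with the dollar sign, as displayed in the question, and the names `bad_ids` and `result` are made up for illustration.

```r
# Assumed reconstruction of the question's data; Cost is kept as text ("$100").
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 5, 5),
  Cost = c("$100", "$200", "$50", "$0", "$40", "$10", "$100", "$0", "$50"),
  stringsAsFactors = FALSE
)

# IDs that have at least one missing or "$0" cost
bad_ids <- unique(df$ID[is.na(df$Cost) | df$Cost == "$0"])

# Keep only the rows whose ID is not in that set
result <- df[!df$ID %in% bad_ids, ]
print(result)
```

The original row names (1, 2, 6, 7) are preserved, which is why the printed index skips from 2 to 6.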

Try:

df[!df$ID %in% df$ID[df$Cost=="$0"],]
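For the NA case raised in the question, the condition probably needs an explicit `is.na()` check, because comparing `NA` with `==` yields `NA` rather than `TRUE`. A minimal sketch, using a small made-up data frame (`df_na`, `drop_ids` and `kept_na` are illustrative names, not from the question):

```r
# Hypothetical data with a missing cost for ID 1 and a "$0" cost for ID 2
df_na <- data.frame(
  ID   = c(1, 1, 2, 2, 3),
  Cost = c("$100", NA, "$50", "$0", "$10"),
  stringsAsFactors = FALSE
)

# Flag IDs with a missing cost or a "$0" cost
drop_ids <- df_na$ID[is.na(df_na$Cost) | df_na$Cost == "$0"]

# Drop every row belonging to a flagged ID
kept_na <- df_na[!df_na$ID %in% drop_ids, ]
print(kept_na)
```

Only the row for ID 3 should remain here, since IDs 1 and 2 each have a flagged value.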

You can compute the IDs that you want to remove with something like tapply (this assumes Cost is stored as a number, without the $ sign):

(has.zero <- tapply(df$Cost, df$ID, function(x) sum(x == 0) > 0))
#     1     2     3     4     5 
# FALSE  TRUE FALSE FALSE  TRUE 

Then you can subset, limiting to IDs that you don't want to remove:

df[!df$ID %in% names(has.zero)[has.zero],]
#   ID Cost
# 1  1  100
# 2  1  200
# 6  3   10
# 7  4  100

This is pretty flexible, because it enables you to limit IDs based on more complicated criteria (eg "the average cost for the ID must be at least xyz").
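As an illustration of such a criterion, here is a sketch that keeps only the IDs whose average cost reaches a threshold. The threshold `min_mean` and the names `mean_cost`, `keep_ids` and `kept` are hypothetical, and Cost is again assumed to be numeric.

```r
# Assumed data: Cost stored as a number, as in the tapply output above
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 5, 5),
  Cost = c(100, 200, 50, 0, 40, 10, 100, 0, 50)
)

min_mean <- 50  # hypothetical threshold

# Per-ID mean cost; na.rm = TRUE ignores missing costs
mean_cost <- tapply(df$Cost, df$ID, mean, na.rm = TRUE)

# Names of the IDs whose mean cost is at least the threshold
keep_ids <- names(mean_cost)[mean_cost >= min_mean]

# Keep only the rows belonging to those IDs
kept <- df[df$ID %in% keep_ids, ]
print(kept)
```

With this data the per-ID means are 150, 30, 10, 100 and 25, so only IDs 1 and 4 pass a threshold of 50.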
