R: Remove rows from data frame based on values in several columns

Question

I have the following dataframe (df) - there are more columns, but these are the relevant columns:

I would like to subset this data frame so that if any cost for a particular ID is $0, all rows for that ID are removed.

In this example, IDs 2 and 5 each contain a $0, so all rows for ID 2 and ID 5 should be deleted.

Here is the resulting df I would like:

Could someone help with this? I tried some combinations of the subset function, but they didn't work.

** On a similar note: I have another data frame with "NA"s. Could you help me solve the same problem for NAs instead of 0s?

Thanks in advance!!

Answer 1

Try this:

subset(df, !df$ID %in% df$ID[is.na(df$Cost) | df$Cost == "$0"])

This gives you:

  ID Cost
1  1 $100
2  1 $200
6  3  $10
7  4 $100
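
To make the one-liner above easier to follow, here is a step-by-step sketch of the same idea. The original table was not preserved, so the data frame below is made up (the costs for IDs 2 and 5 in particular are assumptions), and the names bad_row, bad_ids and result are only illustrative.

```r
# Hypothetical data: values are invented to resemble the output shown above.
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 5, 5),
  Cost = c("$100", "$200", "$50", "$0", "$20", "$10", "$100", "$0", NA),
  stringsAsFactors = FALSE
)

# Flag rows whose Cost is missing or exactly "$0".
# is.na() is checked first, so the combined flag never contains NA.
bad_row <- is.na(df$Cost) | df$Cost == "$0"

# Collect the IDs that own at least one flagged row.
bad_ids <- unique(df$ID[bad_row])

# Keep only the rows whose ID is not in that set.
result <- df[!df$ID %in% bad_ids, ]
print(result)
```

With this made-up data, only the rows for IDs 1, 3 and 4 should remain, and they keep their original row names.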

Answer 2

Try:

df[!df$ID %in% df$ID[df$Cost == "$0"], ]
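
For the NA case raised in the question, the same pattern should work with is.na() in place of the comparison. A minimal sketch, assuming a numeric Cost column; the data frame df_na and its values are hypothetical, not the asker's data.

```r
# Hypothetical data with a missing cost for ID 2.
df_na <- data.frame(
  ID   = c(1, 1, 2, 2, 3),
  Cost = c(100, 200, NA, 50, 10)
)

# IDs that have at least one missing Cost.
ids_with_na <- unique(df_na$ID[is.na(df_na$Cost)])

# Drop every row belonging to those IDs.
clean <- df_na[!df_na$ID %in% ids_with_na, ]
print(clean)
```

Here ID 2 is dropped entirely, even though one of its rows has a valid cost.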

Answer 3

You can compute the IDs that you want to remove with something like tapply (this assumes Cost is stored as a number rather than a string such as "$0"):

(has.zero <- tapply(df$Cost, df$ID, function(x) sum(x == 0) > 0))
#     1     2     3     4     5 
# FALSE  TRUE FALSE FALSE  TRUE

Then you can subset, limiting to IDs that you don't want to remove:

df[!df$ID %in% names(has.zero)[has.zero],]
#   ID Cost
# 1  1  100
# 2  1  200
# 6  3   10
# 7  4  100

This is pretty flexible, because it enables you to limit IDs based on more complicated criteria (e.g. "the average cost for the ID must be at least xyz").
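
As a sketch of that kind of criterion, the tapply approach might be adapted to keep only IDs whose average cost reaches a cutoff. The data and the threshold min_avg below are assumptions chosen purely for illustration.

```r
# Hypothetical numeric data and cutoff, for illustration only.
df <- data.frame(
  ID   = c(1, 1, 2, 2, 3),
  Cost = c(100, 200, 5, 15, 10)
)
min_avg <- 50  # assumed threshold

# Per-ID mean cost; the result is named by ID.
avg_cost <- tapply(df$Cost, df$ID, mean)

# IDs whose average cost reaches the cutoff (as character names).
keep_ids <- names(avg_cost)[avg_cost >= min_avg]

# df$ID is numeric and keep_ids is character; %in% coerces for the match.
kept <- df[df$ID %in% keep_ids, ]
print(kept)
```

Only the per-group function passed to tapply changes; the subsetting step stays the same as in the zero-cost case.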

3 answers

solution1
4 ACCEPTED 2015-06-09 16:58:55

solution2
3 2015-06-09 16:50:06

solution3
1 2015-06-09 16:53:48
