
R: Removing rows from data frame based on external criteria

I have two data frames, df.1 and df.2, and I'd like to remove rows from df.2 based on values in df.1. Specifically, I want to delete every row of df.2 for which the feistiness value in df.1 for the same date and the same dog is NA. How does one go about doing this? (I've looked at other questions and still couldn't figure this out.)

Reproducible code for the first data frame:

# create first data frame
dates <- rep(as.Date(5001:5010, origin = "1970-01-01"), times = 4)
dogs <- c(rep("Fido", times = 10), rep("Snoopy", times = 10), rep("Speckles", times = 10), rep("Pit", times = 10))
set.seed(200)
feistiness <- c(round(runif(35, min = 0, max = 100), digits = 0), rep(NA, times = 5))
df.1 <- data.frame(dates, dogs, feistiness)
names(df.1) <- c("date", "dog", "feistiness")

Which yields:

         date     dog feistiness
1  1983-09-11    Fido         56
2  1983-09-12    Fido         18
3  1983-09-13    Fido         97
4  1983-09-14    Fido         49
5  1983-09-15    Fido         49
6  1983-09-16    Fido         59
7  1983-09-17    Fido         72
8  1983-09-18    Fido         69
9  1983-09-19    Fido         18
10 1983-09-20    Fido         95
11 1983-09-11  Snoopy         69
12 1983-09-12  Snoopy         16
13 1983-09-13  Snoopy         58
14 1983-09-14  Snoopy         65
15 1983-09-15  Snoopy         83
16 1983-09-16  Snoopy          7
17 1983-09-17  Snoopy         12
18 1983-09-18  Snoopy         89
19 1983-09-19  Snoopy         56
20 1983-09-20  Snoopy         52
21 1983-09-11 Speckles         13
22 1983-09-12 Speckles         15
23 1983-09-13 Speckles         16
24 1983-09-14 Speckles         56
25 1983-09-15 Speckles         67
26 1983-09-16 Speckles         15
27 1983-09-17 Speckles         57
28 1983-09-18 Speckles         76
29 1983-09-19 Speckles         57
30 1983-09-20 Speckles         78
31 1983-09-11     Pit         68
32 1983-09-12     Pit         22
33 1983-09-13     Pit         28
34 1983-09-14     Pit          9
35 1983-09-15     Pit         59
36 1983-09-16     Pit         NA
37 1983-09-17     Pit         NA
38 1983-09-18     Pit         NA
39 1983-09-19     Pit         NA
40 1983-09-20     Pit         NA

And the second data frame:

# create second data frame
dates.2 <- as.Date(c(5002, 5005, 5004, 5009), origin = "1970-01-01")
dogs.2 <- c("Fido", "Snoopy", "Speckles", "Pit")
df.2 <- data.frame(dates.2, dogs.2)
names(df.2) <- c("date", "dog")

Which yields:

        date      dog
1 1983-09-12     Fido
2 1983-09-15   Snoopy
3 1983-09-14 Speckles
4 1983-09-19      Pit

The final output data frame should look like the following, with the last row removed because the feistiness value for Pit on 1983-09-19 is NA:

        date      dog
1 1983-09-12     Fido
2 1983-09-15   Snoopy
3 1983-09-14 Speckles
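The rule above is effectively a per-row lookup: for each row of df.2, find the df.1 row with the same date and the same dog, and keep the df.2 row only if that feistiness is not NA. Both columns are needed, because a date alone is ambiguous (on 1983-09-19 Fido has a value while Pit is NA). Below is a minimal base R sketch of that lookup using match() on pasted keys. It uses a small hand-picked subset of df.1 whose values are copied from the printed table above, and the "|" key separator is an assumption (it only needs to be a character that never appears in a dog name).

```r
# A subset of df.1, with values copied from the printed table above.
# Fido on 1983-09-19 is included to show that matching on date alone is not enough.
df.1 <- data.frame(
  date = as.Date(c("1983-09-12", "1983-09-15", "1983-09-14", "1983-09-19", "1983-09-19")),
  dog = c("Fido", "Snoopy", "Speckles", "Fido", "Pit"),
  feistiness = c(18, 83, 56, 18, NA)
)
df.2 <- data.frame(
  date = as.Date(c("1983-09-12", "1983-09-15", "1983-09-14", "1983-09-19")),
  dog = c("Fido", "Snoopy", "Speckles", "Pit")
)

# For each df.2 row, find the position of the df.1 row with the same date and dog
# ("|" is an assumed separator that never appears in a dog name)
idx <- match(paste(df.2$date, df.2$dog, sep = "|"),
             paste(df.1$date, df.1$dog, sep = "|"))

# Look up feistiness; rows with no partner in df.1 (idx is NA) also yield NA
feist <- df.1$feistiness[idx]

# Keep only the df.2 rows whose looked-up feistiness is present
df_final <- df.2[!is.na(feist), ]
df_final
```

With this lookup, a df.2 row that has no partner in df.1 at all is dropped as well, because its looked-up feistiness is NA. That differs from an anti-join, which keeps such rows.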

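We can use anti_join() from dplyr: first filter df.1 down to the rows where feistiness is NA, then drop every row of df.2 that matches one of them on both date and dog. df_final is the final output.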

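library(dplyr)

# Keep the df.2 rows that have no match among the df.1 rows
# whose feistiness is NA, matching on both date and dog
df_final <- df.2 %>%
  anti_join(df.1 %>% filter(is.na(feistiness)), by = c("date", "dog"))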
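If you would rather avoid the dplyr dependency, the same anti-join can be sketched in base R: build a key from the (date, dog) pairs whose feistiness is NA, then drop the df.2 rows whose key is in that set. This is a minimal sketch that rebuilds the example data so it runs on its own. make_key() is a hypothetical helper introduced only for this example, and the "|" key separator is an assumption (any character absent from the dog names works).

```r
# Rebuild the question's data; only the NA pattern matters here
dates <- rep(as.Date(5001:5010, origin = "1970-01-01"), times = 4)
dogs <- rep(c("Fido", "Snoopy", "Speckles", "Pit"), each = 10)
set.seed(200)
feistiness <- c(round(runif(35, min = 0, max = 100)), rep(NA, times = 5))
df.1 <- data.frame(date = dates, dog = dogs, feistiness = feistiness)

df.2 <- data.frame(
  date = as.Date(c(5002, 5005, 5004, 5009), origin = "1970-01-01"),
  dog = c("Fido", "Snoopy", "Speckles", "Pit")
)

# Hypothetical helper: build a "date|dog" key for every row of a data frame
make_key <- function(d) paste(d$date, d$dog, sep = "|")

# Keys of the df.1 rows whose feistiness is missing
na_keys <- make_key(df.1[is.na(df.1$feistiness), ])

# Keep only the df.2 rows whose key is not among them (an anti-join)
df_final <- df.2[!(make_key(df.2) %in% na_keys), ]
df_final
```

Like anti_join(), this keeps df.2 rows that have no partner in df.1 at all; only a match against a row with NA feistiness removes a row.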
