This is my first question on Stack Overflow, so please let me know if any further information is needed to answer it. I started learning R very recently, so I kindly ask for your patience.
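I have a data frame Df1 which I want to subset/filter based on two simultaneous conditions: the CompanyCode must appear in a second data frame Df2, and the year of the Date must match the Year in that same row of Df2.
I have tried the following code: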
Sub <- subset(Df1, Df1$CompanyCode %in% Df2$CompanyCode & year(Df1$Date) %in% Df2$Year)
I think I know where the problem is, but I don't know how to fix it. I think the formula above checks the two "%in%" conditions individually and therefore returns too many cases.
To give a concrete example (see below; EDIT: as requested, now as dput output): I would expect row #4 of Df1 not to be in my result, because there is no matching case in Df2. However, it is part of the resulting subset. I guess this is because a match can be found for the company code and for the date individually, i.e. company "B" can be found in Df2 and the year 2016 can be found in Df2. This is not what I want, because there is no single row in Df2 where both conditions are fulfilled at the same time.
Df1 (Input 1):
structure(list(CompanyCode = c("A", "A", "B", "B", "C", "D"),
Date = structure(c(16800, 17166, 16800, 17166, 16800, 17166
), class = "Date")), row.names = c(NA, -6L), class = "data.frame")
Df2 (Input 2):
structure(list(CompanyCode = c("A", "A", "B", "C", "D"), Year = c(2015L,
2016L, 2015L, 2015L, 2016L)), class = "data.frame", row.names = c(NA,
-5L))
Sub (Actual Output):
structure(list(CompanyCode = c("A", "A", "B", "B", "C", "D"),
Date = structure(c(16800, 17166, 16800, 17166, 16800, 17166
), class = "Date")), row.names = c(NA, 6L), class = "data.frame")
ExpectedSub (Expected Output):
structure(list(CompanyCode = c("A", "A", "B", "C", "D"), Date = structure(c(16800,
17166, 16800, 16800, 17166), class = "Date")), row.names = c(NA,
-5L), class = "data.frame")
I would greatly appreciate if you could help me out here. Hopefully this example made my problem clear.
Many thanks in advance!
one more way..
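library(dplyr)
library(lubridate)
# df1 and df2 stand for the question's Df1 and Df2 (R names are case-sensitive)
df1 %>% mutate(Year = year(as.Date(Date))) %>%  # derive the year to join on
  right_join(df2, by = c("CompanyCode" = "CompanyCode", "Year" = "Year"))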
  CompanyCode       Date Year
1           A 2015-12-31 2015
2           A 2016-12-31 2016
3           B 2015-12-31 2015
4           C 2015-12-31 2015
5           D 2016-12-31 2016
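A possible variation on the join above, offered as a sketch rather than a correction: right_join keeps every row of df2, so a CompanyCode/Year pair that exists only in df2 would come back with an NA Date. A filtering join such as dplyr's semi_join keeps only the rows of the first table that have a partner in the second and adds no columns. The data below is copied from the question's dput output; the helper column Year is built with base R's format() instead of lubridate, which is my own choice here.

```r
library(dplyr)

# Data copied from the question's dput output
Df1 <- data.frame(
  CompanyCode = c("A", "A", "B", "B", "C", "D"),
  Date = as.Date(c("2015-12-31", "2016-12-31", "2015-12-31",
                   "2016-12-31", "2015-12-31", "2016-12-31")),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  CompanyCode = c("A", "A", "B", "C", "D"),
  Year = c(2015L, 2016L, 2015L, 2015L, 2016L),
  stringsAsFactors = FALSE
)

Sub <- Df1 %>%
  # temporary helper column holding the year of each Date
  mutate(Year = as.integer(format(Date, "%Y"))) %>%
  # keep only Df1 rows whose (CompanyCode, Year) pair occurs in Df2
  semi_join(Df2, by = c("CompanyCode", "Year")) %>%
  # drop the helper column again so the result has Df1's columns
  select(-Year)

Sub
```

With the question's data this should drop only the B/2016 row and leave the other five rows of Df1 in their original order.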
You can paste CompanyCode and the year value together to create a unique key for each row, and then use %in% to keep only those keys which are present in df2.
# Build a "code year" key per row and keep the df1 rows whose key occurs in df2
result <- subset(df1, paste(CompanyCode, format(Date, '%Y')) %in%
                      paste(df2$CompanyCode, df2$Year))
result
#  CompanyCode       Date
#1           A 2015-12-31
#2           A 2016-12-31
#3           B 2015-12-31
#5           C 2015-12-31
#6           D 2016-12-31
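To see why the pasted key behaves differently from the two separate %in% checks in the question, it may help to compute both conditions side by side. This is only an illustrative sketch in base R: the data is copied from the question's dput output, and the names yr, separate and joint are made up for this example.

```r
# Data copied from the question's dput output
Df1 <- data.frame(
  CompanyCode = c("A", "A", "B", "B", "C", "D"),
  Date = as.Date(c("2015-12-31", "2016-12-31", "2015-12-31",
                   "2016-12-31", "2015-12-31", "2016-12-31")),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  CompanyCode = c("A", "A", "B", "C", "D"),
  Year = c(2015L, 2016L, 2015L, 2015L, 2016L),
  stringsAsFactors = FALSE
)

# Year of each Date in Df1 (hypothetical helper vector)
yr <- as.integer(format(Df1$Date, "%Y"))

# Two separate checks: each may be satisfied by a different row of Df2
separate <- Df1$CompanyCode %in% Df2$CompanyCode & yr %in% Df2$Year

# One check on the combined key: both values must come from the same row of Df2
joint <- paste(Df1$CompanyCode, yr) %in% paste(Df2$CompanyCode, Df2$Year)

# Show both flags next to the original rows
data.frame(Df1, separate, joint)
```

Row 4 (company B, year 2016) is the only row where the two columns differ: separate is TRUE because "B" and 2016 each occur somewhere in Df2, while joint is FALSE because no single row of Df2 holds both. One caveat with the key approach: paste joins the parts with a space by default, so if company codes could themselves contain spaces, it would be safer to pass a sep value that cannot occur in the data.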