How to match two columns in one dataframe using values in another dataframe in R

Question

I have two dataframes. One is a set of ≈4000 entries that looks similar to this:

| grade_col1 | grade_col2 |
| --- | --- |
| A-| A-|
| B | 86|
| C+| C+|
| B-| D |
| A | A |
| C-| 72|
| F | 96|
| B+| B+|
| B | B |
| A-| A-|

The other is a set of ≈700 entries that looks similar to this:

| grade | scale |
| --- | --- |
| A+|100|
| A+| 99|
| A+| 98|
| A+| 97|
| A | 96|
| A | 95|
| A | 94|
| A | 93|
| A-| 92|
| A-| 91|
| A-| 90|
| B+| 89|
| B+| 88|

...and so on.

What I'm trying to do is create a new column that shows whether grade_col2 matches grade_col1 as a binary 0/1 output (0 = no match, 1 = match). Most of grade_col2 is recorded as a letter grade, but every once in a while an entry was accidentally entered as a numeric grade instead. I want the match column to give me a "1" even when grade_col2 is a numeric grade rather than a letter grade. In other words, if grade_col1 is B and grade_col2 is 86, I want this to still count as a match. A row such as grade_col1 = F and grade_col2 = 96 should not count as a match, just as grade_col1 = B- and grade_col2 = D is not a match.

The second data frame gives me the information I need to translate between one and the other (entries between 97-100 are A+, between 93-96 are A, and so on). I just don't know how to run a script that uses this information to find matches through all ≈4000 entries. Theoretically, I could do this manually, but the real dataset is so lengthy that this isn't realistic.

I had been thinking of using nested if_else statements with dplyr, but once I got past the first "if" statement, I got stuck. I'd appreciate any help people can offer.

Answer 1

You can do this using a join.

Let your first dataframe be grades_df and your second dataframe be lookup_df. Then you want something like the following:

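library(dplyr)

output <- grades_df %>%
  # join on the lookup table, keeping every row of grades_df; scale is
  # converted to character so its type matches grade_col2 (assumed character)
  left_join(mutate(lookup_df, scale = as.character(scale)),
            by = c("grade_col2" = "scale")) %>%
  # take the looked-up letter grade where grade_col2 was numeric
  mutate(grade_col2b = ifelse(is.na(grade), grade_col2, grade)) %>%
  # indicator column: 1 = match, 0 = no match
  mutate(indicator = ifelse(grade_col1 == grade_col2b, 1, 0))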
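
If you want to check the logic without installing anything, here is a minimal sketch of the same lookup using base R's match() instead of a join. It rebuilds a few rows from the question by hand. It assumes grade_col2 is stored as character and scale as a number, and the 86 -> B row of the lookup is an assumption taken from the question's own example, since that part of the table isn't shown. The trimws() call is only a precaution in case the numeric entries carry stray spaces.

```r
# A few sample rows from the question; grade_col2 is assumed to be character
grades_df <- data.frame(
  grade_col1 = c("A-", "B", "B-", "F"),
  grade_col2 = c("A-", "86", "D", "96"),
  stringsAsFactors = FALSE
)

# Part of the lookup table; the 86 -> B row is assumed from the question's example
lookup_df <- data.frame(
  grade = c("A", "A", "A", "A", "A-", "A-", "A-", "B"),
  scale = c(96, 95, 94, 93, 92, 91, 90, 86),
  stringsAsFactors = FALSE
)

# Cleaned copy of grade_col2 (drops any surrounding whitespace)
col2 <- trimws(grades_df$grade_col2)

# Position of each grade_col2 value in the scale column (NA for letter grades)
idx <- match(col2, as.character(lookup_df$scale))

# Translate numeric entries to letters; keep letter entries unchanged
grades_df$grade_col2b <- ifelse(is.na(idx), col2, lookup_df$grade[idx])

# 1 = match, 0 = no match
grades_df$indicator <- as.integer(grades_df$grade_col1 == grades_df$grade_col2b)

print(grades_df)
```

On these four rows the indicator comes out as 1, 1, 0, 0: the 86 is translated to B and matches, while 96 is translated to A and does not match F. A numeric entry that is missing from the lookup stays as the number and is counted as a non-match.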
