
How to match two columns in one dataframe using values in another dataframe in R

I have two dataframes. One is a set of ≈4000 entries that looks similar to this:

| grade_col1 | grade_col2 |
| --- | --- |
| A-| A-|
| B | 86|
| C+| C+|
| B-| D |
| A | A |
| C-| 72|
| F | 96|
| B+| B+|
| B | B |
| A-| A-|

The other is a set of ≈700 entries that looks similar to this:

| grade | scale |
| --- | --- |
| A+|100|
| A+| 99|
| A+| 98|
| A+| 97|
| A | 96|
| A | 95|
| A | 94|
| A | 93|
| A-| 92|
| A-| 91|
| A-| 90|
| B+| 89|
| B+| 88|

...and so on.

What I'm trying to do is create a new column that shows whether grade_col2 matches grade_col1, as a binary 0/1 output (0 = no match, 1 = match). Most of grade_col2 is recorded as a letter grade, but every once in a while an entry in grade_col2 was accidentally entered as a numeric grade instead. I want this match column to give me a "1" even when grade_col2 is a numeric grade rather than a letter grade. In other words, if grade_col1 is B and grade_col2 is 86, I want this to still be read as a match. A pair such as grade_col1 = F and grade_col2 = 96 should not be a match (just as grade_col1 = B- and grade_col2 = D is not a match).

The second data frame gives me the information I need to translate between one and the other (entries between 97-100 are A+, between 93-96 are A, and so on). I just don't know how to run a script that uses this information to find matches through all ≈4000 entries. Theoretically, I could do this manually, but the real dataset is so lengthy that this isn't realistic.

I had been thinking of using nested if_else statements with dplyr. But once I got past the first "if" statement, I got stuck. I'd appreciate any help people can offer with this.

You can do this using a join.

Let your first dataframe be grades_df and your second dataframe be lookup_df. Then you want something like the following:

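library(dplyr)

output <- grades_df %>%
  # join on the lookup table, keeping every row of grades_df;
  # scale is cast to character (assuming it is stored as numeric) so its type matches grade_col2
  left_join(mutate(lookup_df, scale = as.character(scale)),
            by = c("grade_col2" = "scale")) %>%
  # use the looked-up letter grade where grade_col2 was numeric, else keep grade_col2
  mutate(grade_col2b = ifelse(is.na(grade), grade_col2, grade)) %>%
  # indicator column: 1 = match, 0 = no match
  mutate(indicator = ifelse(grade_col1 == grade_col2b, 1, 0))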
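To sanity-check the logic on the sample rows from the question, here is a minimal base R sketch that does the same lookup with match() instead of a join (a swapped-in technique, not the code above). The lookup table here is a small illustrative subset: the rows for 93-96 come from the question, 86 = B follows from the question's own example, and 72 = C- is an assumption based on a conventional grading scale. The trimws() call is also an assumption, in case the numeric entries carry stray spaces.

```r
# Sample rows from the question
grades_df <- data.frame(
  grade_col1 = c("A-", "B", "C+", "B-", "A", "C-", "F", "B+", "B", "A-"),
  grade_col2 = c("A-", "86", "C+", "D", "A", "72", "96", "B+", "B", "A-"),
  stringsAsFactors = FALSE
)

# Illustrative subset of the lookup table (the 72 = C- row is an assumption)
lookup_df <- data.frame(
  grade = c("A", "A", "A", "A", "B", "C-"),
  scale = c(96, 95, 94, 93, 86, 72),
  stringsAsFactors = FALSE
)

# Position of each grade_col2 value in the lookup; NA when it is already a letter
idx <- match(trimws(grades_df$grade_col2), as.character(lookup_df$scale))

# Replace numeric entries with their letter grade, keep letter entries as-is
grades_df$grade_col2b <- ifelse(is.na(idx), grades_df$grade_col2, lookup_df$grade[idx])

# 1 = match, 0 = no match
grades_df$indicator <- as.integer(grades_df$grade_col1 == grades_df$grade_col2b)

print(grades_df)
```

With these rows, only B- / D and F / 96 should come out as 0. Any numeric value in grade_col2 that is missing from the lookup table is left unchanged and therefore also yields 0, so it is worth checking that the real lookup covers every numeric grade that occurs.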
