I have the following table:
| | Red | Green | Blue | Yellow | Brown | Purple | Black |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Apple | A | B | D | D | C | F | E |
| Pear | A | B | C | B | C | F | B |
| Orange | A | B | C | B | C | F | B |
| Strawberry | A | C | D | D | C | F | D |
| Lemon | E | C | D | D | C | F | D |
Based on sample data similar to this:
INPUT DATA
ID Colour Fruit
1 Red Apple
2 Red Orange
3 Green Lemon
4 Brown Strawberry
...
1000 Brown Strawberry
I would like to generate an additional column (Group) in the input data which represents the values in the above table so that the output looks like this:
OUTPUT DATA
ID Colour Fruit Group
1 Red Apple A
2 Red Orange A
3 Green Lemon C
4 Brown Strawberry C
...
1000 Brown Strawberry C
I have seen this question: Generate new column values based on comparison of two other columns in R, but it covers a much simpler case than mine and relies on ifelse() statements.
Is there another way to do this over thousands of rows and many possible pairings that does not require an extensive ifelse() statement?
The dplyr package has the mutate and filter functions, but I'm not sure how to combine them in this example.
You should use the earlier method I suggested. An Excel-style lookup is performed in R with a dplyr join: reshape the lookup table to long form, then join it to the input data on both key columns.
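As a minimal sketch of that idea on a cut-down table (the `wide`, `long`, `rows` and `labelled` names are hypothetical and only used for illustration), the lookup table is first reshaped so that each Fruit/Color pair sits on its own row, and the input rows are then joined to it on both columns:

```r
library(dplyr)
library(tidyr)

# Hypothetical two-fruit, two-colour lookup table in wide form.
wide <- data.frame(
  Fruit = c("Apple", "Lemon"),
  Red   = c("A", "E"),
  Green = c("B", "C"),
  stringsAsFactors = FALSE
)

# Reshape to one row per (Fruit, Color) pair.
long <- wide %>%
  pivot_longer(!Fruit, names_to = "Color", values_to = "Code")

# Hypothetical input rows to be labelled.
rows <- data.frame(
  ID    = 1:3,
  Color = c("Red", "Green", "Red"),
  Fruit = c("Lemon", "Apple", "Apple"),
  stringsAsFactors = FALSE
)

# Join on both key columns; each input row picks up its Code.
labelled <- rows %>% left_join(long, by = c("Fruit", "Color"))
labelled$Code
#> [1] "E" "B" "A"
```

The key column names have to match on both sides. If the input column is called Colour, as in the question, one option is to map it in the join with `by = c("Fruit", "Colour" = "Color")`. The reproducible example that follows applies the same two steps to the full table.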
# Lookup table in wide form: one row per fruit, one column per colour
table <- data.frame(
  stringsAsFactors = FALSE,
  Fruit = c("Apple", "Pear", "Orange", "Strawberry", "Lemon"),
  Red = c("A", "A", "A", "A", "E"),
  Green = c("B", "B", "B", "C", "C"),
  Blue = c("D", "C", "C", "D", "D"),
  Yellow = c("D", "B", "B", "D", "D"),
  Brown = c("C", "C", "C", "C", "C"),
  Purple = c("F", "F", "F", "F", "F"),
  Black = c("E", "B", "B", "D", "D")
)
table
#>        Fruit Red Green Blue Yellow Brown Purple Black
#> 1      Apple   A     B    D      D     C      F     E
#> 2       Pear   A     B    C      B     C      F     B
#> 3     Orange   A     B    C      B     C      F     B
#> 4 Strawberry   A     C    D      D     C      F     D
#> 5      Lemon   E     C    D      D     C      F     D
colors <- c("Red", "Green", "Blue", "Yellow", "Brown", "Purple", "Black")
fruits <- c("Apple", "Pear", "Orange", "Strawberry", "Lemon")
set.seed(1)
input_data <- data.frame(ID = 1:1000,
                         Color = sample(colors, 1000, replace = TRUE),
                         Fruit = sample(fruits, 1000, replace = TRUE))
head(input_data)
#>   ID  Color  Fruit
#> 1  1    Red  Lemon
#> 2  2 Yellow Orange
#> 3  3  Black  Lemon
#> 4  4    Red  Apple
#> 5  5  Green   Pear
#> 6  6  Brown Orange
library(dplyr)
library(tidyr)
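# Reshape the lookup table to long form (one row per Fruit/Color pair),
# then join it onto the input data
output <- input_data %>%
  left_join(table %>% pivot_longer(!Fruit, names_to = "Color", values_to = "Code"))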
#> Joining, by = c("Color", "Fruit")
head(output)
#>   ID  Color  Fruit Code
#> 1  1    Red  Lemon    E
#> 2  2 Yellow Orange    B
#> 3  3  Black  Lemon    D
#> 4  4    Red  Apple    A
#> 5  5  Green   Pear    B
#> 6  6  Brown Orange    C
tail(output)
#>        ID  Color  Fruit Code
#> 995   995   Blue Orange    C
#> 996   996    Red Orange    A
#> 997   997 Yellow   Pear    B
#> 998   998    Red  Apple    A
#> 999   999   Blue   Pear    C
#> 1000 1000 Purple  Apple    F
Created on 2021-04-30 by the reprex package (v2.0.0)
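For comparison, the same lookup can also be sketched in base R without a join, by treating the table as a character matrix and indexing it with (row name, column name) pairs. This is not part of the reprex above. The small `lookup_wide` and `small_input` objects are illustrative stand-ins for `table` and `input_data`, and the sketch assumes every fruit and colour in the input also appears in the lookup table.

```r
# Hypothetical cut-down lookup table in wide form.
lookup_wide <- data.frame(
  Fruit = c("Apple", "Pear", "Lemon"),
  Red   = c("A", "A", "E"),
  Green = c("B", "B", "C"),
  stringsAsFactors = FALSE
)

# Character matrix with fruits as row names and colours as column names.
lookup <- as.matrix(lookup_wide[, -1])
rownames(lookup) <- lookup_wide$Fruit

# Hypothetical input rows to be labelled.
small_input <- data.frame(
  ID    = 1:4,
  Color = c("Red", "Green", "Red", "Green"),
  Fruit = c("Lemon", "Apple", "Pear", "Lemon"),
  stringsAsFactors = FALSE
)

# A two-column character matrix used as an index is matched against the
# dimnames, so each row of the index selects one (Fruit, Color) cell.
small_input$Code <- lookup[cbind(small_input$Fruit, small_input$Color)]
small_input$Code
#> [1] "E" "B" "A" "C"
```

The join in the answer is probably the more flexible choice, since `left_join` leaves NA for unmatched pairs, whereas here any fruit or colour missing from the matrix would need to be handled separately.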