
Generate new column based on values of other columns in R

I have the following table:

| Fruit | Red | Green | Blue | Yellow | Brown | Purple | Black |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Apple | A | B | D | D | C | F | E |
| Pear | A | B | C | B | C | F | B |
| Orange | A | B | C | B | C | F | B |
| Strawberry | A | C | D | D | C | F | D |
| Lemon | E | C | D | D | C | F | D |

I also have input data similar to this, where each row has a colour and a fruit:

INPUT DATA

| ID | Colour | Fruit |
| --- | --- | --- |
| 1 | Red | Apple |
| 2 | Red | Orange |
| 3 | Green | Lemon |
| 4 | Brown | Strawberry |
| ... | ... | ... |
| 1000 | Brown | Strawberry |

I would like to add a column (Group) to the input data that holds the value from the table above at that row's Fruit (table row) and Colour (table column), so that the output looks like this:

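OUTPUT DATA

| ID | Colour | Fruit | Group |
| --- | --- | --- | --- |
| 1 | Red | Apple | A |
| 2 | Red | Orange | A |
| 3 | Green | Lemon | C |
| 4 | Brown | Strawberry | C |
| ... | ... | ... | ... |
| 1000 | Brown | Strawberry | C |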

I have seen the question "Generate new column values based on comparison of two other columns in R", but it is an over-simplification of my example and relies on ifelse() statements.

Is there a way to do this over thousands of rows and many possible colour/fruit pairings that does not need an extensive ifelse() statement?

The dplyr package has the mutate() and filter() functions, but I'm not sure how to combine them in this example.

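As I suggested earlier, use a join: an Excel-style lookup is done in R with dplyr joins. Reshape the wide table into long format (one row per Fruit/Color pair) with tidyr's pivot_longer(), then left_join() it onto the input data by Fruit and Color.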
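To make the join idea concrete without extra packages, here is a minimal sketch using base R's merge() with all.x = TRUE, which keeps every input row the way left_join() does. This swaps in base R rather than the answer's dplyr code, and the data frames below are hypothetical: a small long-format subset of the table (values copied from the Apple and Lemon rows) and a three-row input that includes an unknown fruit, "Kiwi", to show that an unmatched pair comes back as NA instead of raising an error.

```r
# Long-format lookup: one row per (Fruit, Colour) pair.
# Hypothetical subset of the question's table (Apple and Lemon rows only).
lookup_long <- data.frame(
  Fruit  = c("Apple", "Apple", "Lemon", "Lemon"),
  Colour = c("Red",   "Green", "Red",   "Green"),
  Group  = c("A",     "B",     "E",     "C"),
  stringsAsFactors = FALSE
)

# Hypothetical input rows; "Kiwi" has no entry in the lookup
input <- data.frame(
  ID     = 1:3,
  Colour = c("Green", "Red", "Red"),
  Fruit  = c("Lemon", "Apple", "Kiwi"),
  stringsAsFactors = FALSE
)

# Left join on both key columns: every input row is kept,
# and a pair missing from the lookup gets NA in Group
out <- merge(input, lookup_long, by = c("Fruit", "Colour"), all.x = TRUE)

# merge() sorts the result by the key columns, so restore the ID order
out <- out[order(out$ID), c("ID", "Colour", "Fruit", "Group")]
print(out)
```

Joining on both columns at once is what replaces the nested ifelse(): each input row is matched against the whole lookup in a single vectorised step.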

# Lookup table: one row per fruit, one column per colour
table <- data.frame(
  stringsAsFactors = FALSE,
  Fruit  = c("Apple", "Pear", "Orange", "Strawberry", "Lemon"),
  Red    = c("A", "A", "A", "A", "E"),
  Green  = c("B", "B", "B", "C", "C"),
  Blue   = c("D", "C", "C", "D", "D"),
  Yellow = c("D", "B", "B", "D", "D"),
  Brown  = c("C", "C", "C", "C", "C"),
  Purple = c("F", "F", "F", "F", "F"),
  Black  = c("E", "B", "B", "D", "D")
)
table
#>        Fruit Red Green Blue Yellow Brown Purple Black
#> 1      Apple   A     B    D      D     C      F     E
#> 2       Pear   A     B    C      B     C      F     B
#> 3     Orange   A     B    C      B     C      F     B
#> 4 Strawberry   A     C    D      D     C      F     D
#> 5      Lemon   E     C    D      D     C      F     D

colors <- c("Red", "Green", "Blue", "Yellow", "Brown", "Purple", "Black")
fruits <- c("Apple", "Pear", "Orange", "Strawberry", "Lemon")

# Simulated input: 1000 random Color/Fruit pairs
set.seed(1)
input_data <- data.frame(ID = 1:1000,
                         Color = sample(colors, 1000, TRUE),
                         Fruit = sample(fruits, 1000, TRUE),
                         stringsAsFactors = FALSE)

head(input_data)
#>   ID  Color  Fruit
#> 1  1    Red  Lemon
#> 2  2 Yellow Orange
#> 3  3  Black  Lemon
#> 4  4    Red  Apple
#> 5  5  Green   Pear
#> 6  6  Brown Orange
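
library(dplyr)
library(tidyr)

# Reshape the lookup table to long format (one row per Fruit/Color pair),
# then left-join it onto the input so each row picks up its Code
output <- input_data %>%
  left_join(
    table %>% pivot_longer(!Fruit, names_to = "Color", values_to = "Code"),
    by = c("Fruit", "Color")
  )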

head(output)
#>   ID  Color  Fruit Code
#> 1  1    Red  Lemon    E
#> 2  2 Yellow Orange    B
#> 3  3  Black  Lemon    D
#> 4  4    Red  Apple    A
#> 5  5  Green   Pear    B
#> 6  6  Brown Orange    C

tail(output)
#>        ID  Color  Fruit Code
#> 995   995   Blue Orange    C
#> 996   996    Red Orange    A
#> 997   997 Yellow   Pear    B
#> 998   998    Red  Apple    A
#> 999   999   Blue   Pear    C
#> 1000 1000 Purple  Apple    F

Created on 2021-04-30 by the reprex package (v2.0.0)
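If reshaping is not wanted, the same lookup can also be sketched in base R by indexing the wide table directly: match() converts each fruit and colour name into a row and column position, and indexing a matrix with a two-column (row, column) matrix returns one element per pair. This is an alternative technique, not the answer's method; the names lookup, lookup_group() and input below are hypothetical, and the table values are copied from the question.

```r
# Wide lookup table from the question: one row per fruit, one column per colour
lookup <- data.frame(
  Fruit  = c("Apple", "Pear", "Orange", "Strawberry", "Lemon"),
  Red    = c("A", "A", "A", "A", "E"),
  Green  = c("B", "B", "B", "C", "C"),
  Blue   = c("D", "C", "C", "D", "D"),
  Yellow = c("D", "B", "B", "D", "D"),
  Brown  = c("C", "C", "C", "C", "C"),
  Purple = c("F", "F", "F", "F", "F"),
  Black  = c("E", "B", "B", "D", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one code per (fruit, colour) pair, no reshaping.
# match() gives row/column positions (NA when a name is not in the table),
# and an NA in the index matrix yields NA in the result.
lookup_group <- function(tbl, fruit, colour) {
  m <- as.matrix(tbl[, setdiff(names(tbl), "Fruit")])  # character matrix of codes
  i <- match(fruit, tbl$Fruit)                         # row positions
  j <- match(colour, colnames(m))                      # column positions
  unname(m[cbind(i, j)])
}

# The first four rows of the question's sample input
input <- data.frame(
  ID     = 1:4,
  Colour = c("Red", "Red", "Green", "Brown"),
  Fruit  = c("Apple", "Orange", "Lemon", "Strawberry"),
  stringsAsFactors = FALSE
)
input$Group <- lookup_group(lookup, input$Fruit, input$Colour)
print(input)
```

Because the helper is vectorised, it can also be called from mutate() in a dplyr pipeline, for example `input_data %>% mutate(Code = lookup_group(table, Fruit, Color))`, which is one way to combine mutate() with a lookup as the question asked.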
