Adding a new string column to a dataframe based on a previous numeric column in R

I have a dataframe of data for 400,000 trees of 6 different species. Each tree has a numeric species code that identifies its species. I would like to add another column listing the scientific name of each tree. The species codes are not consecutive, as this data was filtered down from 490,000 trees of 163 species based on abundance. Here is an example of data similar to what I have:

Index    Age    Species_code
0        45     14
1        47     32
2        14     62
3        78     126
4        40     14
5        38     17 
6        28     47

And here is an example of what I would like to get to:

Index    Age    Species_code    Species
0        45     14              Licania_heteromorpha
1        47     32              Pouteria_reticulata
2        14     62              Chrysophyllum_cuneifolium
3        78     126             Eperua_falcata
4        40     14              Licania_heteromorpha
5        38     17              Simaba_cedron
6        28     47              Sterculia_pruriens

I have been trying things along the lines of:

if (Species_code == 14)
{
}

However, this only gives me TRUE or FALSE in the output.

One solution is to use mutate() with case_when(), provided you know which code corresponds to which species. I have filled in a few of them; the remaining codes follow the same pattern:

library(tidyverse)
x <-"
  Index    Age    Species_code
0        45     14
1        47     32
2        14     62
3        78     126
4        40     14
5        38     17 
6        28     47"
y <- read.table(text = x, header = TRUE)
y <- y %>% 
  mutate(species = case_when(Species_code == 14 ~ "Licania_heteromorpha",
                             Species_code == 32 ~ "Pouteria_reticulata",
                             Species_code == 62 ~ "Chrysophyllum_cuneifolium"))   # etc...
y
#   Index Age Species_code                   species
# 1     0  45           14      Licania_heteromorpha
# 2     1  47           32       Pouteria_reticulata
# 3     2  14           62 Chrysophyllum_cuneifolium
# 4     3  78          126                      <NA>
# 5     4  40           14      Licania_heteromorpha
# 6     5  38           17                      <NA>
# 7     6  28           47                      <NA>

However, if you have a separate dataset that maps species codes to names, it would make more sense to merge the two datasets on Species_code.
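As a rough sketch of that lookup-table idea, assuming the codes and names are available as a small data frame (the name species_lookup is hypothetical, and its contents are taken from the example in the question), base R's match() can pull the name for each tree:

```r
# Example data from the question
trees <- data.frame(
  Index = 0:6,
  Age = c(45, 47, 14, 78, 40, 38, 28),
  Species_code = c(14, 32, 62, 126, 14, 17, 47)
)

# Hypothetical lookup table: one row per species code
species_lookup <- data.frame(
  Species_code = c(14, 17, 32, 47, 62, 126),
  Species = c("Licania_heteromorpha", "Simaba_cedron", "Pouteria_reticulata",
              "Sterculia_pruriens", "Chrysophyllum_cuneifolium", "Eperua_falcata"),
  stringsAsFactors = FALSE
)

# For each tree, match() finds the lookup row with the same code;
# codes missing from the lookup table give NA
idx <- match(trees$Species_code, species_lookup$Species_code)
trees$Species <- species_lookup$Species[idx]
```

merge(trees, species_lookup, by = "Species_code", all.x = TRUE) should give the same column, but merge() sorts the result by the key by default, so the original row order is not preserved.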

You may want to use the ifelse() function.
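A minimal sketch of what that might look like, assuming the data frame is called df and using only two of the codes from the question (the rest would nest the same way):

```r
# Small illustrative data frame; df is an assumed name
df <- data.frame(Species_code = c(14, 32, 62, 14))

# ifelse() is vectorised, so it tests every row at once;
# nest one call per code and fall back to NA for unmapped codes
df$Species <- ifelse(df$Species_code == 14, "Licania_heteromorpha",
              ifelse(df$Species_code == 32, "Pouteria_reticulata",
                     NA_character_))
```

Nesting gets hard to read beyond a handful of codes, so this is probably best kept for short mappings.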

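You may also want to use a character vector indexed by species code:

# Character vector whose positions are the species codes
my_names <- character()
my_names[14] <- "Licania_heteromorpha"
my_names[62] <- "Chrysophyllum_cuneifolium"
# ...
# Look up each tree's name by using its code as an index
df$Species <- my_names[df$Species_code]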

You may also want to have a look at dplyr's functions for this, such as case_when and recode. See: https://dplyr.tidyverse.org/reference.

As your problem has only 6 species, you can do this:

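# Start with an empty (NA) character column
df$Species <- NA_character_

# Fill in the name for the rows matching each code
df$Species[df$Species_code == 14] <- 'Licania_heteromorpha'
df$Species[df$Species_code == 32] <- 'Pouteria_reticulata'
# .....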
