
How to create a new variable and assign it a value corresponding to another variable in R?

Here is some mock data corresponding to the real dataset I am using:

mock dataset

    a <- c("a","b","c","d","e","f","g","h","i","j")
    b <- 1:10
    names <-c("Alex","Ale","Alexandra","Alexander","Ali","Amanda","Alix","Ajax","Aley","Ajay")
    data <- data.frame(a,b,names)

create new variable gender

    library(dplyr)

    data <- data %>% 
      mutate(gender = NA)

I want to assign a "gender" value to the names variable in my dataset. I don't want to do this manually because I am dealing with 1000s of observations. I do however have these variables, which contain the "names" value corresponding to the right gender:

male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

However I don't know how to use them to assign a "gender" value to correspond with specific "names" in my dataset.

Here is what I tried:

data$gender[data$names== male] <- "Male"

And:

data$gender[data$names== c("Alex", "Ale", "Alexander")] <- "Male" 

This code does not assign "Male" to all of the matching names. I receive a warning message:

"Warning message:
In data$names == c("Alex", "Ale", "Alexander") :
  longer object length is not a multiple of shorter object length"

Does anyone know how I can assign values to my gender variable corresponding to the names variable?

We can create a named list, stack it into a two-column data frame, and then use that data frame in a join:

library(dplyr)

# stack() turns the named list into a two-column data frame (values, ind)
new <- stack(list(male = male, female = female, noanswer = noanswer))
names(new) <- c("names", "gender")
data <- data %>% 
    left_join(new, by = "names")

-output

data
   a  b     names   gender
1  a  1      Alex     male
2  b  2       Ale     male
3  c  3 Alexandra   female
4  d  4 Alexander     male
5  e  5       Ali   female
6  f  6    Amanda   female
7  g  7      Alix noanswer
8  h  8      Ajax noanswer
9  i  9      Aley noanswer
10 j 10      Ajay noanswer
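
One detail of the join approach that may matter later: stack() names its columns `values` and `ind`, and `ind` is a factor, so `gender` is a factor after the join. Below is a minimal base R sketch, assuming plain character labels are preferred. The name `lookup` is only illustrative and is not part of the answer above.

```r
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

# stack() returns a data frame with columns `values` and `ind`
lookup <- stack(list(male = male, female = female, noanswer = noanswer))
names(lookup)        # "values" "ind"
class(lookup$ind)    # "factor"

# Rename the columns, then convert the factor to plain character
names(lookup) <- c("names", "gender")
lookup$gender <- as.character(lookup$gender)

# A name listed in more than one vector would duplicate rows in the join,
# so it can be worth checking for repeats first (0 means none)
anyDuplicated(lookup$names)
```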

Regarding the OP's warning: == is an elementwise comparison, so it is mostly appropriate when one of the operands has length 1 (which gets recycled) or when both have the same length. Here, the lengths are different (10 and 3). The shorter vector is recycled, and because 10 is not a multiple of 3, there is a warning. Sometimes there is no warning (when the longer length is a multiple of the shorter one), but the result is still incorrect for this purpose, because the comparison is done position by position. If the first vector (v1) has length 5 and the second (v2) has length 3, the comparisons are:

v1[1] == v2[1]
v1[2] == v2[2]
v1[3] == v2[3]
v1[4] == v2[1]
...
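
To make the recycling concrete, here is a small sketch using the names from the question. The variable names `nms`, `eq` and `v1` are illustrative, and suppressWarnings() is used only to keep the output quiet.

```r
nms <- c("Alex", "Ale", "Alexandra", "Alexander", "Ali",
         "Amanda", "Alix", "Ajax", "Aley", "Ajay")
male <- c("Alex", "Ale", "Alexander")

# `male` is recycled to length 10: Alex, Ale, Alexander, Alex, Ale, ...
# so position 4 compares "Alexander" with "Alex" and does not match
eq <- suppressWarnings(nms == male)
which(eq)    # 1 2

# Length 6 against length 3: no warning, but still a positional comparison
v1 <- c("Alex", "Amanda", "Ale", "Ale", "Alex", "Alexander")
v1 == male   # TRUE FALSE FALSE FALSE FALSE TRUE
```

This is why only the first two rows were labelled "Male" in the question's attempt, even though "Alexander" is in `male`.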

Instead, we may use %in%, which tests whether each name occurs anywhere in the other vector:

data$gender[data$names %in% male] <- "Male"
data$gender[data$names %in% female] <- "Female"
data$gender[data$names %in% noanswer] <- "noanswer"

Data used:

data <- structure(list(a = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j"), b = 1:10, names = c("Alex", "Ale", "Alexandra", "Alexander", 
"Ali", "Amanda", "Alix", "Ajax", "Aley", "Ajay")),
  class = "data.frame", row.names = c(NA, 
-10L))

You can also use the following solution:

library(dplyr)

male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

data %>%
  mutate(gender = case_when(
    names %in% male ~ "Male",
    names %in% female ~ "Female",
    names %in% noanswer ~ "Noanswer"
  ))

   a  b     names   gender
1  a  1      Alex     Male
2  b  2       Ale     Male
3  c  3 Alexandra   Female
4  d  4 Alexander     Male
5  e  5       Ali   Female
6  f  6    Amanda   Female
7  g  7      Alix Noanswer
8  h  8      Ajax Noanswer
9  i  9      Aley Noanswer
10 j 10      Ajay Noanswer
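
Note that case_when() returns NA for any row that matches none of the conditions. With thousands of names, some may be missing from all three vectors, so a fallback can help spot them. A minimal sketch, assuming a made-up name "Zed" and an illustrative "Unknown" label (neither is from the question):

```r
library(dplyr)

male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")

# "Zed" is a hypothetical name that appears in neither vector
df <- data.frame(names = c("Alex", "Ali", "Zed"))

res <- df %>%
  mutate(gender = case_when(
    names %in% male ~ "Male",
    names %in% female ~ "Female",
    TRUE ~ "Unknown"   # fallback; without this line "Zed" would get NA
  ))

res$gender   # "Male" "Female" "Unknown"
```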
