How to create a new variable and assign it a value corresponding to another variable in R?
Here is some mock data that corresponds to the real dataset I am working with:
library(dplyr)  # needed for %>% and mutate()
a <- c("a","b","c","d","e","f","g","h","i","j")
b <- 1:10
names <- c("Alex","Ale","Alexandra","Alexander","Ali","Amanda","Alix","Ajax","Aley","Ajay")
data <- data.frame(a, b, names)
data <- data %>%
  mutate(gender = NA)
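I want to assign a "gender" value based on the names variable in my dataset. I don't want to do this manually, because I am working with 1000 observations. However, I do have these vectors, which contain the "names" values that correspond to the correct gender: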
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")
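But I don't know how to use them to assign a "gender" value that corresponds to a particular "names" value in my dataset.
Here is what I have tried: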
data$gender[data$names== male] <- "Male"
and:
data$gender[data$names== c("Alex", "Ale", "Alexander")] <- "Male"
This code does not assign "Male" to all of the matching rows. I get a warning message:
"Warning message:
In data$names == c("Alex", "Ale", "Alexander") :
longer object length is not a multiple of shorter object length"
Does anyone know how to assign values to the gender variable so that they correspond to the names variable?
We can create a named list, then stack it into a two-column dataset, which we then use in a join:
new <- stack(list(male = male, female = female, noanswer = noanswer))
names(new) <- c("names", "gender")
data <- data %>%
left_join(new, by = "names")
-output
data
   a  b     names   gender
1  a  1      Alex     male
2  b  2       Ale     male
3  c  3 Alexandra   female
4  d  4 Alexander     male
5  e  5       Ali   female
6  f  6    Amanda   female
7  g  7      Alix noanswer
8  h  8      Ajax noanswer
9  i  9      Aley noanswer
10 j 10      Ajay noanswer
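To make the intermediate step easier to follow, here is a minimal sketch of what stack() produces and of a base R way to do the same lookup with match() instead of left_join(). The names `lookup` and `people` are hypothetical and used only for illustration; "Zed" is an invented name that appears in none of the vectors.

```r
# Lookup vectors from the question
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

# stack() returns two columns: `values` (the names) and
# `ind` (the name of the list element each value came from, as a factor)
lookup <- stack(list(male = male, female = female, noanswer = noanswer))
colnames(lookup)   # "values" "ind"
class(lookup$ind)  # "factor"

# Base R alternative to the join: match() gives, for each name,
# the row of `lookup` that holds it (NA if the name is not listed)
people <- data.frame(names = c("Alex", "Amanda", "Ajay", "Zed"))  # hypothetical data
people$gender <- as.character(lookup$ind)[match(people$names, lookup$values)]
people$gender      # "male" "female" "noanswer" NA
```

Because `ind` is a factor, the joined gender column is a factor as well; wrapping it in as.character(), as in the sketch, gives a plain character column if that is preferred.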
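Regarding the OP's warning: == is an element-wise comparison. It is mainly appropriate when one of the vectors has length 1 (it is recycled) or when both vectors have the same length. Here, the lengths are different, so the shorter vector is recycled, and because the longer length is not a multiple of the shorter length, a warning is issued. Sometimes there is no warning (when the longer length happens to be a multiple of the shorter one), but the result is still incorrect, because the comparison is still done position by position as shown below. If the second vector has length 3 and the first vector has length 5: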
v1[1] == v2[1]
v1[2] == v2[2]
v1[3] == v2[3]
v1[4] == v2[1]
...
Instead, we can use %in%:
data$gender[data$names %in% male] <- "Male"
data$gender[data$names %in% female] <- "Female"
data$gender[data$names %in% noanswer] <- "noanswer"
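The difference between the two operators can be seen in a small sketch. The vector `x` is hypothetical and chosen to have length 6, a multiple of 3, so == recycles silently and raises no warning.

```r
male <- c("Alex", "Ale", "Alexander")

# Hypothetical names column: every element is a male name, but in a different order
x <- c("Alexander", "Alex", "Ale", "Alex", "Ale", "Alexander")

# == recycles `male` to length 6 and compares position by position:
# x[1] vs male[1], x[2] vs male[2], x[3] vs male[3], x[4] vs male[1], ...
eq <- x == male
eq   # FALSE FALSE FALSE  TRUE  TRUE  TRUE

# %in% asks, for each element of x, whether it occurs anywhere in `male`
inn <- x %in% male
inn  # TRUE TRUE TRUE TRUE TRUE TRUE
```

Only the last three positions happen to line up with the recycled vector, which is why the == version labels some rows and misses others.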
-data
data <- structure(list(a = c("a", "b", "c", "d", "e", "f", "g", "h",
    "i", "j"), b = 1:10, names = c("Alex", "Ale", "Alexandra", "Alexander",
    "Ali", "Amanda", "Alix", "Ajax", "Aley", "Ajay")),
    class = "data.frame", row.names = c(NA,
    -10L))
You can also use the following solution:
library(dplyr)
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")
data %>%
  mutate(gender = case_when(
    names %in% male ~ "Male",
    names %in% female ~ "Female",
    names %in% noanswer ~ "Noanswer"
  ))
   a  b     names   gender
1  a  1      Alex     Male
2  b  2       Ale     Male
3  c  3 Alexandra   Female
4  d  4 Alexander     Male
5  e  5       Ali   Female
6  f  6    Amanda   Female
7  g  7      Alix Noanswer
8  h  8      Ajax Noanswer
9  i  9      Aley Noanswer
10 j 10      Ajay Noanswer
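With around 1000 observations it may be worth checking the lookup vectors before relying on the result, since a name that matches none of the case_when() conditions ends up as NA. A rough sketch of two such checks in base R follows; `observed` is a hypothetical stand-in for data$names, and "Alexis" is an invented name that is deliberately missing from all three vectors.

```r
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")
all_names <- c(male, female, noanswer)

# Hypothetical stand-in for data$names
observed <- c("Alex", "Amanda", "Ajay", "Alexis")

# Names in the data that are in none of the vectors (these would get NA)
unmatched <- setdiff(observed, all_names)
unmatched  # "Alexis"

# Names listed in more than one vector (case_when() would use the first match)
dupes <- all_names[duplicated(all_names)]
dupes      # character(0)
```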