
How to create a new variable and assign it a value corresponding to another variable in R?

Here is some mock data that mirrors the real dataset I am working with:

Mock dataset

    a <- c("a","b","c","d","e","f","g","h","i","j")
    b <- 1:10
    names <-c("Alex","Ale","Alexandra","Alexander","Ali","Amanda","Alix","Ajax","Aley","Ajay")
    data <- data.frame(a,b,names)

Create the new variable gender

    library(dplyr)  # provides %>% and mutate()

    data <- data %>% 
      mutate(gender = NA)

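I want to assign a "gender" value to each entry of the names variable in my dataset. I don't want to do this by hand because I am working with 1000 observations. However, I do have these vectors, which contain the "names" values that belong to each gender:

    male <- c("Alex", "Ale", "Alexander")
    female <- c("Alexandra", "Ali", "Amanda")
    noanswer <- c("Alix", "Ajax", "Aley", "Ajay")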

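But I don't know how to use them to assign the "gender" value that corresponds to each particular "names" value in my dataset.

Here is what I tried:

    data$gender[data$names== male] <- "Male"

and:

    data$gender[data$names== c("Alex", "Ale", "Alexander")] <- "Male"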

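This code does not assign "Male" to all of the matching values. I get a warning message:

    "Warning message:
    In data$names == c("Alex", "Ale", "Alexander") :
      longer object length is not a multiple of shorter object length"

Does anyone know how to assign values to the gender variable that correspond to the names variable?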

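We can create a named list, stack() it into a two-column lookup table, and then use that table in a join. This assumes data does not already contain a gender column, as in the structure() call at the end of this answer.

    library(dplyr)

    # stack() returns one row per name, with the list name in the second column
    new <- stack(list(male = male, female = female, noanswer = noanswer))
    names(new) <- c("names", "gender")
    data <- data %>% 
        left_join(new, by = "names")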

Output:

    data
       a  b     names   gender
    1  a  1      Alex     male
    2  b  2       Ale     male
    3  c  3 Alexandra   female
    4  d  4 Alexander     male
    5  e  5       Ali   female
    6  f  6    Amanda   female
    7  g  7      Alix noanswer
    8  h  8      Ajax noanswer
    9  i  9      Aley noanswer
    10 j 10      Ajay noanswer
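For readers who prefer to stay in base R, the same lookup table can be applied with match() instead of a join. This is a swapped-in alternative rather than part of the answer above; the name lookup is made up for illustration, and the data are the mock data from the question.

```r
# Mock data from the question
data <- data.frame(
  a = letters[1:10],
  b = 1:10,
  names = c("Alex", "Ale", "Alexandra", "Alexander", "Ali",
            "Amanda", "Alix", "Ajax", "Aley", "Ajay"),
  stringsAsFactors = FALSE
)

male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

# Two-column lookup table: one row per name, labelled with its list name
# (`lookup` is a hypothetical name used only in this sketch)
lookup <- stack(list(male = male, female = female, noanswer = noanswer))
colnames(lookup) <- c("names", "gender")

# match() gives, for each data$names, the row of lookup holding that name
# (NA when the name is not listed)
idx <- match(data$names, lookup$names)
data$gender <- as.character(lookup$gender[idx])

print(data)
```

Unlike a join, match() uses only the first occurrence of each name in the lookup table, so a name listed twice cannot duplicate rows of data.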

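Regarding the OP's warning: == is an element-wise comparison, so it works as intended mainly when one side has length 1 (and is recycled) or when both sides have the same length. Here the lengths differ (10 and 3), so the shorter vector is recycled, and because 10 is not a multiple of 3, R issues the warning. When the longer length is an exact multiple of the shorter one, there is no warning at all, yet the result is still incorrect, because the comparison is still done position by position. For example, if the first vector has length 5 and the second has length 3, the comparisons are:

    v1[1] == v2[1]
    v1[2] == v2[2]
    v1[3] == v2[3]
    v1[4] == v2[1]
    ...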
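The recycling described above can be reproduced with a small sketch. The vectors v1 and v2 here are hypothetical, chosen so that the longer length is an exact multiple of the shorter one and R therefore stays silent.

```r
# Hypothetical vectors, chosen only to illustrate recycling
v1 <- c("Alex", "Ale", "Alexander", "Ale", "Alex", "Amanda")  # length 6
v2 <- c("Alex", "Ale", "Alexander")                           # length 3

# What v2 looks like once it has been recycled to the length of v1
recycled <- rep_len(v2, length(v1))

# == compares position by position against the recycled vector;
# 6 is a multiple of 3, so no warning is issued
eq_result <- v1 == v2

print(recycled)
print(eq_result)
```

Positions 4 and 5 of v1 hold names that do occur in v2, but they come out FALSE because each is compared with a different element of the recycled vector.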

Instead, we can use %in%, which tests each name for membership in the whole vector:

    data$gender[data$names %in% male] <- "Male"
    data$gender[data$names %in% female] <- "Female"
    data$gender[data$names %in% noanswer] <- "noanswer"

Data

    data <- structure(list(a = c("a", "b", "c", "d", "e", "f", "g", "h", 
    "i", "j"), b = 1:10, names = c("Alex", "Ale", "Alexandra", "Alexander", 
    "Ali", "Amanda", "Alix", "Ajax", "Aley", "Ajay")),
      class = "data.frame", row.names = c(NA, 
    -10L))

You can also use the following solution:

    library(dplyr)

    male <- c("Alex", "Ale", "Alexander")
    female <- c("Alexandra", "Ali", "Amanda")
    noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

    data %>%
      mutate(gender = case_when(
        names %in% male ~ "Male",
        names %in% female ~ "Female",
        names %in% noanswer ~ "Noanswer"
      ))

       a  b     names   gender
    1  a  1      Alex     Male
    2  b  2       Ale     Male
    3  c  3 Alexandra   Female
    4  d  4 Alexander     Male
    5  e  5       Ali   Female
    6  f  6    Amanda   Female
    7  g  7      Alix Noanswer
    8  h  8      Ajax Noanswer
    9  i  9      Aley Noanswer
    10 j 10      Ajay Noanswer
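With 1000 observations it may be worth checking, before assigning anything, that every name falls into exactly one of the three vectors. The following base R sketch is one possible check, not something the answers above prescribe; data_names, all_listed, unmatched and ambiguous are hypothetical names, and data_names stands in for data$names.

```r
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

# Stand-in for data$names from the mock dataset
data_names <- c("Alex", "Ale", "Alexandra", "Alexander", "Ali",
                "Amanda", "Alix", "Ajax", "Aley", "Ajay")

all_listed <- c(male, female, noanswer)

# Names in the data that none of the three vectors covers;
# case_when() without a fallback would leave these rows as NA
unmatched <- setdiff(data_names, all_listed)

# Names listed more than once across the vectors;
# case_when() would silently use the first condition that matches
ambiguous <- unique(all_listed[duplicated(all_listed)])

print(unmatched)
print(ambiguous)
```

For the mock data both results are empty, so each name receives exactly one label.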

