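Create dummy based on character vectors in R
I want to create a dummy variable that is 1 if all entries in a row (in columns value_1 through value_3) are either equal to a given character (e.g. "C") or NA, and 0 otherwise.
Toy example: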
df <- data.frame(state = rep("state"),
                 candidate = c("a", "b", "c"),
                 value_1 = c("A", "B", "C"),
                 value_2 = c("A", "B", NA),
                 value_3 = c("C", NA, NA), stringsAsFactors = FALSE)
Desiderata:
df <- data.frame(state = rep("state"),
                 candidate = c("a", "b", "c"),
                 value_1 = c("A", "B", "C"),
                 value_2 = c("A", "B", NA),
                 value_3 = c("C", NA, NA),
                 dummy = c(0, 0, 1), stringsAsFactors = FALSE)
I tried this (but it does not work):
df$dummy <- ifelse(df[-(1:2)] %in% c("C","NA"),1,0)
We can use apply row-wise and check whether all entries in the selected columns are equal to "C", ignoring NA values.
cols <- grep("^value", names(df))
df$dummy <- as.integer(apply(df[cols] == "C", 1, all, na.rm = TRUE))
df
# state candidate value_1 value_2 value_3 dummy
#1 state a A A C 0
#2 state b B B <NA> 0
#3 state c C <NA> <NA> 1
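As a rough sketch, the same row-wise check can be wrapped in a small helper. The function name all_match_or_na, the data frame df2, and its extra all-NA row are illustrative assumptions, not part of the original question. The extra row shows an edge case: with na.rm = TRUE, a row that contains only NA counts as a match, because all() over an empty set is TRUE.

```r
# Hypothetical helper (name and arguments are illustrative):
# returns 1L for rows where every non-NA entry in `cols` equals `value`.
all_match_or_na <- function(data, cols, value) {
  eq <- as.matrix(data[cols]) == value   # logical matrix; NA stays NA
  as.integer(apply(eq, 1, all, na.rm = TRUE))
}

# Toy data from the question plus one assumed all-NA row ("d")
df2 <- data.frame(state = "state",
                  candidate = c("a", "b", "c", "d"),
                  value_1 = c("A", "B", "C", NA),
                  value_2 = c("A", "B", NA, NA),
                  value_3 = c("C", NA, NA, NA),
                  stringsAsFactors = FALSE)

cols <- grep("^value", names(df2))
df2$dummy <- all_match_or_na(df2, cols, "C")
df2$dummy
# [1] 0 0 1 1
```

If an all-NA row should get 0 instead, one option is to additionally require rowSums(!is.na(df2[cols])) > 0.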
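As for your attempt, %in% does not work on a whole data frame: it compares each column as a single element rather than each cell. You need sapply / lapply to check the values column by column, and then combine the results across each row. You can also avoid ifelse here:
df$dummy <- as.integer(Reduce(`&`, lapply(df[-(1:2)], function(x) x %in% c(NA, "C"))))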
Another way:
rowSums(df[-(1:2)] != "C", na.rm=TRUE) == 0
# [1] FALSE FALSE TRUE
How it works:
Confusingly, df[-(1:2)] == "C" produces a matrix, whereas df[-(1:2)] %in% "C" does not. To use the latter, first convert the data frame with as.matrix(df[-(1:2)]).
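The difference can be seen in a minimal sketch on the toy data from the question. The variable names vals, eq, per_col, m and hit are illustrative. Note that %in% returns a plain logical vector even for a matrix input, so the dimensions are restored by hand here before summing over rows.

```r
df <- data.frame(state = "state",
                 candidate = c("a", "b", "c"),
                 value_1 = c("A", "B", "C"),
                 value_2 = c("A", "B", NA),
                 value_3 = c("C", NA, NA),
                 stringsAsFactors = FALSE)
vals <- df[-(1:2)]

eq <- vals == "C"          # element-wise comparison on the data frame
is.matrix(eq)              # TRUE

per_col <- vals %in% "C"   # one result per column, not per cell
length(per_col)            # 3

m <- as.matrix(vals)       # character matrix, NA preserved
hit <- m %in% c("C", NA)   # plain logical vector, dimensions dropped
dim(hit) <- dim(m)         # restore the row/column layout
dummy <- as.integer(rowSums(hit) == ncol(hit))
dummy
# [1] 0 0 1
```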
An option using tidyverse:
library(tidyverse)
df %>%
  mutate(dummy = pmap_int(select(., value_1:value_3),
                          ~ +(!sum(c(...) != "C", na.rm = TRUE))))
# state candidate value_1 value_2 value_3 dummy
#1 state a A A C 0
#2 state b B B <NA> 0
#3 state c C <NA> <NA> 1