How to apply a single condition on a sequence of multiple columns to create a single column in
I have a dataset similar to the following:
Age   Monday  Tuesday  Wednesday
6-9   a       b        a
6-9   b       b        c
6-9           c        a
9-10  c       c        b
9-10  c       a        b
Using R, I want to create a binary variable that indicates whether every value in the row is "a" (1 if the entire row is "a", 0 otherwise), as follows:
Age   Monday  Tuesday  Wednesday  Entire a
6-9   a                a          1
6-9   b       b        c          0
6-9           c        a          0
9-10  c       c        b          0
9-10  a       a        a          1
Note: My data also contains missing values in the rows. The columns I am interested in are factors. I tried the following code, but it did not work:
L <- dataframe %>%
select(Age,Monday:Wednesday) %>%
mutate (Entire a = ifelse(c(Monday:Wednesday)=="a",1,0,na.rm=TRUE))
I'd go with a dplyr solution:
library(dplyr)

# Example data with missing values encoded as NA
my.data <- data.frame(
  age = c("6-9", "6-9", "6-9", "9-10", "9-10", "9-10"),
  Monday = c("a", "b", NA, "c", "a", "a"),
  Tuesday = c("a", "b", "a", "c", "a", NA),
  Wednesday = c("a", "c", "a", "c", "a", NA)
)

my.data %>%
  mutate(
    # For each row of columns 2 to 4: 1 if every non-missing value is "a", else 0
    `Entire a` = apply(.[, 2:4], 1, function(x) as.numeric(all(x == "a", na.rm = TRUE)))
  )
#    age Monday Tuesday Wednesday Entire a
# 1  6-9      a       a         a        1
# 2  6-9      b       b         c        0
# 3  6-9   <NA>       a         a        1
# 4 9-10      c       c         c        0
# 5 9-10      a       a         a        1
# 6 9-10      a    <NA>      <NA>        1
The na.rm argument of all() controls whether missing values are ignored.
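To make the effect of na.rm concrete, here is a minimal base R sketch. The vectors x and y are made-up illustrations, not part of the original data.

```r
x <- c("a", NA, "a")   # only "a" plus one missing value
y <- c("b", NA, "a")   # contains a known non-"a" value

res_drop     <- all(x == "a", na.rm = TRUE)  # TRUE: the NA is dropped before checking
res_keep     <- all(x == "a")                # NA: the missing value leaves the answer unknown
res_mismatch <- all(y == "a")                # FALSE: one known mismatch is enough

as.numeric(res_keep)   # NA, so without na.rm = TRUE the new column would hold NA here
```

So na.rm = TRUE is what turns a row such as "a", NA, "a" into 1 rather than NA.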
We could create a logical matrix with == and compare its rowSums with the number of columns to get a binary result:
colnm <- names(dataframe)[-1]
dataframe$Entire_a <- +(rowSums(replace(dataframe[colnm],
dataframe[colnm] == '', 'a') == 'a') == length(colnm))
dataframe$Entire_a
#[1] 1 0 0 0 1
Another option is to paste the columns together and then use grepl:
+(grepl("^a+$", do.call(paste, c(dataframe[colnm], sep=""))))
#[1] 1 0 0 0 1
If the missing value is NA and not blank (''), then use:
+(rowSums(replace(dataframe[colnm], is.na(dataframe[colnm]), 'a') == 'a') == 3)
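Since the question mentions factor columns, and the data may mix blanks with NA, the two variants above could be folded into one helper. This is only a sketch: the function name all_a, its arguments, and the demo data frame are hypothetical and not from the original answer.

```r
# Hypothetical helper: 1 if every non-missing value in `cols` equals `target`, else 0.
all_a <- function(df, cols, target = "a") {
  m <- as.matrix(df[cols])           # factor columns become character here
  m[!is.na(m) & m == ""] <- NA       # treat blank strings as missing
  +(rowSums(m != target, na.rm = TRUE) == 0)   # no known mismatch in the row
}

# Made-up factor data mixing blanks and NA
demo <- data.frame(
  Monday    = factor(c("a", "b", "", "a")),
  Tuesday   = factor(c("", "b", "c", NA)),
  Wednesday = factor(c("a", "c", "a", "a"))
)

demo_result <- all_a(demo, c("Monday", "Tuesday", "Wednesday"))
```

Note that a row in which every value is missing would also score 1 under this rule, just as it does with the replace() approach.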
Data:
dataframe <- structure(list(
    Age = c("6-9", "6-9", "6-9", "9-10", "9-10"),
    Monday = c("a", "b", "", "c", "a"),
    Tuesday = c("", "b", "c", "c", "a"),
    Wednesday = c("a", "c", "a", "b", "a")),
  row.names = c(NA, -5L), class = "data.frame")
We can use pmap_int from purrr for this row-wise operation. First turn empty values ('') into NA if they are not already.
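library(dplyr)
library(purrr)

dataframe %>%
  # Apply na_if() column by column; calling na_if('') on the whole
  # data frame only works in older dplyr versions (before 1.1.0)
  mutate(across(Monday:Wednesday, ~ na_if(.x, ''))) %>%
  # For each row, 1 if every non-missing value is 'a', else 0
  mutate(Entire_a = pmap_int(select(., Monday:Wednesday),
                             ~ +all(c(...) == 'a', na.rm = TRUE)))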
#    Age Monday Tuesday Wednesday Entire_a
#1   6-9      a    <NA>         a        1
#2   6-9      b       b         c        0
#3   6-9   <NA>       c         a        0
#4  9-10      c       c         b        0
#5  9-10      a       a         a        1
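The same row-wise idea can also be sketched with dplyr alone, using rowwise() and c_across(). This is an alternative I am adding, not part of the original answer; it assumes dplyr 1.0.0 or later and that missing values are already NA. The demo data frame is made up for illustration.

```r
library(dplyr)

# Made-up data with missing values already encoded as NA
demo <- data.frame(
  Age       = c("6-9", "6-9", "9-10"),
  Monday    = c("a", "b", NA),
  Tuesday   = c(NA, "b", "a"),
  Wednesday = c("a", "c", "a")
)

demo_flagged <- demo %>%
  rowwise() %>%   # evaluate the next mutate() one row at a time
  mutate(Entire_a = +all(c_across(Monday:Wednesday) == "a", na.rm = TRUE)) %>%
  ungroup()       # drop the row-wise grouping again

demo_flagged$Entire_a
```

rowwise() tends to be slower than the matrix-based rowSums approach on large data, so it is mainly a readability choice.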