How to apply a single condition on a sequence of multiple columns to create a single column in

I have a dataset similar to the following:

Age      Monday Tuesday Wednesday 
6-9        a     b        a
6-9        b     b        c
6-9              c        a
9-10       c     c        b
9-10       c     a        b

Using R, I want to create a binary variable that indicates whether every non-missing value in a row is "a" (1 if the row is entirely "a", 0 otherwise), as in the following:

Age      Monday Tuesday Wednesday  Entire a
6-9        a              a          1
6-9        b     b        c          0
6-9              c        a          0
9-10       c     c        b          0
9-10       a     a        a          1

Note: My data also contains missing values in the rows. The columns I am interested in are factors. I tried the following code, but it did not work:

L <- dataframe %>%
    select(Age,Monday:Wednesday) %>%
    mutate (Entire a = ifelse(c(Monday:Wednesday)=="a",1,0,na.rm=TRUE))

I'd go with a dplyr solution:

library(dplyr)

my.data <- data.frame(
  age = c("6-9", "6-9", "6-9", "9-10", "9-10", "9-10"),
  Monday = c("a", "b", NA, "c", "a", "a"),
  Tuesday = c("a", "b", "a", "c", "a", NA),
  Wednesday = c("a", "c", "a", "c", "a", NA)
)

my.data %>%
  mutate(
    # 1 if every non-missing value in columns 2 to 4 is "a", otherwise 0
    `Entire a` = apply(.[, 2:4], 1, function(x) as.numeric(all(x == "a", na.rm = TRUE)))
  )

# age Monday Tuesday Wednesday Entire a
# 1  6-9      a       a         a        1
# 2  6-9      b       b         c        0
# 3  6-9   <NA>       a         a        1
# 4 9-10      c       c         c        0
# 5 9-10      a       a         a        1
# 6 9-10      a    <NA>      <NA>        1

The na.rm argument of the all() function controls whether missing values are ignored.
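
To make the effect of na.rm concrete, here is a minimal base R sketch. The two vectors are made-up illustrations and are not part of the data above; the variable names are hypothetical.

```r
# Hypothetical example rows, for illustration only
row_with_na <- c("a", NA, "a")
row_all_na  <- c(NA, NA, NA)

# Without na.rm, comparing NA to "a" gives NA, so all() returns NA
# when every other comparison is TRUE
res_keep_na <- all(row_with_na == "a")

# With na.rm = TRUE, the NA comparisons are dropped before testing
res_drop_na <- all(row_with_na == "a", na.rm = TRUE)

# Caveat: a row that is entirely NA leaves nothing to test,
# and all() of an empty set is TRUE
res_all_na <- all(row_all_na == "a", na.rm = TRUE)
```

Note the last case: with na.rm = TRUE, a row consisting only of missing values is counted as "entirely a". If that is not wanted, such rows need a separate check.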

We could create a logical matrix with == and use rowSums to convert it to a binary indicator:

colnm <- names(dataframe)[-1]
dataframe$Entire_a <- +(rowSums(replace(dataframe[colnm], 
       dataframe[colnm] == '', 'a') == 'a')  == length(colnm))
dataframe$Entire_a
#[1] 1 0 0 0 1
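
The one-liner above can be unpacked into steps. This is only a sketch of the same idea; the data frame is rebuilt here with the same values as the `dataframe` object used in this answer so that the snippet runs on its own, and the intermediate names (`filled`, `is_a`, `n_a`) are introduced purely for illustration.

```r
# Same values as the `dataframe` object in this answer, rebuilt for a standalone run
dataframe <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", "",  "c", "a"),
  Tuesday   = c("",  "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "a"),
  stringsAsFactors = FALSE
)
colnm <- names(dataframe)[-1]

# Step 1: treat blanks as "a" so they cannot disqualify a row
filled <- replace(dataframe[colnm], dataframe[colnm] == "", "a")

# Step 2: logical matrix, TRUE where a cell equals "a"
is_a <- filled == "a"

# Step 3: count the TRUE cells per row; a row qualifies
# when the count equals the number of columns checked
n_a <- rowSums(is_a)
entire_a <- +(n_a == length(colnm))
```

The unary `+` simply converts the logical result to 0/1 integers.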

Another option is to paste the columns together and then use grepl:

+(grepl("^a+$", do.call(paste, c(dataframe[colnm], sep=""))))
#[1] 1 0 0 0 1
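
To see why the pattern works, it may help to look at the pasted strings before matching. The sketch below rebuilds the same data so it runs on its own; `pasted` and `all_blank` are illustrative names, not part of the original answer.

```r
# Same values as the `dataframe` object in this answer, rebuilt for a standalone run
dataframe <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", "",  "c", "a"),
  Tuesday   = c("",  "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "a"),
  stringsAsFactors = FALSE
)
colnm <- names(dataframe)[-1]

# Concatenate each row's values into one string; blank cells contribute nothing
pasted <- do.call(paste, c(dataframe[colnm], sep = ""))

# "^a+$" matches strings that consist of one or more "a" and nothing else
entire_a <- +grepl("^a+$", pasted)

# Caveat: a row that is entirely blank pastes to "" and does not match
all_blank <- +grepl("^a+$", "")
```

This approach relies on the missing values being empty strings. An NA would be pasted as the text "NA" and the row would no longer match, and an entirely blank row yields 0 here, whereas the rowSums version yields 1 for it.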

If the missing values are NA rather than blank ( '' ), then use

+(rowSums(replace(dataframe[colnm], is.na(dataframe[colnm]), 'a') == 'a')  == length(colnm))
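
As a possible variation for the NA case (not the approach shown above, just an alternative sketch), one can count the cells that are present and differ from "a", letting rowSums drop the NAs. The data frame `df_na` below is a hypothetical NA-coded copy of the example data.

```r
# Hypothetical NA-coded version of the example data
df_na <- data.frame(
  Monday    = c("a", "b", NA,  "c", "a"),
  Tuesday   = c(NA,  "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "a"),
  stringsAsFactors = FALSE
)

# Count non-missing cells that are not "a"; NA comparisons are dropped by na.rm
mismatches <- rowSums(df_na != "a", na.rm = TRUE)

# A row is "entirely a" when it has no mismatching cell
entire_a <- +(mismatches == 0)
```

This avoids modifying the data with replace(), at the cost of the same caveat as before: a row that is all NA has zero mismatches and is therefore flagged as 1.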

Data

dataframe <- structure(list(Age = c("6-9", "6-9", "6-9", "9-10", "9-10"), 
    Monday = c("a", "b", "", "c", "a"), Tuesday = c("", "b", 
    "c", "c", "a"), Wednesday = c("a", "c", "a", "b", "a")), 
    row.names = c(NA, 
-5L), class = "data.frame")

We can use pmap_int from purrr for this row-wise operation.

First convert empty values ( '' ) to NA if they are not NA already.

library(dplyr)
library(purrr)

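dataframe %>%
    # Convert '' to NA column by column; calling na_if() on a whole
    # data frame is understood to error in dplyr >= 1.1.0
    mutate(across(Monday:Wednesday, ~ na_if(.x, ''))) %>%
    # For each row, 1L if all non-missing values are 'a', otherwise 0L
    mutate(Entire_a = pmap_int(select(., Monday:Wednesday), 
                         ~+all(c(...) == 'a', na.rm = TRUE)))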

#   Age Monday Tuesday Wednesday Entire_a
#1  6-9      a    <NA>         a        1
#2  6-9      b       b         c        0
#3  6-9   <NA>       c         a        0
#4 9-10      c       c         b        0
#5 9-10      a       a         a        1
