
Creating a new column based on values obtained from different columns, using mutate() and case_when() in R

I am a student relatively new to R and have learnt a lot from browsing here. I have recently been stuck on something and, after hours of trying, still haven't been able to figure out what to do. Consider the following data set:

ID  Y1  Y2  Y3  Y4
1   0   0   1   1
2   0   0   0   0
3   NA  NA  NA  NA

I want to create a new column that is filled according to the following conditions:

  1. If the row contains a 1, return 1 regardless of any NA or 0
  2. If it contains a mix of 0 and NA but no 1, return 0
  3. If it contains only NA, return NA

So, using the example above, I want to get the following:

ID  Y1  Y2  Y3  Y4  Outcome
1   0   0   1   1   1
2   0   0   0   0   0
3   NA  NA  NA  NA  NA

However, the code I tried:

Data2 <- Data %>% mutate(Outcome = case_when( 
                                Data$Y1 == "na" &
                                Data$Y2 == "na" &
                                Data$Y3 == "na" &
                                Data$Y4 == "na" ~ "na"))  %>%                                
          mutate(Outcome = case_when(Data$Y1 == 1 ~ "1", 
                                 Data$Y2 == 1 ~ "1", 
                                 Data$Y3 == 1 ~ "1",
                                 Data$Y4 == 1 ~ "1",
                                 TRUE ~ "No"))

returns:

ID  Y1  Y2  Y3  Y4  Outcome
1   0   0   1   1   1
2   0   0   0   0   0
3   NA  NA  NA  NA  0

which seems to ignore condition 3 (if the row contains only NA, return NA).

Any pointers as to what I did wrong would be greatly appreciated.

Please forgive the formatting; I'm not sure how to make it prettier, as this is the first time I have asked a question here.

Many thanks in advance!

[Edit] Thanks to Shah, I noticed that there is potential for confusion, for which I apologise. To clarify: this is just a segment of the data set, shown to get the point across. I'm dealing with a big data set that contains more columns, some of which also hold numeric values.

Checking each column (Y1, Y2, Y3, etc.) individually is tedious and does not scale. It becomes a big problem if you have 100 columns where you need this.

If, as shown in the example, you want to ignore the first column (ID) and include all other columns in the calculation, you can do the following. The -1 in the answer drops the first column, ID.

Also, use is.na() to test for NA values.
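To illustrate why the `== "na"` comparisons in the question never match, here is a minimal sketch. The vector `x` and the result names are made up for illustration and are not part of the original data.

```r
# Hypothetical example vector: a 0, a 1 and a missing value
x <- c(0, 1, NA)

# Comparing with == propagates NA instead of giving TRUE or FALSE
eq_one <- x == 1        # FALSE  TRUE    NA

# The string "na" is ordinary text, not a missing value
eq_text <- x == "na"    # FALSE FALSE    NA

# is.na() is the reliable test for missingness
is_missing <- is.na(x)  # FALSE FALSE  TRUE
```

Since case_when() treats an NA condition as not matched, a row of NAs compared with == falls through to the `TRUE ~ ...` default. In the question's code, the second mutate() also overwrites the Outcome column created by the first one.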

# Count the non-NA values in each row; this is used later to set the
# outcome to NA for rows where every value is NA
non_NA <- rowSums(!is.na(df[-1]))
# Assign 1 if the row sum is greater than 0, i.e. the row contains a 1
# (this relies on the columns holding only 0, 1, or NA)
df$Outcome <- as.integer(rowSums(df[-1], na.rm = TRUE) > 0)
# Set the outcome to NA for rows in which all values are NA
df$Outcome[non_NA == 0] <- NA
df
#  ID Y1 Y2 Y3 Y4 Outcome
#1  1  0  0  1  1       1
#2  2  0  0  0  0       0
#3  3 NA NA NA NA      NA

Data

df <- structure(list(ID = 1:3, Y1 = c(0L, 0L, NA), Y2 = c(0L, 0L, NA
), Y3 = c(1L, 0L, NA), Y4 = c(1L, 0L, NA)), 
class = "data.frame", row.names = c(NA, -3L))
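The question's edit mentions that the real data set also has columns with other numeric values, in which case a row sum greater than 0 no longer means "contains a 1". A possible variation, sketched under assumptions, tests for the value 1 explicitly. The example data (including the value 5), the `^Y` column-name pattern, and the names `cols`, `has_one` and `all_na` are all hypothetical.

```r
# Hypothetical data: row 2 holds a 5, which should not count as a 1
df <- data.frame(
  ID = 1:3,
  Y1 = c(0, 0, NA),
  Y2 = c(0, 5, NA),
  Y3 = c(1, 0, NA),
  Y4 = c(1, NA, NA)
)

# Assumption: the columns of interest all start with "Y"
cols <- grep("^Y", names(df), value = TRUE)

# TRUE where at least one value in the row equals 1 (NAs ignored)
has_one <- rowSums(df[cols] == 1, na.rm = TRUE) > 0
# TRUE where every value in the row is NA
all_na <- rowSums(!is.na(df[cols])) == 0

df$Outcome <- as.integer(has_one)
df$Outcome[all_na] <- NA
df
```

Selecting the columns by name rather than with `-1` is a design choice here: it keeps unrelated columns out of the calculation if the data frame has more than just ID and the Y columns.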

You can try this using dplyr's rowwise() function, which treats each row separately:

library(dplyr)

df |>
  rowwise() |>
  mutate(Outcome = case_when(
    any(c_across(Y1:Y4) == 1) ~ "1",
    all(is.na(c_across(Y1:Y4))) ~ NA_character_,
    TRUE ~ "0"
  ))

  • Output
# A tibble: 3 × 6
# Rowwise: 
     ID    Y1    Y2    Y3    Y4 Outcome
  <int> <int> <int> <int> <int> <chr>  
1     1     0     0     1     1 1      
2     2     0     0     0     0 0      
3     3    NA    NA    NA    NA NA     
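Row-wise evaluation can be slow on large data sets. As an alternative sketch, and assuming a dplyr version that provides if_any() and if_all() (1.0.4 or later), the same three rules can be written without rowwise(). The name `out` is made up for illustration, and `%in%` is used instead of `==` so that NA values compare as FALSE.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  ID = 1:3,
  Y1 = c(0L, 0L, NA),
  Y2 = c(0L, 0L, NA),
  Y3 = c(1L, 0L, NA),
  Y4 = c(1L, 0L, NA)
)

out <- df |>
  mutate(Outcome = case_when(
    if_any(Y1:Y4, ~ .x %in% 1) ~ 1L,     # rule 1: any 1 in the row
    if_all(Y1:Y4, is.na) ~ NA_integer_,  # rule 3: only NA in the row
    TRUE ~ 0L                            # rule 2: everything else
  ))
out
```

This version returns an integer Outcome column rather than a character one, which may be more convenient if the result is used in further calculations.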
