Fill a column with one of four date columns based on another column in R
I have a DF with 5 columns like so:
A  B  Date1  Date2  Date3  Date4
1     x      NA     NA     NA
2     NA     y      NA     NA
3     NA     NA     z      NA
4     NA     NA     NA     f
I want to use the dplyr package and the case_when() function to do something like this:
df <- df %>%
mutate(B = case_when(
A == 1 ~ B == Date1,
A == 2 ~ B == Date2,
A == 3 ~ B == Date3,
A == 4 ~ B == Date4))
Essentially, based on the value of A, I would like to fill B with one of the 4 date columns.
A is of class character; B and the Date columns are all of class Date.
The problem is that when I apply this to the data frame, it simply doesn't work. It returns NAs and changes the class of B to boolean. I am using R version 4.1.2. Any help is appreciated.
You can use coalesce() to find the first non-missing element.
library(dplyr)
df %>%
mutate(B = coalesce(!!!df[-1]))
# A Date1 Date2 Date3 Date4 B
# 1 1 x <NA> <NA> <NA> x
# 2 2 <NA> y <NA> <NA> y
# 3 3 <NA> <NA> z <NA> z
# 4 4 <NA> <NA> <NA> f f
The above code is just a shortcut for
df %>%
mutate(B = coalesce(Date1, Date2, Date3, Date4))
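Since the question uses real Date columns rather than letters, here is a minimal sketch of the same coalesce() call on Date data. The sample dates and the name `result` are made up for illustration; only the column names come from the question.

```r
library(dplyr)

# Hypothetical sample data: the dates are invented for illustration
df <- tibble(
  A     = c("1", "2", "3", "4"),
  Date1 = as.Date(c("2022-01-01", NA, NA, NA)),
  Date2 = as.Date(c(NA, "2022-02-01", NA, NA)),
  Date3 = as.Date(c(NA, NA, "2022-03-01", NA)),
  Date4 = as.Date(c(NA, NA, NA, "2022-04-01"))
)

# Take the first non-missing date in each row
result <- df %>%
  mutate(B = coalesce(Date1, Date2, Date3, Date4))

class(result$B)
```

Note that this ignores A entirely, so it only matches the desired result when each row has exactly one non-missing date.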
If B needs to be filled based on the value of A, then here is an idea with c_across():
df %>%
rowwise() %>%
mutate(B = c_across(starts_with("Date"))[A]) %>%
ungroup()
# # A tibble: 4 × 6
# A Date1 Date2 Date3 Date4 B
# <int> <chr> <chr> <chr> <chr> <chr>
# 1 1 x NA NA NA x
# 2 2 NA y NA NA y
# 3 3 NA NA z NA z
# 4 4 NA NA NA f f
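The output above shows A as an integer, whereas the question says A is of class character. Indexing with a character value looks elements up by name rather than by position, so a sketch for that case would convert A first. The sample dates and the name `result` below are assumptions for illustration.

```r
library(dplyr)

# Hypothetical sample data: A is character, as described in the question
df <- tibble(
  A     = c("1", "2", "3", "4"),
  Date1 = as.Date(c("2022-01-01", NA, NA, NA)),
  Date2 = as.Date(c(NA, "2022-02-01", NA, NA)),
  Date3 = as.Date(c(NA, NA, "2022-03-01", NA)),
  Date4 = as.Date(c(NA, NA, NA, "2022-04-01"))
)

result <- df %>%
  rowwise() %>%
  # as.integer(A) turns "1".."4" into a position within the Date columns
  mutate(B = c_across(starts_with("Date"))[as.integer(A)]) %>%
  ungroup()

class(result$B)
```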
As it seems you want the diagonal values from the Date columns, you can use diag():
df$B <- diag(as.matrix(df[grepl("Date", colnames(df))]))
#[1] "x" "y" "z" "f"
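One caveat when the columns are real Dates: as.matrix() formats Date columns as text, so diag() hands back character strings. A rough sketch of converting them back, using made-up dates and the hypothetical helper names `date_cols` and `diag_vals`:

```r
# Hypothetical sample data: the dates are invented for illustration
df <- data.frame(
  A     = c("1", "2", "3", "4"),
  Date1 = as.Date(c("2022-01-01", NA, NA, NA)),
  Date2 = as.Date(c(NA, "2022-02-01", NA, NA)),
  Date3 = as.Date(c(NA, NA, "2022-03-01", NA)),
  Date4 = as.Date(c(NA, NA, NA, "2022-04-01"))
)

date_cols <- grepl("Date", colnames(df))

# The matrix is character, so the diagonal is a character vector
diag_vals <- diag(as.matrix(df[date_cols]))

# Convert back to Date before storing the result
df$B <- as.Date(diag_vals)
```

This only gives the intended answer when row i should take its value from the i-th Date column, as in the example data.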
Other answers (if you want to coalesce):
Using max:
df$B <- apply(df[2:5], 1, \(x) max(x, na.rm = TRUE))
Using c_across:
df %>%
rowwise() %>%
mutate(B = max(c_across(Date1:Date4), na.rm = TRUE))
Output:
A Date1 Date2 Date3 Date4 B
1 1 x <NA> <NA> <NA> x
2 2 <NA> y <NA> <NA> y
3 3 <NA> <NA> z <NA> z
4 4 <NA> <NA> <NA> f f
The other answers are superior, but if you must use your current code for the actual application, the corrected version is:
df %>%
mutate(B = case_when(
A == 1 ~ Date1,
A == 2 ~ Date2,
A == 3 ~ Date3,
A == 4 ~ Date4))
Output:
# A B Date1 Date2 Date3 Date4
# 1 x x <NA> <NA> <NA>
# 2 y <NA> y <NA> <NA>
# 3 z <NA> <NA> z <NA>
# 4 f <NA> <NA> <NA> f
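As for why the original attempt turned B into a logical column: the right-hand side of each case_when() formula is the value to return, and `B == Date1` is a comparison, not an assignment. A small sketch with made-up dates (the names `fixed` and `comparison` are only for illustration):

```r
library(dplyr)

# Hypothetical sample data: B starts out as an all-NA Date column
df <- tibble(
  A     = c("1", "2", "3", "4"),
  B     = as.Date(NA),
  Date1 = as.Date(c("2022-01-01", NA, NA, NA)),
  Date2 = as.Date(c(NA, "2022-02-01", NA, NA)),
  Date3 = as.Date(c(NA, NA, "2022-03-01", NA)),
  Date4 = as.Date(c(NA, NA, NA, "2022-04-01"))
)

# Comparing a missing date with anything gives NA of class logical
comparison <- df$B == df$Date1
class(comparison)  # "logical"

# Returning the Date column itself keeps B as a Date
fixed <- df %>%
  mutate(B = case_when(
    A == "1" ~ Date1,
    A == "2" ~ Date2,
    A == "3" ~ Date3,
    A == "4" ~ Date4))

class(fixed$B)
```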