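How to use ifelse with str_detect across multiple columns
I have a dataframe showing ICD-10 codes for people who have died (decedents). Each row in the dataframe corresponds to one decedent, and each decedent can have up to 20 conditions listed as factors contributing to his or her death. I want to create a new column showing whether the decedent had any ICD-10 code for diabetes (1 for yes, 0 for no). The diabetes codes are in the range E10-E14, i.e. a diabetes code must start with any of the strings in the following vector, but the fourth position can take different values: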
diabetes <- c("E10","E11","E12","E13","E14")
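To make the "starts with" rule concrete, here is a minimal base R sketch. The names `anchored_pattern` and `codes` are made up for illustration; the codes are taken from the example data, not from a real dataset.

```r
diabetes <- c("E10", "E11", "E12", "E13", "E14")

# "^" anchors the match to the start of the code, so the fourth
# character is free to take any value
anchored_pattern <- paste0("^(", paste(diabetes, collapse = "|"), ")")

codes <- c("E112", "E149", "I250", "E669")  # illustrative codes only
is_diabetes <- grepl(anchored_pattern, codes)
print(is_diabetes)
# TRUE TRUE FALSE FALSE
```

E669 is rejected because it starts with E66, which is outside E10-E14.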
Here is a small, made-up example of the data:
original <- structure(list(acond1 = c("E112", "I250", "A419", "E149"), acond2 = c("I255",
"B341", "F179", "F101"), acond3 = c("I258", "B348", "I10", "I10"
), acond4 = c("I500", "E669", "I694", "R092")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
acond1 | acond2 | acond3 | acond4 |
---|---|---|---|
E112 | I255 | I258 | I500 |
I250 | B341 | B348 | E669 |
A419 | F179 | I10 | I694 |
E149 | F101 | I10 | R092 |
Here is my desired result:
acond1 | acond2 | acond3 | acond4 | diabetes |
---|---|---|---|---|
E112 | I255 | I258 | I500 | 1 |
I250 | B341 | B348 | E669 | 0 |
A419 | F179 | I10 | I694 | 0 |
E149 | F101 | I10 | R092 | 1 |
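There are a few other posts on this kind of problem (e.g., "Using if else on a dataframe across multiple columns" and "Str_detect multiple columns using across"), but I can't seem to put them together. Here are my unsuccessful attempts so far: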
library(tidyverse)
library(stringr)
#attempt 1
original %>%
mutate_at(vars(contains("acond")), ifelse(str_detect(.,paste0("^(",
paste(diabetes, collapse = "|"), ")")), 1, 0))
#attempt 2
original %>%
unite(col = "all_conditions", starts_with("acond"), sep = ", ", remove = FALSE) %>%
mutate(diabetes = if_else(str_detect(.,paste0("^(", paste(diabetes, collapse = "|"), ")")), 1, 0))
Any help would be greatly appreciated.
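A likely reason attempt 2 fails is that `.` inside `mutate()` refers to the whole piped data frame rather than to the united column, and that `^` can only match the first code in the united string. The sketch below is one possible repair under those assumptions, not a definitive fix; `united_pattern` and `result` are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

original <- data.frame(
  acond1 = c("E112", "I250", "A419", "E149"),
  acond2 = c("I255", "B341", "F179", "F101"),
  acond3 = c("I258", "B348", "I10", "I10"),
  acond4 = c("I500", "E669", "I694", "R092")
)
diabetes <- c("E10", "E11", "E12", "E13", "E14")

# Allow a code at the very start of the string or right after the ", " separator
united_pattern <- paste0("(^|, )(", paste(diabetes, collapse = "|"), ")")

result <- original %>%
  unite(col = "all_conditions", starts_with("acond"), sep = ", ", remove = FALSE) %>%
  # refer to the united column by name instead of "."
  mutate(diabetes = if_else(str_detect(all_conditions, united_pattern), 1, 0))

result$diabetes
```

On the example data this flags rows 1 and 4.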
Here is a base R approach using apply:
dia <- paste(c("E10","E11","E12","E13","E14"), collapse="|")
df$diabetes <- apply(df, 1, function(x) any(grepl(dia,x)))*1
df
acond1 acond2 acond3 acond4 diabetes
1 E112 I255 I258 I500 1
2 I250 B341 B348 E669 0
3 A419 F179 I10 I694 0
4 E149 F101 I10 R092 1
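The `apply` call above scans every column of `df`. If the real data also holds non-code columns, a variant along these lines restricts the search to the condition columns and anchors the pattern to the start of each code. This is only a sketch: the `id` column, `cond_cols` and `dia_anchored` are hypothetical names added for illustration.

```r
df <- data.frame(
  id     = 1:4,  # hypothetical extra column, not in the original data
  acond1 = c("E112", "I250", "A419", "E149"),
  acond2 = c("I255", "B341", "F179", "F101"),
  acond3 = c("I258", "B348", "I10", "I10"),
  acond4 = c("I500", "E669", "I694", "R092")
)

# Only columns whose names start with "acond" are searched
cond_cols <- grep("^acond", names(df), value = TRUE)
dia_anchored <- "^(E10|E11|E12|E13|E14)"

# One logical vector per column, combined with "|" across columns
hits <- Reduce(`|`, lapply(df[cond_cols], grepl, pattern = dia_anchored))
df$diabetes <- as.integer(hits)
df$diabetes
```

Because `grepl` is vectorised over each column, this avoids looping over rows.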
With dplyr:
library(dplyr)
df %>%
rowwise() %>%
mutate(diabetes=any(grepl(dia,c_across(starts_with("ac"))))*1) %>%
ungroup
# A tibble: 4 × 5
acond1 acond2 acond3 acond4 diabetes
<chr> <chr> <chr> <chr> <dbl>
1 E112 I255 I258 I500 1
2 I250 B341 B348 E669 0
3 A419 F179 I10 I694 0
4 E149 F101 I10 R092 1
Data used:
df <- structure(list(acond1 = c("E112", "I250", "A419", "E149"), acond2 = c("I255",
"B341", "F179", "F101"), acond3 = c("I258", "B348", "I10", "I10"
), acond4 = c("I500", "E669", "I694", "R092")), class = "data.frame", row.names = c(NA,
-4L))
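As an alternative to `rowwise()`, dplyr's `if_any()` can express "any condition column matches" directly. The following is a sketch assuming dplyr 1.0.4 or later (where `if_any()` is available); `dia_anchored` and `result` are illustrative names.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  acond1 = c("E112", "I250", "A419", "E149"),
  acond2 = c("I255", "B341", "F179", "F101"),
  acond3 = c("I258", "B348", "I10", "I10"),
  acond4 = c("I500", "E669", "I694", "R092")
)
dia_anchored <- "^(E10|E11|E12|E13|E14)"

result <- df %>%
  # if_any() is TRUE for a row when the predicate holds in at least one column
  mutate(diabetes = as.integer(
    if_any(starts_with("acond"), ~ str_detect(.x, dia_anchored))
  ))

result$diabetes
```

This keeps the computation column-wise, which is usually faster than `rowwise()` on larger data.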
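If we want to use across with ifelse and str_detect, we could do the following:
1. Create a pattern for str_detect with paste and collapse.
2. mutate across all columns, using an anonymous function (~ifelse) with the condition, and use .names to control the names of the new columns.
3. unite the new columns.
4. Use the parse_number trick from readr.
diabetes <- c("E10","E11","E12","E13","E14")
pattern <- paste(diabetes, collapse = "|")
library(tidyverse)
original %>%
mutate(across(everything(), ~ifelse(str_detect(., pattern), 1, 0), .names = "new_{col}")) %>%
# sep = '' glues the 1/0 flags into a single string such as "0100"
unite(New_Col, starts_with('new'), na.rm = TRUE, sep = '') %>%
# parse_number() reads only the first number in a string, so with sep = ' ' a match
# outside the first column ("0 1 0 0") would be parsed as 0; parse the glued string and test for > 0
mutate(diabetes = as.numeric(parse_number(New_Col) > 0), .keep="unused")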
acond1 acond2 acond3 acond4 diabetes
<chr> <chr> <chr> <chr> <dbl>
1 E112 I255 I258 I500 1
2 I250 B341 B348 E669 0
3 A419 F179 I10 I694 0
4 E149 F101 I10 R092 1
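Since `parse_number()` reads only the first number it finds in a string, how the 1/0 flags are joined matters for this approach. A small sketch with hand-written strings (not produced from the data above) illustrates the behaviour:

```r
library(readr)

# Flags joined with spaces: only the first flag is read
first_col_hit  <- parse_number("1 0 0 0")
second_col_hit <- parse_number("0 1 0 0")

# Flags glued without a separator: the whole string is read as one number
glued_hit <- parse_number("0100")

c(first_col_hit, second_col_hit, glued_hit)
# 1 0 100
```

A match in any column other than the first is therefore lost when the flags are separated by spaces, whereas a glued string is greater than 0 whenever any flag is 1.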
library(tidyverse)
diabetes_pattern <- c("E10","E11","E12","E13","E14") %>%
str_c(collapse = "|")
original <-
structure(
list(
acond1 = c("E112", "I250", "A419", "E149"),
acond2 = c("I255", "B341", "F179", "F101"),
acond3 = c("I258", "B348", "I10", "I10"),
acond4 = c("I500", "E669", "I694", "R092")
),
row.names = c(NA,-4L),
class = c("tbl_df", "tbl", "data.frame")
)
original %>%
rowwise() %>%
mutate(diabetes = +any(str_detect(string = c_across(everything()), pattern = diabetes_pattern)))
#> # A tibble: 4 x 5
#> # Rowwise:
#> acond1 acond2 acond3 acond4 diabetes
#> <chr> <chr> <chr> <chr> <int>
#> 1 E112 I255 I258 I500 1
#> 2 I250 B341 B348 E669 0
#> 3 A419 F179 I10 I694 0
#> 4 E149 F101 I10 R092 1
# rowSums() counts the matching columns; "> 0" turns that count into a 1/0 flag
original %>%
mutate(diabetes = as.numeric(rowSums(across(.cols = everything(), ~str_detect(.x, diabetes_pattern))) > 0))
#> # A tibble: 4 x 5
#> acond1 acond2 acond3 acond4 diabetes
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 E112 I255 I258 I500 1
#> 2 I250 B341 B348 E669 0
#> 3 A419 F179 I10 I694 0
#> 4 E149 F101 I10 R092 1
Created on 2022-01-23 by the reprex package (v2.0.1)