
How to use ifelse with str_detect across multiple columns

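I have a dataframe of ICD-10 codes for decedents (people who have died). Each row corresponds to one decedent, and each decedent can have up to 20 conditions listed as factors contributing to his or her death. I want to create a new column showing whether the decedent had any ICD-10 diabetes code (1 for yes, 0 for no). The diabetes codes are in the range E10-E14, i.e. a diabetes code must start with one of the strings in the following vector, while the fourth position can take different values: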

diabetes <- c("E10","E11","E12","E13","E14")

Here is a small, made-up example of the data:

original <- structure(list(acond1 = c("E112", "I250", "A419", "E149"), acond2 = c("I255", 
"B341", "F179", "F101"), acond3 = c("I258", "B348", "I10", "I10"
), acond4 = c("I500", "E669", "I694", "R092")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))
acond1 acond2 acond3 acond4
E112 I255 I258 I500
I250 B341 B348 E669
A419 F179 I10 I694
E149 F101 I10 R092

Here is my desired result:

acond1 acond2 acond3 acond4 diabetes
E112 I255 I258 I500 1
I250 B341 B348 E669 0
A419 F179 I10 I694 0
E149 F101 I10 R092 1

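There are a few other posts on this kind of problem (e.g., "Using if else on a dataframe across multiple columns" and "Str_detect multiple columns using across"), but I can't seem to put them together. Here are my unsuccessful attempts so far: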

library(tidyverse)
library(stringr)

#attempt 1
original %>%
  mutate_at(vars(contains("acond")), ifelse(str_detect(.,paste0("^(", 
  paste(diabetes, collapse = "|"), ")")), 1, 0))

#attempt 2
original %>%
  unite(col = "all_conditions", starts_with("acond"), sep = ", ", remove = FALSE) %>%
  mutate(diabetes = if_else(str_detect(.,paste0("^(", paste(diabetes, collapse = "|"), ")")), 1, 0))

Any help would be greatly appreciated.

Here is a base R approach using apply:

dia <- paste(c("E10","E11","E12","E13","E14"), collapse="|")

df$diabetes <- apply(df, 1, function(x) any(grepl(dia,x)))*1

df
  acond1 acond2 acond3 acond4 diabetes
1   E112   I255   I258   I500        1
2   I250   B341   B348   E669        0
3   A419   F179    I10   I694        0
4   E149   F101    I10   R092        1
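
The question asks for codes that start with E10-E14, while the pattern above matches those strings anywhere in a code. For real ICD-10 codes this probably makes no difference, but if a strict prefix match is wanted, a minimal sketch (assuming the condition columns are all named acond*, and using the hypothetical helper names dia_anchored, cond_cols and hits) could look like this:

```r
# Example data from the question
df <- data.frame(
  acond1 = c("E112", "I250", "A419", "E149"),
  acond2 = c("I255", "B341", "F179", "F101"),
  acond3 = c("I258", "B348", "I10",  "I10"),
  acond4 = c("I500", "E669", "I694", "R092")
)

diabetes <- c("E10", "E11", "E12", "E13", "E14")

# "^(E10|E11|E12|E13|E14)": the ^ anchors the match at the start of each code
dia_anchored <- paste0("^(", paste(diabetes, collapse = "|"), ")")

# Only look at the condition columns (assumption: they are named acond*)
cond_cols <- grep("^acond", names(df), value = TRUE)

# One logical vector per condition column, combined with an element-wise OR
hits <- lapply(df[cond_cols], grepl, pattern = dia_anchored)
df$diabetes <- as.integer(Reduce(`|`, hits))

df
```

Restricting the search to cond_cols also keeps the check from scanning unrelated columns (IDs, dates, or the diabetes column itself on a rerun).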

dplyr

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(diabetes=any(grepl(dia,c_across(starts_with("ac"))))*1) %>% 
  ungroup
# A tibble: 4 × 5
  acond1 acond2 acond3 acond4 diabetes
  <chr>  <chr>  <chr>  <chr>     <dbl>
1 E112   I255   I258   I500          1
2 I250   B341   B348   E669          0
3 A419   F179   I10    I694          0
4 E149   F101   I10    R092          1

Data

df <- structure(list(acond1 = c("E112", "I250", "A419", "E149"), acond2 = c("I255", 
"B341", "F179", "F101"), acond3 = c("I258", "B348", "I10", "I10"
), acond4 = c("I500", "E669", "I694", "R092")), class = "data.frame", row.names = c(NA, 
-4L))

If we want to use across with ifelse and str_detect, then we could:

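  1. Create a pattern for str_detect with paste and collapse
  2. mutate across all columns, using an anonymous function with ~ifelse and the condition, and use .names to control the new columns
  3. unite the new columns
  4. Apply the parse_number trick from the readr package
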
diabetes <- c("E10","E11","E12","E13","E14")

pattern <- paste(diabetes, collapse = "|")

library(tidyverse)

original %>% 
  mutate(across(everything(), ~ifelse(str_detect(., pattern), 1, 0), .names = "new_{col}")) %>% 
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(diabetes = parse_number(New_Col), .keep="unused")
  acond1 acond2 acond3 acond4 diabetes
  <chr>  <chr>  <chr>  <chr>     <dbl>
1 E112   I255   I258   I500          1
2 I250   B341   B348   E669          0
3 A419   F179   I10    I694          0
4 E149   F101   I10    R092          1
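
One thing worth checking with this approach: parse_number is documented to parse the first number it finds, so the result reflects the indicator for the first column only. In the example data both diabetes codes happen to sit in acond1. The base R sketch below uses a hypothetical row whose only diabetes code is in acond2; it mimics unite with paste and uses a regex as a stand-in for parse_number, so it is an illustration of the idea rather than the readr implementation:

```r
# Hypothetical row: the only diabetes code (E119) sits in the second column
example <- data.frame(
  acond1 = "I250", acond2 = "E119", acond3 = "I10", acond4 = "R092"
)
pattern <- paste(c("E10", "E11", "E12", "E13", "E14"), collapse = "|")

# Per-column 0/1 indicators, like the across() + ifelse() step
flags <- sapply(example, function(x) ifelse(grepl(pattern, x), 1, 0))

# Mimic unite(): paste the indicators into one string
united <- paste(flags, collapse = " ")

# Stand-in for parse_number(): keep only the first run of digits
first_number <- as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", united))

# Position-independent alternative: the largest indicator in the row
any_match <- max(flags)

united        # "0 1 0 0"
first_number  # 0, the match in acond2 is missed
any_match     # 1
```

If codes can appear in any column, collapsing the indicators with a maximum (or testing the united string for a "1") seems a safer final step than reading the first number.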

library(tidyverse)

diabetes_pattern <- c("E10","E11","E12","E13","E14") %>% 
  str_c(collapse = "|")

original <-
  structure(
    list(
      acond1 = c("E112", "I250", "A419", "E149"),
      acond2 = c("I255", "B341", "F179", "F101"),
      acond3 = c("I258", "B348", "I10", "I10"),
      acond4 = c("I500", "E669", "I694", "R092")
    ),
    row.names = c(NA,-4L),
    class = c("tbl_df", "tbl", "data.frame")
  )

original %>% 
  rowwise() %>% 
  mutate(diabetes = +any(str_detect(string = c_across(everything()), pattern = diabetes_pattern)))
#> # A tibble: 4 x 5
#> # Rowwise: 
#>   acond1 acond2 acond3 acond4 diabetes
#>   <chr>  <chr>  <chr>  <chr>     <int>
#> 1 E112   I255   I258   I500          1
#> 2 I250   B341   B348   E669          0
#> 3 A419   F179   I10    I694          0
#> 4 E149   F101   I10    R092          1

original %>% 
  mutate(diabetes = rowSums(across(.cols = everything(), ~str_detect(.x, diabetes_pattern))))
#> # A tibble: 4 x 5
#>   acond1 acond2 acond3 acond4 diabetes
#>   <chr>  <chr>  <chr>  <chr>     <dbl>
#> 1 E112   I255   I258   I500          1
#> 2 I250   B341   B348   E669          0
#> 3 A419   F179   I10    I694          0
#> 4 E149   F101   I10    R092          1
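
Note that the rowSums variant returns the number of matching columns, which equals the 0/1 flag only while each row has at most one diabetes code. A small base R sketch with hypothetical data (two_codes, where the first row lists two diabetes codes) shows the difference and one way to turn the count into a flag:

```r
diabetes_pattern <- paste(c("E10", "E11", "E12", "E13", "E14"), collapse = "|")

# Hypothetical data: the first row lists two diabetes codes
two_codes <- data.frame(
  acond1 = c("E112", "I250"),
  acond2 = c("E149", "B341")
)

# Logical matrix: one column per condition column, one row per decedent
hits <- sapply(two_codes, grepl, pattern = diabetes_pattern)

n_codes  <- rowSums(hits)         # count of matching columns per row
diabetes <- +(rowSums(hits) > 0)  # strict 0/1 flag

n_codes   # 2 0
diabetes  # 1 0
```

The same `> 0` comparison can be wrapped around the rowSums(across(...)) call inside mutate if a strict indicator is required.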

Created on 2022-01-23 by the reprex package (v2.0.1)
