How to select specific observations from columns based on partial string matching of column names

Question

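My dataset has a large number of columns whose names start with "dis_....".

The values in these columns are 0 (no disease) or 1 (disease). I want to create a data frame of observations with 1 for a specific disease and 0 for all other diseases.

I tried the following: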

istroke <- filter(onlyCRP, dis_ep0009 == 1 & grep("dis_" == 0))

and in combination with select:

istroke1 <- filter(onlyCRP, dis_ep0009 == 1 & select(contains("dis_") == 0))

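As you might have guessed, neither of them works.

I have looked at these posts:

Filtering columns by regex in a data frame

Subset data based on partial match of column names

but they do not answer my question.

Please let me know if you need further clarification.

EDIT: I realised I need to clarify further what I want. Consider this table: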

dis_ep0009  dis_epxxx   dis_epxxx
 0            0             0
 0            1             0  
 0            0             1
 1            0             1
 0            0             0
 0            0             0
 1            1             1

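I need another column (called IS below), based on certain conditions on these 3 columns (I actually have 29 of these "dis_" columns):

If dis_ep0009 == 1, then IS == 1 (regardless of whether any other "dis.." column is 0 or 1).
If dis_ep0009 == 0 and any dis_epxxx == 1, I want to drop these observations.
If dis_ep0009 == 0 and all dis_epxxx == 0, I want to code IS == 0.

So the resulting table should look like this: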

dis_ep0009  dis_epxxx   dis_epxxx    IS
 0            0             0        0
 0            1             0        drop
 0            0             1        drop
 1            0             1        1
 0            0             0        0
 0            0             0        0
 1            1             1        1

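I have tried pairing filter (dplyr) with grep and ifelse statements, but could not make heads or tails of it. Essentially, it should be something as simple as this (not intended to work):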

istroke <- filter(df, ifelse(dis_ep0009 == 1, 1, ifelse(dis_ep0009 == 0 & grep("dis_", names(df)) == 0, 0, ifelse(dis_ep0009 == 0 & grep("dis_", names(df)) == 1, drop())))

Thanks in advance!

Answer 1

See the comments in the code, and let me know if this is what you want.

specific_disease <- "dis_ep0009"
disease_cols <- grep("^dis_", names(onlyCRP), value = TRUE) # all columns whose names start with "dis_"
disease_cols <- setdiff(disease_cols, specific_disease) # all of these columns except your specific disease
onlyCRP$any_other_disease <- apply(onlyCRP[, disease_cols, drop = FALSE] == 1, 1, any) # a Boolean column saying whether there is another disease besides the possible specific one
onlyCRP[onlyCRP[[specific_disease]] == 1 & !onlyCRP$any_other_disease, ] # the subset where you'll have only your specific disease and no other ([[ ]] is needed because the column name is stored in a variable)
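For the rules in the question's edit (IS == 1 whenever dis_ep0009 == 1, drop rows with only other diseases, IS == 0 when every "dis_" column is 0), the same idea can be adapted. The following is only a minimal base R sketch on toy data copied from the question's table. The column names dis_ep0001 and dis_ep0002 are hypothetical stand-ins for the elided "dis_epxxx" names, and the variable names are chosen for illustration.

```r
# Toy data mirroring the table in the question; dis_ep0001 and dis_ep0002
# are hypothetical placeholders for the elided "dis_epxxx" column names.
df <- data.frame(
  dis_ep0009 = c(0, 0, 0, 1, 0, 0, 1),
  dis_ep0001 = c(0, 1, 0, 0, 0, 0, 1),
  dis_ep0002 = c(0, 0, 1, 1, 0, 0, 1)
)

specific_disease <- "dis_ep0009"

# All columns whose names start with "dis_", except the disease of interest
other_cols <- setdiff(grep("^dis_", names(df), value = TRUE), specific_disease)

# TRUE where at least one of the other disease columns equals 1
# (assumption: NA values are treated as "no disease" via na.rm = TRUE)
any_other <- rowSums(df[, other_cols, drop = FALSE] == 1, na.rm = TRUE) > 0

# Keep rows that have the specific disease, or that have no disease at all;
# rows with only other diseases are dropped
keep <- df[[specific_disease]] == 1 | !any_other
result <- df[keep, , drop = FALSE]

# IS is 1 when the specific disease is present, 0 otherwise
result$IS <- as.integer(result[[specific_disease]] == 1)
result
```

With the real data, replacing df with onlyCRP should be enough, provided all 29 disease columns share the "dis_" prefix and no unrelated column does.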
