filter(!is.na(column)) is not removing NA's from column in R
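I'm redoing a course assignment to see whether I can improve it and get back into the material. The task is to write a function that, given two variables, "state" and "outcome", returns the name of the hospital in that state with the lowest mortality rate for the given outcome/disease. For some reason my filter(!is.na()) line doesn't seem to work, and I suspect it is related to the fact that I use paste to build the name of the column to select. In my tests, though, that doesn't seem to matter.
Here is the code: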
library(dplyr)
data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
dataSelected <- data %>%
  select("Hospital.Name", "State", "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack", "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure", "Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia")
colnames(dataSelected) <- c("HostpitalName","State","DeathRateHeartAttack","DeathRateHeartFailure","DeathRatePneumonia")
dataSelected[,3] <- as.numeric(dataSelected[,3])
dataSelected[,4] <- as.numeric(dataSelected[,4])
dataSelected[,5] <- as.numeric(dataSelected[,5])
best <- function(state,outcome){
  column <- paste('DeathRate',outcome, sep = "")
  if (state %in% dataSelected$State < 1){
    return('Invalid state')
  } else if (column %in% colnames(dataSelected) < 1){
    return('Invalid outcome')
  } else{
    BestHospitals <- dataSelected %>%
      select(HostpitalName,State,column) %>%
      filter(!is.na(column)) %>%
      filter(State == state) %>%
      arrange(column,HostpitalName)
    return(BestHospitals[1,1])
  }
}
My function call:
best("AL","HeartAttack")
Version information:
platform       x86_64-apple-darwin15.6.0
arch           x86_64
os             darwin15.6.0
system         x86_64, darwin15.6.0
status
major          3
minor          6.1
year           2019
month          07
day            05
svn rev        76782
language       R
version.string R version 3.6.1 (2019-07-05)
nickname       Action of the Toes
Output of dput(head(dataSelected)):
structure(list(HostpitalName = c("SOUTHEAST ALABAMA MEDICAL CENTER",
"MARSHALL MEDICAL CENTER SOUTH", "ELIZA COFFEE MEMORIAL HOSPITAL",
"MIZELL MEMORIAL HOSPITAL", "CRENSHAW COMMUNITY HOSPITAL", "MARSHALL MEDICAL CENTER NORTH"
), State = c("AL", "AL", "AL", "AL", "AL", "AL"), DeathRateHeartAttack = c(14.3,
18.5, 18.1, NA, NA, NA), DeathRateHeartFailure = c(11.4, 15.2,
11.3, 13.6, 13.8, 12.5), DeathRatePneumonia = c(10.9, 13.9, 13.4,
14.9, 15.8, 8.7)), row.names = c(NA, 6L), class = "data.frame")
How about filtering by column number instead of by name?
best <- function(state,outcome){
  column <- paste('DeathRate',outcome, sep = "")
  if (state %in% dataSelected$State < 1){
    return('Invalid state')
  } else if (column %in% colnames(dataSelected) < 1){
    return('Invalid outcome')
  } else{
    BestHospitals <- dataSelected %>%
      select(HostpitalName,State,column) %>%
      filter(!is.na(.[,3])) %>%      # column 3 is the selected outcome
      filter(State == state) %>%
      arrange(.[,3],HostpitalName)   # ascending, so the lowest death rate comes first
    return(BestHospitals[1,1])
  }
}
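A likely reason the original filter has no effect, sketched on three columns of the six sample rows from the dput() output in the question: inside filter(), `column` is an ordinary character string, so is.na(column) tests the string itself rather than the column it names. The `.data[[column]]` lookup below is one way to refer to a column whose name is held in a string. The names `sample_rows`, `kept_by_string` and `kept_by_lookup` are illustrative only.

```r
library(dplyr)

# Sample rows copied from the dput(head(dataSelected)) output in the question.
sample_rows <- data.frame(
  HostpitalName = c("SOUTHEAST ALABAMA MEDICAL CENTER",
                    "MARSHALL MEDICAL CENTER SOUTH",
                    "ELIZA COFFEE MEMORIAL HOSPITAL",
                    "MIZELL MEMORIAL HOSPITAL",
                    "CRENSHAW COMMUNITY HOSPITAL",
                    "MARSHALL MEDICAL CENTER NORTH"),
  State = rep("AL", 6),
  DeathRateHeartAttack = c(14.3, 18.5, 18.1, NA, NA, NA),
  stringsAsFactors = FALSE
)

column <- "DeathRateHeartAttack"

# is.na() sees the string "DeathRateHeartAttack", which is not NA,
# so the condition is TRUE for every row and nothing is dropped.
kept_by_string <- sample_rows %>% filter(!is.na(column))
nrow(kept_by_string)   # 6: the NA rows are still there

# .data[[column]] looks the column up by the name stored in the string,
# so the NA rows are removed and the sort uses the actual values.
kept_by_lookup <- sample_rows %>%
  filter(!is.na(.data[[column]])) %>%
  arrange(.data[[column]], HostpitalName)
nrow(kept_by_lookup)   # 3
kept_by_lookup$HostpitalName[1]
```

The same reasoning applies to arrange(column, HostpitalName): sorting by a constant string leaves only the hospital name to determine the order.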
I took the liberty of rewriting your function.
# it is usually a bad idea to refer to a global variable (like your data frame) inside a function.
best <- function(dat=NULL,state=NULL,outcome=NULL){
  column <- paste('DeathRate',outcome, sep = "")
  if (!state %in% dat$State | !column %in% colnames(dat)){
    stop('Invalid input')
  } # negating and using or "|" makes it easier to read
  else{
    BestHospitals <- dat %>%
      select(HostpitalName,State,column) %>%
      na.omit() %>% # for your purpose the much more concise na.omit() is a better option
      filter(State == state) %>%
      # .data[[column]] sorts by the column named in the string;
      # a bare `column` would sort by the constant string itself
      arrange(.data[[column]],HostpitalName) %>%
      filter(row_number()==1) #dplyr way to choose the first row
    return(BestHospitals)
  }
}
best(dat = dataSelected, state = "AL", outcome = "HeartAttack")
# When dataSelected holds only the six rows from the dput() output in the question, this returns:
#                      HostpitalName State DeathRateHeartAttack
# 1 SOUTHEAST ALABAMA MEDICAL CENTER    AL                 14.3
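The rewrite above signals bad input with stop() instead of returning a message string. A minimal sketch of why that matters to the caller follows; the helpers `check_by_return` and `check_by_stop` and the vector `valid_states` are hypothetical names made up for this illustration and are not part of the original code.

```r
# Hypothetical helper that returns a message string on bad input.
check_by_return <- function(state, valid_states) {
  if (!(state %in% valid_states)) return("Invalid state")
  state
}

# Hypothetical helper that raises an error on bad input.
check_by_stop <- function(state, valid_states) {
  if (!(state %in% valid_states)) stop("Invalid state")
  state
}

valid_states <- c("AL", "TX")

# The returned message is just another character value, so downstream
# code could mistake it for a legitimate result such as a hospital name.
returned <- check_by_return("ZZ", valid_states)
is.character(returned)

# An error interrupts processing unless the caller handles it explicitly.
caught <- tryCatch(
  check_by_stop("ZZ", valid_states),
  error = function(e) conditionMessage(e)
)
caught
```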
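Here is a tidyverse-style version of the function.
As DJ commented, it is not good practice to refer to a global object (dataSelected) inside a function. Passing it in as an argument is much better. This also has the beneficial side effect of letting you use the function in a pipe.
Also, HostpitalName is probably a typo. Did you mean HospitalName? Since you used the odd spelling more than once, I have kept it.
While I understand why you let the user shorten the name of the desired outcome by pasting the common prefix "DeathRate" onto the value passed in outcome, this is probably not best practice: it requires the user to know the convention, and it restricts the function to data frames and columns that follow it. It also doesn't fit tidyverse syntax.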
best <- function(d, state, outcome){
  # By adding d as the first parameter, you make it easy to use the function in
  # a pipe. And on data frames other than dataSelected.
  qOutcome <- enquo(outcome)
  # calling stop is better than returning an error message as
  # it stops processing immediately.
  if (!(state %in% (d %>% distinct(State) %>% pull(State) ))) {
    stop('Invalid state')
  }
  if (!(as_label(qOutcome) %in% colnames(d)) ){
    stop('Invalid outcome')
  }
  # Converting original code to equivalent tidyverse idioms.
  # HostpitalName should perhaps be HospitalName in the source data frame
  d %>%
    select(HostpitalName, State, !! qOutcome) %>%
    filter(!is.na(!! qOutcome)) %>%
    filter(State == state) %>%
    arrange(!! qOutcome, HostpitalName) %>%
    pull(HostpitalName) %>%
    head(1)
}
So we can write
dataSelected %>% best("AL", DeathRateHeartFailure)
[1] "ELIZA COFFEE MEMORIAL HOSPITAL"
or, say,
dataSelected %>% best("AL", DeathRatePneumonia)
[1] "MARSHALL MEDICAL CENTER NORTH"
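For comparison, newer versions of rlang (0.4.0 and later, which is an assumption about your installation) provide the embrace operator `{{ }}`, which combines the enquo() and !! steps. The sketch below is a variant of the function above rather than a replacement for it. The names `best_embraced` and `sample_rows` are illustrative, and the data are the six rows from the dput() output in the question.

```r
library(dplyr)

# Sample rows copied from the dput(head(dataSelected)) output in the question.
sample_rows <- data.frame(
  HostpitalName = c("SOUTHEAST ALABAMA MEDICAL CENTER",
                    "MARSHALL MEDICAL CENTER SOUTH",
                    "ELIZA COFFEE MEMORIAL HOSPITAL",
                    "MIZELL MEMORIAL HOSPITAL",
                    "CRENSHAW COMMUNITY HOSPITAL",
                    "MARSHALL MEDICAL CENTER NORTH"),
  State = rep("AL", 6),
  DeathRateHeartAttack = c(14.3, 18.5, 18.1, NA, NA, NA),
  DeathRateHeartFailure = c(11.4, 15.2, 11.3, 13.6, 13.8, 12.5),
  DeathRatePneumonia = c(10.9, 13.9, 13.4, 14.9, 15.8, 8.7),
  stringsAsFactors = FALSE
)

best_embraced <- function(d, state, outcome) {
  # The unquoted column name is still needed as text for validation.
  outcome_name <- as_label(enquo(outcome))
  if (!(state %in% d$State)) stop("Invalid state")
  if (!(outcome_name %in% colnames(d))) stop("Invalid outcome")

  d %>%
    # {{ outcome }} passes the caller's column through to dplyr.
    filter(State == state, !is.na({{ outcome }})) %>%
    arrange({{ outcome }}, HostpitalName) %>%
    pull(HostpitalName) %>%
    head(1)
}

# On these six rows only; results on the full data set may differ.
sample_rows %>% best_embraced("AL", DeathRateHeartAttack)
```

Whether to use `{{ }}` or the explicit enquo() and !! pair is mostly a matter of taste and of which package versions you need to support.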