
filter(!is.na(column)) is not removing NA's from column in R

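I'm redoing a learning assignment to see whether I can improve on it and get back into the material. The task is to write a function that, given two variables, "state" and "outcome", returns the name of the hospital in that state with the lowest mortality rate for the given outcome/disease. For some reason my filter(!is.na()) line doesn't seem to work, and I suspect it has to do with the fact that I use paste to build the column name I select. But in my testing that didn't seem to matter.

Here is the code: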

library(dplyr)
data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")

dataSelected <- data %>%
        select("Hospital.Name", "State", "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack", "Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure", "Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia")

colnames(dataSelected) <- c("HostpitalName","State","DeathRateHeartAttack","DeathRateHeartFailure","DeathRatePneumonia")

dataSelected[,3] <- as.numeric(dataSelected[,3])
dataSelected[,4] <- as.numeric(dataSelected[,4])
dataSelected[,5] <- as.numeric(dataSelected[,5])


best <- function(state,outcome){
        column <- paste('DeathRate',outcome, sep = "")
        if (state %in% dataSelected$State < 1){
                return('Invalid state')
        } else if (column %in% colnames(dataSelected) < 1){
                return('Invalid outcome')
        } else{
        BestHospitals <- dataSelected %>%
                select(HostpitalName,State,column) %>%
                filter(!is.na(column)) %>%
                filter(State == state) %>%
                arrange(column,HostpitalName)
        return(BestHospitals[1,1])
        }
}

My function call:

best("AL","HeartAttack")

Version information:

platform       x86_64-apple-darwin15.6.0
arch           x86_64
os             darwin15.6.0
system         x86_64, darwin15.6.0
status
major          3
minor          6.1
year           2019
month          07
day            05
svn rev        76782
language       R
version.string R version 3.6.1 (2019-07-05)
nickname       Action of the Toes

Output of dput(head(dataSelected)):

structure(list(HostpitalName = c("SOUTHEAST ALABAMA MEDICAL CENTER", 
"MARSHALL MEDICAL CENTER SOUTH", "ELIZA COFFEE MEMORIAL HOSPITAL", 
"MIZELL MEMORIAL HOSPITAL", "CRENSHAW COMMUNITY HOSPITAL", "MARSHALL MEDICAL CENTER NORTH"
), State = c("AL", "AL", "AL", "AL", "AL", "AL"), DeathRateHeartAttack = c(14.3, 
18.5, 18.1, NA, NA, NA), DeathRateHeartFailure = c(11.4, 15.2, 
11.3, 13.6, 13.8, 12.5), DeathRatePneumonia = c(10.9, 13.9, 13.4, 
14.9, 15.8, 8.7)), row.names = c(NA, 6L), class = "data.frame")

How about filtering by column number instead of by name?

best <- function(state,outcome){
        column <- paste('DeathRate',outcome, sep = "")
        if (state %in% dataSelected$State < 1){
                return('Invalid state')
        } else if (column %in% colnames(dataSelected) < 1){
                return('Invalid outcome')
        } else{
        BestHospitals <- dataSelected %>%
                select(HostpitalName,State,column) %>%
                filter(!is.na(.[,3])) %>%
                filter(State == state) %>%
                arrange(.[,3], HostpitalName)  # ascending, so the lowest rate comes first
        return(BestHospitals[1,1])
        }
}
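
For context on why the original line has no effect: inside filter(), column is just the character string "DeathRateHeartAttack", and is.na() of a non-missing string is always FALSE, so no rows are dropped. One way to keep filtering by name is dplyr's .data pronoun. The following is only a minimal sketch on the six-row sample from the question; sample_data, kept_by_string and kept_by_column are names made up for this illustration.

```r
library(dplyr)

# Six-row sample copied from the dput() output in the question
# (only the heart-attack column is kept here for brevity)
sample_data <- data.frame(
  HostpitalName = c("SOUTHEAST ALABAMA MEDICAL CENTER",
                    "MARSHALL MEDICAL CENTER SOUTH",
                    "ELIZA COFFEE MEMORIAL HOSPITAL",
                    "MIZELL MEMORIAL HOSPITAL",
                    "CRENSHAW COMMUNITY HOSPITAL",
                    "MARSHALL MEDICAL CENTER NORTH"),
  State = rep("AL", 6),
  DeathRateHeartAttack = c(14.3, 18.5, 18.1, NA, NA, NA),
  stringsAsFactors = FALSE
)

column <- "DeathRateHeartAttack"

# is.na(column) tests the string itself, which is never NA,
# so this keeps every row
kept_by_string <- sample_data %>% filter(!is.na(column))
nrow(kept_by_string)   # 6

# .data[[column]] looks the string up as a column of the data frame
kept_by_column <- sample_data %>%
  filter(!is.na(.data[[column]])) %>%
  arrange(.data[[column]], HostpitalName)
nrow(kept_by_column)               # 3
kept_by_column$HostpitalName[1]    # "SOUTHEAST ALABAMA MEDICAL CENTER"
```

The same applies to arrange(column, HostpitalName): it sorts by a constant string, so in effect only HostpitalName determines the order.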

I took the liberty of rewriting your function.

# it is usually a bad idea to insert a global variable (like your data frame) inside a function. 
best <- function(dat=NULL,state=NULL,outcome=NULL){ 
  
  column <- paste('DeathRate',outcome, sep = "")
  
  if (!state %in% dat$State | !column %in% colnames(dat)){
    stop('Invalid input')
  } # negating and using or "|" makes it easier to read
  else{
    BestHospitals <- dat %>%
      select(HostpitalName,State,column) %>%
      na.omit() %>% # for your purpose the much more concise na.omit() is a better option
      filter(State == state) %>%
      arrange(.data[[column]], HostpitalName) %>% # sort by the column's values, not by the string held in `column`
      filter(row_number()==1) #dplyr way to choose the first row
    
    return(BestHospitals)
  }
}

best(dat = dataSelected, state = "AL", outcome = "HeartAttack")

                     HostpitalName State DeathRateHeartAttack
1 SOUTHEAST ALABAMA MEDICAL CENTER    AL                 14.3

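Here is a more tidyverse-style version of the function.

As DJ commented, referring to a global object (dataSelected) inside a function is not good practice. Passing it as an argument is much better. This also has the beneficial side effect of letting you use the function in a pipe.

Also, HostpitalName is probably a typo. Did you mean HospitalName? Since you used the odd spelling more than once, I've kept it.

While I understand why you let users shorten the name of the desired outcome by pasting the common prefix "DeathRate" onto the value passed in outcome, this is probably not best practice: it requires the user to know about the convention, and it restricts the function to data frames and columns that follow it. It also doesn't fit tidyverse syntax.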

best <- function(d, state, outcome){
  # By adding d as the first parameter, you make it easy to use the function in
  # a pipe.  And on data frames other than dataSelected.
  qOutcome <- enquo(outcome)
  # calling stop is better than returning an error message as 
  # it stops processing immediately.
  if (!(state %in% (d %>% distinct(State) %>% pull(State) ))) {
    stop('Invalid state')
  }
  if (!(as_label(qOutcome) %in% colnames(d)) ){
    stop('Invalid outcome')
  }
  # Converting original code to equivalent tidyverse idioms.
  # HostpitalName should perhaps be HospitalName in the source data frame
  d %>%
    select(HostpitalName, State, !! qOutcome) %>%
    filter(!is.na(!! qOutcome)) %>%
    filter(State == state) %>%
    arrange(!! qOutcome, HostpitalName) %>% 
    pull(HostpitalName) %>% 
    head(1)
}

So we can write

dataSelected %>% best("AL", DeathRateHeartFailure)
[1] "ELIZA COFFEE MEMORIAL HOSPITAL"

or,

dataSelected %>% best("AL", DeathRatePneumonia)
[1] "MARSHALL MEDICAL CENTER NORTH"
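
For what it's worth, with rlang 0.4.0 or later the enquo() / !! pair can usually be written with the embrace operator {{ }} instead. This is only a sketch under that assumption, run on the six-row sample from the question; best_embrace and sample_data are names invented for this illustration, not part of the answer above.

```r
library(dplyr)

# Six-row sample copied from the dput() output in the question
sample_data <- data.frame(
  HostpitalName = c("SOUTHEAST ALABAMA MEDICAL CENTER",
                    "MARSHALL MEDICAL CENTER SOUTH",
                    "ELIZA COFFEE MEMORIAL HOSPITAL",
                    "MIZELL MEMORIAL HOSPITAL",
                    "CRENSHAW COMMUNITY HOSPITAL",
                    "MARSHALL MEDICAL CENTER NORTH"),
  State = rep("AL", 6),
  DeathRateHeartAttack  = c(14.3, 18.5, 18.1, NA, NA, NA),
  DeathRateHeartFailure = c(11.4, 15.2, 11.3, 13.6, 13.8, 12.5),
  DeathRatePneumonia    = c(10.9, 13.9, 13.4, 14.9, 15.8, 8.7),
  stringsAsFactors = FALSE
)

# Hypothetical variant of best(): the outcome is passed as a bare column name
best_embrace <- function(d, state, outcome) {
  # Turn the unquoted column name into a string for validation only
  outcome_name <- as_label(enquo(outcome))
  if (!(state %in% d$State)) stop("Invalid state")
  if (!(outcome_name %in% colnames(d))) stop("Invalid outcome")

  d %>%
    filter(State == state, !is.na({{ outcome }})) %>%  # drop missing rates
    arrange({{ outcome }}, HostpitalName) %>%          # lowest rate first, ties by name
    pull(HostpitalName) %>%
    head(1)
}

sample_data %>% best_embrace("AL", DeathRateHeartFailure)
# [1] "ELIZA COFFEE MEMORIAL HOSPITAL"
sample_data %>% best_embrace("AL", DeathRatePneumonia)
# [1] "MARSHALL MEDICAL CENTER NORTH"
```

The validation still needs the column name as a string, which is why enquo() and as_label() appear once; the embrace operator only replaces the quoting and unquoting inside the dplyr verbs.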

