簡體   English   中英

使用嵌套的 ifelse 語句改變 function 創建兩列而不是一列

[英]mutate function with nested ifelse statements creating two columns instead of one

我有一些關於國家/地區的 covid-19 病例的累積數據,我正在嘗試計算一個名為 Diff 的新列中的差異。 我無法刪除 NA 值,因為它不會顯示沒有進行測試的日期。 所以我已經做到了,如果有一個 NA 值,將 Diff 值設置為 0 以表示沒有差異,因此當天沒有進行任何測試。

我還試圖發表聲明,如果 Diff 也是 NA,表明前一天沒有進行任何測試,那么將差異設置為當天的確診病例值。

正如您從底部的結果中看到的那樣,我快到了,但我正在創建一個名為 ifelse 的新列。 我試圖解決這個問題,但我認為我在某處犯了一個簡單的錯誤。 如果有人可以向我指出,我將不勝感激,謝謝。

編輯:我意識到當延遲計算 = NA 時,我在考慮將每日病例設置為確認病例時犯了一個邏輯錯誤,因為這給出了一個誤導性的答案。

當 NA 出現時,我在大型數據集上使用以下代碼填充並重復先前的值。 我按組過濾,以免簡單地在國家/地區傳播前向值。

然后我計算了延遲,然后使用 Ronak Shah 的代碼來獲取每日值。

data <- data %>%
            group_by(CountryName) %>%
            fill(ConfirmedCases, .direction = "down")

data <- data %>%
            mutate(lag1 = ConfirmedCases - lag(ConfirmedCases))

data <- data %>% mutate(DailyCases = replace_na(coalesce(lag1, ConfirmedCases), 0))


library(tidyverse)

data <- data.frame(
          stringsAsFactors = FALSE,
                        CountryName = c("Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan","Afghanistan",
                                        "Afghanistan","Afghanistan"),
                     ConfirmedCases = c(NA,7L,NA,NA,NA,10L,16L,21L,
                                        22L,22L,22L,24L,24L,34L,40L,42L,
                                        75L,75L,91L,106L,114L,141L,166L,
                                        192L,235L,235L,270L,299L,337L,367L,
                                        423L),
                               Diff = c(NA,NA,NA,NA,NA,NA,6L,5L,1L,
                                        0L,0L,2L,0L,10L,6L,2L,33L,0L,16L,
                                        15L,8L,27L,25L,26L,43L,0L,35L,
                                        29L,38L,30L,56L)
                 )

data2 <- data %>%
  mutate(Diff = ifelse(is.na(ConfirmedCases) == TRUE, 0, ConfirmedCases - lag(ConfirmedCases)),
                       ifelse(is.na((ConfirmedCases - lag(ConfirmedCases))) == TRUE, ConfirmedCases, ConfirmedCases - lag(ConfirmedCases)))

head(data2, 10)
#>    CountryName ConfirmedCases Diff ifelse(...)
#> 1  Afghanistan             NA    0          NA
#> 2  Afghanistan              7   NA           7
#> 3  Afghanistan             NA    0          NA
#> 4  Afghanistan             NA    0          NA
#> 5  Afghanistan             NA    0          NA
#> 6  Afghanistan             10   NA          10
#> 7  Afghanistan             16    6           6
#> 8  Afghanistan             21    5           5
#> 9  Afghanistan             22    1           1
#> 10 Afghanistan             22    0           0

代表 package (v0.3.0) 於 2020 年 8 月 15 日創建

也許這可以通過創建目標列的副本來提供幫助:

library(tidyverse)

data %>% mutate(D=ConfirmedCases,D=ifelse(is.na(D),0,D),
                Diff2 = c(0,diff(D)),Diff2=ifelse(Diff2<0,0,Diff2)) %>% select(-D)

Output:

   CountryName ConfirmedCases Diff Diff2
1  Afghanistan             NA   NA     0
2  Afghanistan              7   NA     7
3  Afghanistan             NA   NA     0
4  Afghanistan             NA   NA     0
5  Afghanistan             NA   NA     0
6  Afghanistan             10   NA    10
7  Afghanistan             16    6     6
8  Afghanistan             21    5     5
9  Afghanistan             22    1     1
10 Afghanistan             22    0     0
11 Afghanistan             22    0     0
12 Afghanistan             24    2     2
13 Afghanistan             24    0     0
14 Afghanistan             34   10    10
15 Afghanistan             40    6     6
16 Afghanistan             42    2     2
17 Afghanistan             75   33    33
18 Afghanistan             75    0     0
19 Afghanistan             91   16    16
20 Afghanistan            106   15    15
21 Afghanistan            114    8     8
22 Afghanistan            141   27    27
23 Afghanistan            166   25    25
24 Afghanistan            192   26    26
25 Afghanistan            235   43    43
26 Afghanistan            235    0     0
27 Afghanistan            270   35    35
28 Afghanistan            299   29    29
29 Afghanistan            337   38    38
30 Afghanistan            367   30    30
31 Afghanistan            423   56    56

我認為您可以使用coalesceDiffConfirmedCases中獲取第一個非 NA 值,如果它們都是NA將其替換為 0。

library(dplyr)
data %>%
  mutate(Diff2 = tidyr::replace_na(coalesce(Diff,  ConfirmedCases), 0))

#   CountryName ConfirmedCases Diff Diff2
#1  Afghanistan             NA   NA     0
#2  Afghanistan              7   NA     7
#3  Afghanistan             NA   NA     0
#4  Afghanistan             NA   NA     0
#5  Afghanistan             NA   NA     0
#6  Afghanistan             10   NA    10
#7  Afghanistan             16    6     6
#8  Afghanistan             21    5     5
#9  Afghanistan             22    1     1
#10 Afghanistan             22    0     0
#11 Afghanistan             22    0     0
#12 Afghanistan             24    2     2
#...
#...

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM