Problems generating variable when I have NA data in R code

I have a question about the code below. With the inputs day = 30/06, Category = FDE and DTT = Hol, I get SPV (first code). However, with day = 30/06, Category = ABC and DTT = NA, SPV comes back empty (second code). I need the output to show the row that corresponds to that date/category/DTT combination. How can I adjust this?

Executable code below:

For 30/06, FDE, Hol

library(dplyr)
library(tidyr)  # pivot_longer() used below comes from tidyr

df1 <- structure(
  list(date1= c("2021-06-28","2021-06-28","2021-06-28"),
       date2 = c("2021-06-30","2021-06-30","2021-07-02"),
       DTT= c(NA,"Hol","Hol"),
       Week= c("Wednesday","Wednesday","Friday"),
       Category = c("ABC","FDE","ABC"),
       DR1 = c(4,1,1),
       DR01 = c(4,1,2), DR02= c(4,2,0),DR03= c(9,5,0),
       DR04 = c(5,4,3),DR05 = c(5,4,0)),
  class = "data.frame", row.names = c(NA, -3L))


dmda<-"2021-06-30"
CategoryChosse<-"FDE"
DTest<-"Hol"

  x<-df1 %>% select(starts_with("DR0"))
  
  x<-cbind(df1, setNames(df1$DR1 - x, paste0(names(x), "_PV")))
  PV<-select(x, date2,Week, Category, DTT, DR1, ends_with("PV"))
  
  med<-PV %>%
    group_by(Category,Week,DTT) %>%
    summarize(across(ends_with("PV"), median))
  
  SPV<-df1%>%
    inner_join(med, by = c('Category', 'Week','DTT')) %>%
    mutate(across(matches("^DR0\\d+$"), ~.x + 
                    get(paste0(cur_column(), '_PV')),
                  .names = '{col}_{col}_PV')) %>%
    select(date1:Category, DR01_DR01_PV:last_col())
  
  SPV<-data.frame(SPV)
  
  mat1 <- df1 %>%
    filter(date2 == dmda, Category == CategoryChosse, DTT==DTest) %>%
    select(starts_with("DR0")) %>%
    pivot_longer(cols = everything()) %>%
    arrange(desc(row_number())) %>%
    mutate(cs = cumsum(value)) %>%
    filter(cs == 0) %>%
    pull(name)
  
  (dropnames <- paste0(mat1,"_",mat1, "_PV"))
  
  SPV <- SPV %>%
    filter(date2 == dmda, Category == CategoryChosse, DTT==DTest) %>%
    select(-any_of(dropnames))
  if(length(grep("DR0", names(SPV))) == 0) {
    SPV[mat1] <- NA_real_
  }

> SPV
       date1      date2 DTT      Week Category DR01_DR01_PV DR02_DR02_PV DR03_DR03_PV DR04_DR04_PV DR05_DR05_PV
1 2021-06-28 2021-06-30 Hol Wednesday      FDE            1            1            1            1    

For 30/06, ABC, NA

dmda<-"2021-06-30"
CategoryChosse<-"ABC"
DTest<-NA

x<-df1 %>% select(starts_with("DR0"))

x<-cbind(df1, setNames(df1$DR1 - x, paste0(names(x), "_PV")))
PV<-select(x, date2,Week, Category, DTT, DR1, ends_with("PV"))

med<-PV %>%
    group_by(Category,Week,DTT) %>%
    summarize(across(ends_with("PV"), median))

SPV<-df1%>%
    inner_join(med, by = c('Category', 'Week','DTT')) %>%
    mutate(across(matches("^DR0\\d+$"), ~.x + 
                      get(paste0(cur_column(), '_PV')),
                  .names = '{col}_{col}_PV')) %>%
    select(date1:Category, DR01_DR01_PV:last_col())

SPV<-data.frame(SPV)

mat1 <- df1 %>%
    filter(date2 == dmda, Category == CategoryChosse, DTT==DTest) %>%
    select(starts_with("DR0")) %>%
    pivot_longer(cols = everything()) %>%
    arrange(desc(row_number())) %>%
    mutate(cs = cumsum(value)) %>%
    filter(cs == 0) %>%
    pull(name)

(dropnames <- paste0(mat1,"_",mat1, "_PV"))

SPV <- SPV %>%
    filter(date2 == dmda, Category == CategoryChosse, DTT==DTest) %>%
    select(-any_of(dropnames))
if(length(grep("DR0", names(SPV))) == 0) {
    SPV[mat1] <- NA_real_
}

> SPV
[1] date1        date2        DTT          Week         Category     DR01_DR01_PV DR02_DR02_PV DR03_DR03_PV
[9] DR04_DR04_PV DR05_DR05_PV
<0 rows> (or 0-length row.names)

You can write your own function that behaves like == unless one of the elements is NA, in which case it returns TRUE only when both are NA. Then use that function in filter() instead of ==.
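To see why the second case returns no rows, it may help to look at how R compares against NA. The snippet below is a minimal sketch; the vector DTT just mirrors the DTT column from the question, and the names cmp and n_kept are only illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

DTT <- c(NA, "Hol", "Hol")

# Any comparison against NA yields NA, never TRUE
cmp <- DTT == NA
cmp
#> [1] NA NA NA

# is.na() is the reliable way to detect missing values
is.na(DTT)
#> [1]  TRUE FALSE FALSE

# filter() keeps only rows where the condition is TRUE, so NA rows are dropped
n_kept <- data.frame(DTT = DTT) %>%
  filter(DTT == NA) %>%
  nrow()
n_kept
#> [1] 0
```

So with DTest <- NA, the condition DTT == DTest is NA for every row, and filter() discards all of them.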

In the future, please try to create a minimal reproducible example like the one below, so that you are asking a specific question rather than asking people to fix your code for you. Almost none of the code you posted has to do with the question you are asking.

See "How to make a great R reproducible example".

library(dplyr, warn.conflicts = FALSE)

same <- function(x, y){
  case_when(
    is.na(x) != is.na(y) ~ FALSE,
    is.na(x) ~ TRUE,
    TRUE ~ x == y)
}

df <- data.frame(x = c('hol', NA))

x_want <- 'hol'

df %>% 
  filter(same(x, x_want))
#>     x
#> 1 hol

x_want <- NA

df %>% 
  filter(same(x, x_want))
#>      x
#> 1 <NA>
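Applied to the question's data, the same idea might look like the sketch below. It assumes a reduced copy of df1 that keeps only the three columns used in the filter, and it repeats the same() helper so the snippet runs on its own. The names res_na and res_hol are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

# NA-aware equality: TRUE when both are NA, FALSE when only one is
same <- function(x, y) {
  case_when(
    is.na(x) != is.na(y) ~ FALSE,
    is.na(x) ~ TRUE,
    TRUE ~ x == y
  )
}

# Reduced version of the question's df1 (only the filter columns)
df1 <- data.frame(
  date2    = c("2021-06-30", "2021-06-30", "2021-07-02"),
  DTT      = c(NA, "Hol", "Hol"),
  Category = c("ABC", "FDE", "ABC")
)

dmda <- "2021-06-30"

# Second case from the question: ABC with DTT = NA
CategoryChosse <- "ABC"
DTest <- NA
res_na <- df1 %>%
  filter(date2 == dmda, Category == CategoryChosse, same(DTT, DTest))
nrow(res_na)
#> [1] 1

# First case from the question still works: FDE with DTT = "Hol"
CategoryChosse <- "FDE"
DTest <- "Hol"
res_hol <- df1 %>%
  filter(date2 == dmda, Category == CategoryChosse, same(DTT, DTest))
nrow(res_hol)
#> [1] 1
```

In the question's full code, DTT==DTest appears in two filter() calls (the one that builds mat1 and the one that subsets SPV), so both would presumably need the same replacement.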

Created on 2021-12-20 by the reprex package (v2.0.1)
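As a possible alternative that needs no helper function, base R's %in% could be used when the value being matched is a single element. This is a swapped-in technique rather than the approach above: %in% is built on match(), which treats NA as matching NA. The sketch reuses the small df from the example, and the names res_na and res_hol are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(x = c("hol", NA))

# %in% matches NA to NA, unlike ==
x_want <- NA
res_na <- df %>%
  filter(x %in% x_want)
nrow(res_na)
#> [1] 1

# Non-missing values are matched as usual
x_want <- "hol"
res_hol <- df %>%
  filter(x %in% x_want)
nrow(res_hol)
#> [1] 1
```

Note that %in% tests set membership rather than element-by-element equality, so it only stands in for == when the right-hand side is a single value.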
