R script does not calculate the median when the median is zero

Question

I have this script:

library(dplyr)
newest=mydat %>% filter(SaleCount > 0) %>%  # first keep only SaleCount > 0, the rows of interest
  group_by(CustomerName, ItemRelation, DocumentNum, DocumentYear, CustomerType) %>%
  mutate(k = mean(SaleCount[IsPromo==1]),
         m0 = median(tail(SaleCount[IsPromo==0], 5))) %>%  # calculate k and m0 for all rows
  filter(IsPromo == 1) %>%  # now keep only rows with IsPromo == 1
  mutate(r = (k-m0)*n())  %>% distinct()

This script:

1. calculates the mean of SaleCount for the IsPromo = 1 category (excluding negative and zero values);
2. for the IsPromo = 0 category, calculates the median of the last 5 observations of SaleCount (excluding negative and zero values);
3. then subtracts the median from the mean and multiplies the result by the count of non-zero, non-negative values in the IsPromo = 1 category.
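The three steps can be sketched in base R for the SaleCount and IsPromo vectors of a single group. promo_uplift() is a hypothetical helper written for illustration (not part of the question's code), and the input values are made up; like the dplyr version, it assumes the rows are already in the order in which tail() should take the last five observations.

```r
# Hypothetical helper: applies steps 1-3 to one group's vectors
promo_uplift <- function(sale_count, is_promo) {
  keep  <- sale_count > 0                     # drop zero and negative sales
  sale  <- sale_count[keep]
  promo <- is_promo[keep]
  k  <- mean(sale[promo == 1])                # step 1: mean of promo sales
  m0 <- median(tail(sale[promo == 0], 5))     # step 2: median of last 5 non-promo sales
  n1 <- sum(promo == 1)                       # number of positive promo sales
  (k - m0) * n1                               # step 3
}

# Made-up example group
sale  <- c(5, 7, 3, 0, 9, 18, 20, 0)
promo <- c(0, 0, 0, 0, 0, 1, 1, 1)
promo_uplift(sale, promo)
# [1] 26
```

Here the positive IsPromo = 0 sales are 5, 7, 3 and 9, so m0 = 6, k = 19 and r = (19 - 6) * 2 = 26. If no positive IsPromo = 0 values remain, m0 and therefore the result are NA.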

But sometimes the median can be 0, as in this example:

mydat=structure(list(ItemRelation = c(11712L, 11712L, 11712L, 11712L, 
11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 
11712L, 11712L, 11712L), SaleCount = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 18L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DocumentNum = c(197L, 197L, 
197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 
197L, 197L), DocumentYear = c(2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L), IsPromo = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), CustomerType = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), CustomerName = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("ItemRelation", 
"SaleCount", "DocumentNum", "DocumentYear", "IsPromo", "CustomerType", 
"CustomerName"), class = "data.frame", row.names = c(NA, -15L
))

In this case the code writes NA for m0, and then r is NA as well: the median is not subtracted from the mean and nothing is multiplied.

Simplified output:

ItemRelation  SaleCount  DocumentNum   k  m0   r
       11712         18          197  18  NA  NA

How can I take a zero median into account so that the script works correctly?

Edit in response to AAron's answer:

For IsPromo category 1, the mean of SaleCount must be multiplied by the count of non-zero, non-negative values. How can I do that?

Answer 1

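The problem is with your logic, not your code. You first say you want the median of the last five values excluding negative and zero values, but then say the median should be zero. Because of the first requirement, the filter has already removed every zero value; in your example all the IsPromo = 0 values are zero, so there is no data left to take a median of.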
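Answer 1's point can be checked directly in base R: once the SaleCount > 0 filter has run, the IsPromo = 0 part of the example group is empty, and median() of an empty vector returns NA rather than 0. The vector below holds the six IsPromo = 0 SaleCount values from the question's data.

```r
# SaleCount values of the IsPromo == 0 rows in the question's data
non_promo <- c(0L, 0L, 0L, 0L, 0L, 0L)

# The SaleCount > 0 filter removes all of them
positive <- non_promo[non_promo > 0]
length(positive)
# [1] 0

# The median of an empty vector is NA, not 0, so m0 (and then r) become NA
median(tail(positive, 5))
# [1] NA
```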

Answer 2

What if you set all NAs to 0 by adding another mutate (mutate_all(funs(ifelse(is.na(.), 0, .))))?

newest=mydat %>% filter(SaleCount > 0) %>%  # first keep only SaleCount > 0, the rows of interest
  group_by(ItemRelation, DocumentNum, DocumentYear) %>%
  mutate(k = mean(SaleCount[IsPromo==1]),
         m0 = median(tail(SaleCount[IsPromo==0], 5))) %>%  # calculate k and m0 for all rows
  mutate_all(funs(ifelse(is.na(.), 0, .))) %>% 
  filter(IsPromo == 1) %>%  # Now keep only rows with IsPromo == 1
  mutate(r = (k-m0)*n())  %>% distinct()

This gives the following result:

  ItemRelation SaleCount DocumentNum DocumentYear IsPromo CustomerType CustomerName  k m0  r
1        11712        18         197         2017       1            1            2 18  0 18
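If a missing baseline should count as 0, as Answer 2 assumes, the replacement can be limited to m0 instead of every column. The sketch below is an alternative, not the answerer's code: it assumes dplyr 1.0 or later (funs() is deprecated in current dplyr), uses coalesce() in place of ifelse(), and rebuilds the question's example data in compact form with data.frame().

```r
library(dplyr)

# Compact rebuild of the question's example data (15 rows, one group)
mydat <- data.frame(
  ItemRelation = 11712L,
  SaleCount    = c(rep(0L, 7), 18L, rep(0L, 7)),
  DocumentNum  = 197L,
  DocumentYear = 2017L,
  IsPromo      = c(rep(0L, 6), rep(1L, 9)),
  CustomerType = 1L,
  CustomerName = 2L
)

newest <- mydat %>%
  filter(SaleCount > 0) %>%                      # keep positive sales only
  group_by(ItemRelation, DocumentNum, DocumentYear) %>%
  mutate(k  = mean(SaleCount[IsPromo == 1]),
         m0 = median(tail(SaleCount[IsPromo == 0], 5)),
         # no positive IsPromo == 0 sales -> median is NA; treat it as 0
         m0 = coalesce(as.numeric(m0), 0)) %>%
  filter(IsPromo == 1) %>%                       # keep promo rows
  mutate(r = (k - m0) * n()) %>%                 # n() = number of positive promo sales
  ungroup() %>%
  distinct()
```

This reproduces the one-row result above (k = 18, m0 = 0, r = 18). Whether a missing baseline should really be 0 is the logic question raised in Answer 1.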

Answers: 2
Answer 1: score 4, 2018-07-11 15:12:27
Answer 2: score 1, 2018-07-11 15:17:26