
The script does not calculate the median if it is zero in R

I have this script:

library(dplyr)
newest = mydat %>%
  filter(SaleCount > 0) %>%  # First keep only SaleCount > 0, which is what we are interested in
  group_by(CustomerName, ItemRelation, DocumentNum, DocumentYear, CustomerType) %>%
  mutate(k = mean(SaleCount[IsPromo == 1]),
         m0 = median(tail(SaleCount[IsPromo == 0], 5))) %>%  # Calculate k and m0 for all rows
  filter(IsPromo == 1) %>%  # Now keep only rows with IsPromo == 1
  mutate(r = (k - m0) * n()) %>%  # n() = number of rows left in the group
  distinct()

This script:

1. Calculates the mean SaleCount for IsPromo category 1 (excluding negative and zero values).
2. For IsPromo category 0, calculates the median of the last 5 observations of SaleCount (excluding negative and zero values).
3. Then subtracts the median from the mean and multiplies the result by the count of non-zero, non-negative values in IsPromo category 1.
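The three steps above can be sketched for a single group in base R. This is only an illustration under assumptions: promo_uplift() is a hypothetical helper name and the toy vectors are made up. In the dplyr pipeline the same computation runs once per group, with n() playing the role of length(promo).

```r
# Sketch of the three steps for ONE group (hypothetical helper, not dplyr API)
promo_uplift <- function(sale_count, is_promo) {
  keep  <- sale_count > 0                      # drop zero and negative sales
  promo <- sale_count[keep & is_promo == 1]    # positive promo sales
  base  <- sale_count[keep & is_promo == 0]    # positive non-promo sales, in row order
  k  <- mean(promo)                            # step 1: mean of promo sales
  m0 <- median(tail(base, 5))                  # step 2: median of the last 5 non-promo sales
  r  <- (k - m0) * length(promo)               # step 3: difference times promo count
  c(k = k, m0 = m0, r = r)
}

# Made-up data: 7 non-promo rows (one zero) followed by 3 promo rows
res <- promo_uplift(c(2, 0, 4, 6, 5, 3, 7, 10, 12, 8),
                    c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1))
print(res)  # k = 10, m0 = 5, r = 15
```

Here tail() keeps 4, 6, 5, 3, 7 (the leading 2 is dropped), so m0 = 5 and r = (10 - 5) * 3 = 15.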

But sometimes the median can be equal to 0, as in this example:

mydat=structure(list(ItemRelation = c(11712L, 11712L, 11712L, 11712L, 
11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 
11712L, 11712L, 11712L), SaleCount = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 18L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DocumentNum = c(197L, 197L, 
197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 
197L, 197L), DocumentYear = c(2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L), IsPromo = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), CustomerType = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), CustomerName = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("ItemRelation", 
"SaleCount", "DocumentNum", "DocumentYear", "IsPromo", "CustomerType", 
"CustomerName"), class = "data.frame", row.names = c(NA, -15L
))

In this case the code writes NA for m0, so it does not subtract the median from the mean and does not multiply (r is also NA).

Simple example (output):

ItemRelation  SaleCount  DocumentNum   k  m0   r
       11712         18          197  18  NA  NA
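The NA can be reproduced outside dplyr with the SaleCount and IsPromo columns copied from mydat. A base R sketch of the same steps (not the dplyr pipeline itself): after dropping non-positive values no IsPromo = 0 rows remain, median() of an empty vector returns NA, and arithmetic with NA gives NA, so r is NA too.

```r
# SaleCount and IsPromo columns from mydat
sale  <- c(0, 0, 0, 0, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 0)
promo <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)

keep <- sale > 0                           # same condition as filter(SaleCount > 0)
base <- sale[keep & promo == 0]            # empty: every non-promo value was 0
m0   <- median(tail(base, 5))              # median of an empty vector is NA
k    <- mean(sale[keep & promo == 1])      # 18
r    <- (k - m0) * sum(keep & promo == 1)  # (18 - NA) * 1 is NA
print(c(k = k, m0 = m0, r = r))
```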

How can I take a zero median into account so that the calculation works correctly?

Edited in response to Aaron's answer:

For IsPromo category 1, the mean of SaleCount must be multiplied by the count of non-zero, non-negative values. How can I do this?
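In R, summing a logical vector counts its TRUE entries, so the count of positive SaleCount values in IsPromo category 1 can be taken straight from the unfiltered columns. In the posted pipeline, n() after filter(SaleCount > 0) and filter(IsPromo == 1) should already return this same count per group. The toy vectors below are made up for illustration.

```r
# Made-up data: includes a zero and a negative value in the promo category
sale  <- c(3, 0, 5, 0, 10, -2, 12)
promo <- c(0, 0, 0, 1, 1, 1, 1)

pos_promo   <- sale > 0 & promo == 1
n_pos_promo <- sum(pos_promo)            # TRUE counts as 1: here 2 (10 and 12)
k  <- mean(sale[pos_promo])              # 11
m0 <- median(tail(sale[sale > 0 & promo == 0], 5))  # median of 3 and 5 = 4

k * n_pos_promo                          # mean times count: 22
(k - m0) * n_pos_promo                   # the script's r: (11 - 4) * 2 = 14
```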

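The problem is in your logic, not in your code. You first say you want the median of the last five values excluding negative and zero values, but then you say the median should be zero. Because of the first requirement, your filter has already removed all zero values; when IsPromo = 0 all values are zero, so there is no data left to take the median of.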
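A quick base R check of this point, using the IsPromo = 0 values from mydat (all six are 0): keeping zeros gives a median of 0, while excluding them leaves nothing, so the median is NA. The two requirements cannot both hold for this group.

```r
# SaleCount values of the IsPromo == 0 rows in mydat
base_all <- c(0, 0, 0, 0, 0, 0)

# Keep zeros: the median exists and is 0
m_with_zeros <- median(tail(base_all, 5))

# Drop zero and negative values first: the vector is empty, so the median is NA
m_without_zeros <- median(tail(base_all[base_all > 0], 5))

print(c(with_zeros = m_with_zeros, without_zeros = m_without_zeros))
```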

What if you set all NAs to 0 by adding one more mutate step with mutate_all()?

newest = mydat %>%
  filter(SaleCount > 0) %>%  # First keep only SaleCount > 0, which is what we are interested in
  group_by(ItemRelation, DocumentNum, DocumentYear) %>%
  mutate(k = mean(SaleCount[IsPromo == 1]),
         m0 = median(tail(SaleCount[IsPromo == 0], 5))) %>%  # Calculate k and m0 for all rows
  mutate_all(~ ifelse(is.na(.), 0, .)) %>%  # Replace NA with 0 (funs() is deprecated since dplyr 0.8.0)
  filter(IsPromo == 1) %>%  # Now keep only rows with IsPromo == 1
  mutate(r = (k - m0) * n()) %>%
  distinct()

This gives the following result:

  ItemRelation SaleCount DocumentNum DocumentYear IsPromo CustomerType CustomerName  k m0  r
1        11712        18         197         2017       1            1            2 18  0 18
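Replacing every NA with 0 via mutate_all() also turns an NA in k (or any other column) into 0, which may hide genuinely missing data. A narrower alternative, sketched here in base R, is to fall back to 0 only when there are no positive non-promo sales to take the median of. median_or() is a hypothetical helper, and treating an empty baseline as 0 is an assumption about the intended behaviour.

```r
# Hypothetical helper: median of the last n values, or `fallback` when there are none
median_or <- function(x, n = 5, fallback = 0) {
  x <- tail(x, n)
  if (length(x) == 0) fallback else median(x)
}

# SaleCount and IsPromo columns from mydat
sale  <- c(0, 0, 0, 0, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 0)
promo <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
keep  <- sale > 0

promo_sales <- sale[keep & promo == 1]
k  <- mean(promo_sales)                    # 18
m0 <- median_or(sale[keep & promo == 0])   # no baseline rows, so the fallback 0 is used
r  <- (k - m0) * length(promo_sales)       # (18 - 0) * 1 = 18
print(c(k = k, m0 = m0, r = r))
```

Inside the pipeline the same helper could be called as m0 = median_or(SaleCount[IsPromo == 0]) in the first mutate(), in which case the mutate_all() step would no longer be needed.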


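Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.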

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM