The script does not calculate the median if it is zero in R
I have a script:
library(dplyr)
newest <- mydat %>%
  filter(SaleCount > 0) %>%  # keep only rows with SaleCount > 0, the ones of interest
  group_by(CustomerName, ItemRelation, DocumentNum, DocumentYear, CustomerType) %>%
  mutate(k = mean(SaleCount[IsPromo == 1]),
         m0 = median(tail(SaleCount[IsPromo == 0], 5))) %>%  # compute k and m0 for all rows
  filter(IsPromo == 1) %>%  # now keep only rows with IsPromo == 1
  mutate(r = (k - m0) * n()) %>%
  distinct()
This script:
1. calculates the mean of SaleCount for the IsPromo = 1 category
(excluding negative and zero values)
2. for the IsPromo = 0 category, calculates the median of the last 5 observations of SaleCount
(excluding negative and zero values)
3. then subtracts the median from the mean and multiplies the result by the count of non-zero, non-negative values in the IsPromo = 1 category
But sometimes the median can be equal to 0, as in this example:
mydat=structure(list(ItemRelation = c(11712L, 11712L, 11712L, 11712L,
11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L,
11712L, 11712L, 11712L), SaleCount = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 18L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DocumentNum = c(197L, 197L,
197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L,
197L, 197L), DocumentYear = c(2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L), IsPromo = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), CustomerType = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), CustomerName = c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("ItemRelation",
"SaleCount", "DocumentNum", "DocumentYear", "IsPromo", "CustomerType",
"CustomerName"), class = "data.frame", row.names = c(NA, -15L
))
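In this case the code writes NA for the median, so the median is not subtracted from the mean and the multiplication is not carried out.
A simple example: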
ItemRelation SaleCount DocumentNum k m0 r
11712 18 197 18 NA NA
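How can I take a zero median into account so that the script works correctly?
For the IsPromo = 1 category, the mean of SaleCount (minus the median) must still be multiplied by the count of non-zero, non-negative values. How can I do this?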
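The problem is in your logic rather than in your code. You first say that you want the median of the last five values excluding negative and zero values, but then say that the median should be zero. Because of the first requirement, the filter has already removed every zero value, and since all the values with IsPromo = 0 were zero, no data is left to take a median of. The median of an empty vector is NA, which is why m0 and r come out as NA.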
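The effect described above can be reproduced with base R alone. This is only a minimal sketch: the vectors copy SaleCount and IsPromo from the question's sample data, and the names sale_count, is_promo, kept_sales, kept_promo and non_promo are made up for illustration.

```r
# SaleCount and IsPromo copied from the sample data in the question
sale_count <- c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 18L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
is_promo   <- c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)

# Base R equivalent of filter(SaleCount > 0)
keep       <- sale_count > 0
kept_sales <- sale_count[keep]
kept_promo <- is_promo[keep]

# After the filter, no rows with IsPromo == 0 are left
non_promo <- kept_sales[kept_promo == 0]
length(non_promo)

# The median of an empty vector is NA, and NA propagates into r
m0 <- median(tail(non_promo, 5))
k  <- mean(kept_sales[kept_promo == 1])
r  <- (k - m0) * sum(kept_promo == 1)
is.na(m0)
is.na(r)
```

Any fix therefore has to decide what an empty non-promo group should mean; treating it as a median of 0 is one possible convention, not the only one.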
What if you set all NA values to 0 by including another mutate step,
mutate_all(funs(ifelse(is.na(.), 0, .))),
like this?
newest <- mydat %>%
  filter(SaleCount > 0) %>%  # keep only rows with SaleCount > 0, the ones of interest
  group_by(ItemRelation, DocumentNum, DocumentYear) %>%
  mutate(k = mean(SaleCount[IsPromo == 1]),
         m0 = median(tail(SaleCount[IsPromo == 0], 5))) %>%  # compute k and m0 for all rows
  mutate_all(funs(ifelse(is.na(.), 0, .))) %>%  # replace every NA in the non-grouping columns with 0
  filter(IsPromo == 1) %>%  # now keep only rows with IsPromo == 1
  mutate(r = (k - m0) * n()) %>%
  distinct()
This gives the following result:
  ItemRelation SaleCount DocumentNum DocumentYear IsPromo CustomerType CustomerName  k m0  r
1        11712        18         197         2017       1            1            2 18  0 18
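A narrower variant replaces only the missing median rather than every NA in the data frame, which may matter if other columns can legitimately contain NA. The following is a sketch, not part of the original answer: it assumes a dplyr version that provides coalesce(), and mydat_small and newest_alt are hypothetical names for a reduced copy of the sample data and its result. In recent dplyr releases funs() is deprecated, so this form also avoids the deprecation warning.

```r
library(dplyr)

# Reduced copy of the question's data: one group, same SaleCount/IsPromo pattern
mydat_small <- data.frame(
  ItemRelation = 11712L,
  DocumentNum  = 197L,
  DocumentYear = 2017L,
  SaleCount    = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 18L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  IsPromo      = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
)

newest_alt <- mydat_small %>%
  filter(SaleCount > 0) %>%
  group_by(ItemRelation, DocumentNum, DocumentYear) %>%
  mutate(
    k  = mean(SaleCount[IsPromo == 1]),
    # Assumption: an empty non-promo group counts as a median of 0.
    # as.numeric() keeps both coalesce() arguments the same type.
    m0 = coalesce(as.numeric(median(tail(SaleCount[IsPromo == 0], 5))), 0)
  ) %>%
  filter(IsPromo == 1) %>%
  mutate(r = (k - m0) * n()) %>%
  ungroup() %>%
  distinct()
```

With the sample data this should leave a single row with k = 18, m0 = 0 and r = 18, matching the result shown above.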