I am new to programming and would appreciate any help with this. I have a data frame that contains product names and the day on which each item was sold. For each product, I need to see the fraction sold on Monday, Tuesday, Wednesday, etc.
Here is code to reproduce my data frame:
Product=c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","C","C","C")
Day=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Saturday","Sunday" ,"Monday")
df=data.frame(Product,Day)
I tried the following:
data.frame(prop.table(with(df,table(Product,Day))))
df.wide=reshape(data.frame(prop.table(with(df,table(Product,Day)))),
                timevar="Day",
                idvar="Product",
                direction="wide")
which gives me:
Product Freq.Friday Freq.Monday Freq.Saturday Freq.Sunday Freq.Thursday Freq.Tuesday Freq.Wednesday
A 0.1111111 0.11111111 0.11111111 0.00000000 0.1111111 0.11111111 0.11111111
B 0.0000000 0.05555556 0.00000000 0.00000000 0.0000000 0.05555556 0.05555556
C 0.0000000 0.05555556 0.05555556 0.05555556 0.0000000 0.00000000 0.00000000
Summing columns 2-8 gives the total proportion of A, B and C sold, but how do I obtain the proportion of A, B and C sold by day of the week?
Thank you!
This is a pretty straightforward table operation when combined with prop.table(..., margin=). The margin= argument allows for calculating proportions for rows, columns or the whole table (the default): prop.table(..., 1) does rows, 2 does columns, 3 does strata (the third dimension of a three-way table), etc.
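A quick way to see what each margin does with the question's data: with margin 1 every product's shares sum to 1, with margin 2 every day's shares sum to 1, and with no margin the whole table sums to 1. This is only a sketch; the names counts, whole, by_row and by_col are illustrative.

```r
# Same data as in the question
Product <- c(rep("A", 12), rep("B", 3), rep("C", 3))
Day <- c(rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"), 2),
         "Monday", "Tuesday", "Wednesday", "Saturday", "Sunday", "Monday")
df <- data.frame(Product, Day)

counts <- with(df, table(Product, Day))  # 3 x 7 table of sale counts

whole  <- prop.table(counts)     # no margin: each cell / grand total (18 sales)
by_row <- prop.table(counts, 1)  # margin 1: each cell / its product's total
by_col <- prop.table(counts, 2)  # margin 2: each cell / that day's total

sum(whole)             # the whole table sums to 1
rowSums(by_row)        # each product (A, B, C) sums to 1
colSums(by_col)        # each day sums to 1
by_col["A", "Monday"]  # 0.5: A accounts for 2 of the 4 Monday sales
```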
Also, instead of data.frame, use as.data.frame.matrix to avoid the reshape requirement:
as.data.frame.matrix(prop.table(with(df,table(Product,Day)),1))
# Friday Monday Saturday Sunday Thursday Tuesday Wednesday
#A 0.1666667 0.1666667 0.1666667 0.0000000 0.1666667 0.1666667 0.1666667
#B 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
#C 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000
as.data.frame.matrix(prop.table(with(df,table(Product,Day)),2))
# Friday Monday Saturday Sunday Thursday Tuesday Wednesday
#A 1 0.50 0.6666667 0 1 0.6666667 0.6666667
#B 0 0.25 0.0000000 0 0 0.3333333 0.3333333
#C 0 0.25 0.3333333 1 0 0.0000000 0.0000000
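Why as.data.frame.matrix removes the need for reshape: data.frame() on a table goes through as.data.frame.table, which returns long format (one row per cell, with a Freq column), whereas as.data.frame.matrix keeps the two-dimensional layout and stores the products as row names. A small sketch; the last step, turning the row names into a Product column, is an optional extra and wide_df is just an illustrative name.

```r
# Same data as in the question
Product <- c(rep("A", 12), rep("B", 3), rep("C", 3))
Day <- c(rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"), 2),
         "Monday", "Tuesday", "Wednesday", "Saturday", "Sunday", "Monday")
df <- data.frame(Product, Day)
shares <- prop.table(with(df, table(Product, Day)), 1)

# as.data.frame() on a table: long format, columns Product, Day and Freq
# (this is why the question needed reshape() afterwards)
long <- as.data.frame(shares)

# as.data.frame.matrix(): one row per product, one column per day,
# with the products kept as row names
wide <- as.data.frame.matrix(shares)

# Optional: move the row names into an ordinary Product column
wide_df <- data.frame(Product = rownames(wide), wide, row.names = NULL)

dim(long)  # 21 rows (3 products x 7 days), 3 columns
dim(wide)  # 3 rows, 7 columns
```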
You might also want to consider making Day a factor with its levels in an appropriate Sunday-to-Saturday order.
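A minimal sketch of the factor suggestion, assuming Sunday should come first (week is an illustrative name). Because table() keeps every factor level, a day with no sales at all would still appear as a column of zeros instead of disappearing.

```r
# Same data as in the question
Product <- c(rep("A", 12), rep("B", 3), rep("C", 3))
Day <- c(rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"), 2),
         "Monday", "Tuesday", "Wednesday", "Saturday", "Sunday", "Monday")
df <- data.frame(Product, Day)

# Fix the level order so tables list the days Sunday..Saturday, not alphabetically
week <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
df$Day <- factor(df$Day, levels = week)

shares <- as.data.frame.matrix(prop.table(with(df, table(Product, Day)), 1))
names(shares)  # columns now run Sunday ... Saturday
```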
Try this:
library(reshape2)
library(plyr)
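counts <- dcast(df, Product ~ Day, fun.aggregate = length, value.var = "Day")  # count sales per product and day
ddply(counts, 1, function(u) data.frame(u[1], u[-1]/sum(u[-1])))                 # split by column 1 (Product), divide each row by its total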
# Product Friday Monday Saturday Sunday Thursday Tuesday Wednesday
#1 A 0.1666667 0.1666667 0.1666667 0.0000000 0.1666667 0.1666667 0.1666667
#2 B 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
#3 C 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000
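For comparison, the same per-product normalisation can be done without reshape2 or plyr. This swaps in base R (table() plus row-wise division) and is not the answer's own method: count sales per product and day, then divide each row by its total, which is what function(u) u[-1]/sum(u[-1]) does for each product.

```r
# Same data as in the question
Product <- c(rep("A", 12), rep("B", 3), rep("C", 3))
Day <- c(rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"), 2),
         "Monday", "Tuesday", "Wednesday", "Saturday", "Sunday", "Monday")
df <- data.frame(Product, Day)

# Counts per product and day, one row per product (similar to dcast(..., fun.aggregate = length))
counts <- as.data.frame.matrix(with(df, table(Product, Day)))

# Dividing a data frame by a vector of row totals recycles the vector down each
# column, so every cell in row i is divided by product i's total
shares <- data.frame(Product = rownames(counts), counts / rowSums(counts), row.names = NULL)
shares
```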