
R: reshaping a dataframe and creating proportions

I am new to programming and would appreciate any help with this. I have a data frame that contains product names and the day on which each one was sold. For each product, I need to see the fraction sold on Monday, Tuesday, Wednesday, etc.

Please follow this to replicate my dataframe:

Product=c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","C","C","C")
Day=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Saturday","Sunday" ,"Monday")
df=data.frame(cbind(Product,Day))

I tried the following:

data.frame(prop.table(with(df,table(Product,Day))))

df.wide=reshape(data.frame(prop.table(with(df,table(Product,Day)))),
                  timevar="Day",
                  idvar="Product",
                  direction="wide")

which gives me

Product Freq.Friday Freq.Monday Freq.Saturday Freq.Sunday Freq.Thursday Freq.Tuesday Freq.Wednesday
       A   0.1111111  0.11111111    0.11111111  0.00000000     0.1111111   0.11111111     0.11111111
       B   0.0000000  0.05555556    0.00000000  0.00000000     0.0000000   0.05555556     0.05555556
       C   0.0000000  0.05555556    0.05555556  0.05555556     0.0000000   0.00000000     0.00000000

I can sum columns 2-8 to get the total proportion of A, B, and C sold, but how do I obtain the proportion of A, B, and C sold by day of the week?

Thank you!

This is a pretty straightforward table operation once you use the margin= argument of prop.table(). The margin= argument controls what the proportions are computed over: with no margin (the default) they are taken over the whole table; prop.table(..., 1) computes them within rows, 2 within columns, 3 within strata (for a three-way table), and so on.

Also, instead of data.frame, use as.data.frame.matrix to avoid the reshape requirement:

as.data.frame.matrix(prop.table(with(df,table(Product,Day)),1))
#     Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
#A 0.1666667 0.1666667 0.1666667 0.0000000 0.1666667 0.1666667 0.1666667
#B 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
#C 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000

as.data.frame.matrix(prop.table(with(df,table(Product,Day)),2))
#  Friday Monday  Saturday Sunday Thursday   Tuesday Wednesday
#A      1   0.50 0.6666667      0        1 0.6666667 0.6666667
#B      0   0.25 0.0000000      0        0 0.3333333 0.3333333
#C      0   0.25 0.3333333      1        0 0.0000000 0.0000000
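As a quick sanity check on what each margin means, the sketch below rebuilds the data from the question and confirms which totals come out as 1. The name `tab` is only illustrative and is not from the original post.

```r
# Rebuild the question's data so this block runs on its own
Product <- c(rep("A", 12), rep("B", 3), rep("C", 3))
Day <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
         "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
         "Monday", "Tuesday", "Wednesday", "Saturday", "Sunday", "Monday")
tab <- table(Product, Day)

rowSums(prop.table(tab, 1))  # margin = 1: each product's row sums to 1
colSums(prop.table(tab, 2))  # margin = 2: each day's column sums to 1
sum(prop.table(tab))         # no margin: the whole table sums to 1
```

With margin = 1 each row answers "of all sales of this product, what share fell on each day", which is what the question asks for; margin = 2 answers "of all sales on this day, what share was each product".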

You might also want to consider making Day a factor with its levels in Sunday-Saturday order, so that the columns appear in calendar order rather than alphabetically.
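A minimal sketch of that suggestion, assuming the same data as in the question. The names `day_levels` and `row_props` are made up for illustration.

```r
# Rebuild the question's data so this block runs on its own
Product <- c(rep("A", 12), rep("B", 3), rep("C", 3))
Day <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
         "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
         "Monday", "Tuesday", "Wednesday", "Saturday", "Sunday", "Monday")
df <- data.frame(Product, Day)

# Fix the level order explicitly; table() lays out columns in level order
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
df$Day <- factor(df$Day, levels = day_levels)

# Row-wise proportions, now with columns running Sunday through Saturday
row_props <- as.data.frame.matrix(prop.table(table(df$Product, df$Day), 1))
row_props
```

Any day that is listed in the levels but never occurs in the data would still get a column, filled with zeros.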

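Try this, which uses dcast() to count rows per Product and Day (with no aggregation function supplied, it defaults to length) and then divides each row by its total: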

library(reshape2)
library(plyr)

ddply(dcast(df, Product ~ Day),1,function(u) data.frame(u[1], u[-1]/sum(u[-1])))

#  Product    Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday
#1       A 0.1666667 0.1666667 0.1666667 0.0000000 0.1666667 0.1666667 0.1666667
#2       B 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
#3       C 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000

