
R: Modifying Subsets of Dataframe using Calculations on that Subset

I'll ask my question through an example, because I don't know how best to phrase it in general. Using the ChickWeight dataset built into R:

> head(ChickWeight)
    weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1
> tail(ChickWeight)
      weight Time Chick Diet
573    155   12    50    4
574    175   14    50    4
575    205   16    50    4
576    234   18    50    4
577    264   20    50    4
578    264   21    50    4

I can use ddply (from the plyr package) to calculate the mean weight for each unique Diet, for example:

> library(plyr)
> ddply(ChickWeight, .(Diet), summarise, mean_weight=mean(weight, na.rm=TRUE))
  Diet mean_weight
1    1    102.6455
2    2    122.6167
3    3    142.9500
4    4    135.2627

How can I easily create a data frame in which the 'weight' column of ChickWeight is divided by the mean_weight of its corresponding diet?

A solution with data.table that's short, fast and readable:

library(data.table)
cw <- data.table(ChickWeight)
# Add a column by reference: each weight divided by the mean weight of its Diet group
cw[, pct_mw_diet := weight / mean(weight, na.rm = TRUE), by = Diet]

cw now has a column, pct_mw_diet, holding each weight as a fraction of the mean weight for its diet (multiply by 100 if you want an actual percentage).
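
If you would rather not add a package dependency, the same per-group division can probably be done in base R with ave(), which returns the group statistic aligned row by row with the input. This is only a sketch of an alternative, not part of the data.table answer above; the names cw_df, diet_mean and rel_weight are made up for illustration.

```r
# Base R sketch: divide each weight by the mean weight of its Diet group.
# as.data.frame() drops the extra groupedData classes that ChickWeight carries.
cw_df <- as.data.frame(ChickWeight)

# ave() applies FUN within each Diet level and returns a vector
# the same length as weight, so it lines up with the rows directly.
diet_mean <- ave(cw_df$weight, cw_df$Diet,
                 FUN = function(x) mean(x, na.rm = TRUE))

# Hypothetical column name; holds the ratio, not a percentage.
cw_df$rel_weight <- cw_df$weight / diet_mean

head(cw_df)
```

A quick sanity check is that the mean of rel_weight within each diet should come out as 1, e.g. tapply(cw_df$rel_weight, cw_df$Diet, mean).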

