
In R, compute the relative frequency of binary (0/1) values grouped by multiple columns, and create a new dataset with this summary

I have a dataset (named 'gala') with the columns "Day", "Tree", "Trt", and "Countable". The data was collected over time, so within each treatment a given tree number refers to the same tree on every day. Tree numbers are repeated across treatments (e.g. there is a tree "1" in several treatments). I want to know the proportion (relative frequency) of the values in the "Countable" column, which I have converted to binary values ("0" and "1").

I would like to compute the relative frequency of "1" vs. "0" in the 'Countable' column for each tree within each treatment on each day, so that every tree/treatment/day combination is summarized by a single value (e.g. if there were eight 1s and two 0s, the new column value would be 0.8). I want to output these results into a new data frame that also includes the original Day, Tree, and Trt values.
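For a column that holds only 0s and 1s, the proportion of 1s in a group is simply the mean of that column (number of 1s divided by number of rows), so eight 1s and two 0s give 8/10 = 0.8. A minimal base-R sketch with aggregate(), assuming a hypothetical toy 'gala' data frame whose values are made up for illustration:

```r
# Hypothetical toy data shaped like 'gala' (values are made up).
gala <- data.frame(
  Day       = 1,
  Tree      = rep(c(1, 2), each = 10),
  Trt       = "A",
  Countable = c(rep(1, 8), rep(0, 2),   # tree 1: eight 1s, two 0s
                rep(1, 3), rep(0, 7))   # tree 2: three 1s, seven 0s
)

# mean() of a 0/1 column is the share of 1s. aggregate() returns one row
# per Day/Tree/Trt combination and keeps the grouping columns.
RelativeFreq <- aggregate(Countable ~ Day + Tree + Trt, data = gala, FUN = mean)

# Rename the summary column so it is not confused with the raw values.
names(RelativeFreq)[names(RelativeFreq) == "Countable"] <- "rel.freq"
print(RelativeFreq)
```

Note that the formula interface of aggregate() drops rows with NA by default.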

I have been unsuccessfully trying to cobble together a Frankenstein of code from other Stack Overflow answers, but I cannot get it to work. Many answers use sum, but I do not want the sum; I would just like R to treat "0" and "1" as categorical values and give me the relative proportion of each within each subset of the data. If this has already been answered, I apologize; please point me to the answer. I am new to coding and to R, and I do not yet understand how code written for a different problem can be applied to mine.
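Treating 0 and 1 as categories can be sketched in base R with table() and prop.table(): count each level within every Day/Tree/Trt group, then divide by the group total. This is one possible approach, shown on a hypothetical toy 'gala' with made-up values. table() includes every combination of the grouping values, so a combination that never occurs in the data has a total of 0 and gets NaN proportions; those rows can be dropped afterwards.

```r
# Hypothetical toy data shaped like 'gala' (values are made up).
gala <- data.frame(
  Day       = 1,
  Tree      = rep(c(1, 2), each = 10),
  Trt       = "A",
  Countable = c(rep(1, 8), rep(0, 2), rep(1, 3), rep(0, 7))
)

# Make Countable categorical with both levels, so a group with no 0s
# (or no 1s) still reports a proportion of 0 for that level.
gala$Countable <- factor(gala$Countable, levels = c(0, 1))

# Counts of each Countable level within every Day/Tree/Trt combination.
counts <- table(Day = gala$Day, Tree = gala$Tree, Trt = gala$Trt,
                Countable = gala$Countable)

# Divide each count by its Day/Tree/Trt total (margins 1 to 3).
props <- prop.table(counts, margin = 1:3)

# One row per group and level; drop combinations absent from the data.
RelativeFreq <- as.data.frame(props, responseName = "rel.freq")
RelativeFreq <- RelativeFreq[!is.nan(RelativeFreq$rel.freq), ]
print(RelativeFreq)
```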

It looks like dplyr is probably my best option, based on what I've seen in other similar questions. This is what I have so far, but I keep getting various errors:

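library(dplyr)
RelativeFreq <-
  (gala %>%
    group_by(Day, Tree, Trt) %>%
    summarise(Countable) %>%   # does not produce one summary value per group
    mutate(rel.freq=n/length(Countable)))   # error: `n` here is the function dplyr::n, not a count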
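A corrected version of this dplyr pipeline might look like the sketch below, assuming dplyr 1.0.0 or later (for the `.groups` argument) and a hypothetical toy 'gala' with made-up values. The summary goes inside summarise() itself, n() is called as a function, and `mean(Countable == 1)` gives the share of 1s (use `Countable == 0` for the share of 0s).

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data shaped like 'gala' (values are made up).
gala <- data.frame(
  Day       = 1,
  Tree      = rep(c(1, 2), each = 10),
  Trt       = "A",
  Countable = c(rep(1, 8), rep(0, 2), rep(1, 3), rep(0, 7))
)

RelativeFreq <- gala %>%
  group_by(Day, Tree, Trt) %>%
  summarise(
    n        = n(),                   # number of rows in the group
    rel.freq = mean(Countable == 1),  # proportion of 1s in the group
    .groups  = "drop"                 # return an ungrouped result
  ) %>%
  as.data.frame()

print(RelativeFreq)
```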

I've also tried this with no success:

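# this syntax needs gala to be a data.table; frequency() is the time-series
# function stats::frequency(), and Countable[0,1] indexes a vector like a matrix
RelativeFreq <- gala[,.("proportion"=frequency(Countable[0,1])), by=c("Day","Tree","Trt")]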

Any help is greatly appreciated. Thank you!

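You could use data.table. Because Countable contains only 0s and 1s, sum(Countable) counts the 1s, so dividing by the group size .N gives the proportion of 1s in each group: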

# create fake data
set.seed(0)
df <- expand.grid(Day = 1:2, 
                  Tree = 1:2, 
                  Trt = 1:2)
df <- rbind(df, df, df)
library(data.table)
# make df a data.table
setDT(df)
# create fake Countable column
df[, Countable := as.integer(runif(.N) < 0.5)]
RelativeFreq <- df[, list(prop = sum(Countable)/.N), by = list(Day, Tree, Trt)]
RelativeFreq 
   Day Tree Trt      prop
1:   1    1   1 0.3333333
2:   2    1   1 0.3333333
3:   1    2   1 0.6666667
4:   2    2   1 0.6666667
5:   1    1   2 0.3333333
6:   2    1   2 0.3333333
7:   1    2   2 0.6666667
8:   2    2   2 0.0000000
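The same data.table idea can be extended to report both categories side by side. As a sketch on a hypothetical toy 'gala' with made-up values: `mean(Countable == 1)` equals `sum(Countable)/.N` for a 0/1 column, `mean(Countable == 0)` is the share of 0s, and `.N` records how many observations each proportion is based on. setDT() converts the data.frame to a data.table in place, which the question's `gala[, ..., by = ...]` attempt also needed.

```r
library(data.table)

# Hypothetical toy data shaped like 'gala' (values are made up).
gala <- data.frame(
  Day       = 1,
  Tree      = rep(c(1, 2), each = 10),
  Trt       = "A",
  Countable = c(rep(1, 8), rep(0, 2), rep(1, 3), rep(0, 7))
)
setDT(gala)  # convert to data.table by reference

# One row per Day/Tree/Trt group: group size and share of each value.
RelativeFreq <- gala[, .(n     = .N,
                         prop1 = mean(Countable == 1),
                         prop0 = mean(Countable == 0)),
                     by = .(Day, Tree, Trt)]
print(RelativeFreq)
```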
