In R, compute relative frequency of binomial values, grouped by multiple columns, and create a new dataset with this 'summary'

I have a dataset (named 'gala') that has the columns "Day", "Tree", "Trt", and "Countable". The data were collected over time, so within each treatment a given tree number refers to the same physical tree on every day. The tree numbers are repeated for each treatment (e.g. there is a tree "1" in several treatments). I want to know the proportion/frequency of the values in the "Countable" column, which I have already converted to binary values (0 and 1).

I would like to compute the relative frequency of 1 vs. 0 in the "Countable" column for each tree, in each treatment, on each day (e.g. if a tree had eight 1s and two 0s, the new column value would be 0.8, a single value summarizing that tree for that treatment on that day), and output these results into a new data frame that also includes the original Day, Tree, and Trt values.
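The arithmetic in the example can be checked directly. This is a minimal sketch with a made-up vector `countable` (a hypothetical name, not a column of the real data) holding the eight 1s and two 0s from the example: the relative frequency of 1s is the count of 1s divided by the number of observations, which for 0/1 data is simply the mean.

```r
# Hypothetical Countable values for one tree, one treatment, one day:
# eight 1s and two 0s
countable <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0)

# Relative frequency of 1s = number of 1s / number of observations
rel_freq <- sum(countable == 1) / length(countable)
rel_freq
# [1] 0.8

# For a 0/1 vector the same number is just the mean
mean(countable)
# [1] 0.8
```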

I have been trying, without success, to stitch together a Frankenstein of code from other Stack Overflow answers, but I cannot get it to work. Many answers use sum(), but I do not want the sum; I would just like R to treat 0 and 1 as categorical values and give me the relative proportion of each for each subset of the data. If I missed an existing answer, I am sorry; please let me know with a link to it. I am new to coding and to R, and I do not understand well how code that is not directly related to what I want to do can be applied.
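Treating 0 and 1 as categories can be sketched with dplyr's `count()`: count each Countable value within every Day/Tree/Trt group, then divide by the group total. The small `gala` data frame below is a hypothetical stand-in (one day, two trees, two treatments named "A" and "B", five observations each), not the asker's real data. Note that a value that never occurs in a group (e.g. no 0s) simply gets no row for that group.

```r
library(dplyr)

# Hypothetical stand-in for 'gala'
gala <- data.frame(
  Day       = 1,
  Tree      = rep(c(1, 2, 1, 2), each = 5),
  Trt       = rep(c("A", "A", "B", "B"), each = 5),
  Countable = c(1, 1, 1, 1, 0,   # Tree 1, Trt A
                1, 0, 0, 0, 0,   # Tree 2, Trt A
                1, 1, 1, 1, 1,   # Tree 1, Trt B
                0, 0, 1, 1, 0)   # Tree 2, Trt B
)

# Count how often each Countable value occurs per Day/Tree/Trt group,
# then turn the counts into relative frequencies within each group
cat_freq <- gala %>%
  count(Day, Tree, Trt, Countable) %>%   # adds a column 'n'
  group_by(Day, Tree, Trt) %>%
  mutate(rel.freq = n / sum(n)) %>%      # 'n' here is the count column
  ungroup()

cat_freq
```

Each group's `rel.freq` values add up to 1, so the result contains the relative frequency of both 0 and 1 rather than a sum.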

Based on what I've seen in other similar questions, dplyr looks like my best option. This is what I have so far, but I keep getting various errors:

library(dplyr)
RelativeFreq <-
  (gala %>%
    group_by(Day, Tree, Trt) %>%
    summarise(Countable) %>%
    mutate(rel.freq=n/length(Countable)))
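A working version of this dplyr attempt might look like the sketch below. In the attempt, `summarise(Countable)` does not reduce each group to a single value, and `n` inside `mutate()` is not a column of the data (it most likely resolves to the function `dplyr::n`), which would explain the errors. Computing the proportion inside `summarise()` avoids both problems. The `gala` data frame here is a hypothetical stand-in, and the column name `n.obs` is an illustrative choice.

```r
library(dplyr)

# Hypothetical stand-in for 'gala' (replace with the real data set)
gala <- data.frame(
  Day       = rep(1:2, each = 4),
  Tree      = rep(c(1, 1, 2, 2), times = 2),
  Trt       = "A",
  Countable = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# One row per Day/Tree/Trt group with the share of 1s in that group
RelativeFreq <- gala %>%
  group_by(Day, Tree, Trt) %>%
  summarise(
    n.obs    = n(),                    # observations in the group
    rel.freq = mean(Countable == 1),   # proportion of 1s in the group
    .groups  = "drop"                  # return an ungrouped data frame
  )

RelativeFreq
```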

I've also tried this, with no success:

RelativeFreq <- gala[,.("proportion"=frequency(Countable[0,1])), by=c("Day","Tree","Trt")]
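The data.table attempt has a few separate problems: the `[ , .( ), by = ]` syntax only works once `gala` is a data.table (e.g. after `setDT()`), `frequency()` is the base R stats function that reports the sampling frequency of a time series rather than a proportion, and `Countable[0,1]` uses two indices on a plain vector, which fails with "incorrect number of dimensions". A corrected sketch, again on a hypothetical stand-in for `gala`:

```r
library(data.table)

# Hypothetical stand-in for 'gala'
gala <- data.frame(
  Day       = c(1, 1, 1, 1, 2, 2, 2, 2),
  Tree      = c(1, 1, 2, 2, 1, 1, 2, 2),
  Trt       = "A",
  Countable = c(1, 1, 0, 1, 0, 0, 1, 0)
)

# Convert to a data.table in place so the [ , , by = ] syntax works
setDT(gala)

# Proportion of 1s per Day/Tree/Trt group (groups keep first-appearance order)
RelativeFreq <- gala[, .(proportion = mean(Countable == 1)),
                     by = .(Day, Tree, Trt)]
RelativeFreq
```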

Any help is greatly appreciated. Thank you!

You could use data.table:

# create fake data
set.seed(0)
df <- expand.grid(Day = 1:2, 
                  Tree = 1:2, 
                  Trt = 1:2)
df <- rbind(df, df, df)  # three observations per Day/Tree/Trt combination
library(data.table)
# make df a data.table
setDT(df)
# create fake Countable column
df[, Countable := as.integer(runif(.N) < 0.5)]
# proportion of 1s per group: sum of the 0/1 values divided by the group size .N
RelativeFreq <- df[, list(prop = sum(Countable)/.N), by = list(Day, Tree, Trt)]
RelativeFreq
   Day Tree Trt      prop
1:   1    1   1 0.3333333
2:   2    1   1 0.3333333
3:   1    2   1 0.6666667
4:   2    2   1 0.6666667
5:   1    1   2 0.3333333
6:   2    1   2 0.3333333
7:   1    2   2 0.6666667
8:   2    2   2 0.0000000
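The same approach can report the relative frequency of both values. This sketch reuses the answer's fake-data setup; for 0/1 data, `sum(Countable)/.N` equals `mean(Countable == 1)`, and the proportion of 0s is one minus the proportion of 1s. The column names `prop1`, `prop0`, and `n` are illustrative choices, not part of the original answer.

```r
library(data.table)

# Same fake data as in the answer
set.seed(0)
df <- expand.grid(Day = 1:2, Tree = 1:2, Trt = 1:2)
df <- rbind(df, df, df)
setDT(df)
df[, Countable := as.integer(runif(.N) < 0.5)]

# Share of 1s, share of 0s, and group size for every Day/Tree/Trt group
RelativeFreq <- df[, .(prop1 = mean(Countable == 1),
                       prop0 = mean(Countable == 0),
                       n     = .N),
                   by = .(Day, Tree, Trt)]
RelativeFreq
```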
