Divide Groups by # of samples in each group in R

I'm trying to use ggplot to make a graph showing the composition of substrates at 6 different sites and at 7 different times. The problem is that I have a different number of samples for each sampling period and site. I essentially want the code y = freq / (# of stations in that time period). The following is a sample of my data set:

   Substrate     Time   Site Freq
1      Floc    July 11   P1    4
2      Fine    July 11   P1    2
3    Medium    July 11   P1   12
4    Coarse    July 11   P1    0
5   Bedrock    July 11   P1    3
6      Floc     Aug 11   P1    7
7      Fine     Aug 11   P1    1
8    Medium     Aug 11   P1    7
9    Coarse     Aug 11   P1    1
10  Bedrock     Aug 11   P1    4

Therefore I want

      Var1       Var2 Var3 Freq
1      Floc    July 11   P1    4/21   (21 = the number of samples taken in July)

Any ideas on how to write this code and then plot the results?

With a data.table (from the package of the same name)...

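library(data.table)
DT <- data.table(dat)  # dat is the data frame shown in the question

# Divide each Freq by the total Freq within its Var2 (time period) group
DT[, Freq2 := Freq / sum(Freq), by = Var2]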

which gives

       Var1    Var2 Var3 Freq     Freq2
 1:    Floc July 11   P1    4 0.1904762
 2:    Fine July 11   P1    2 0.0952381
 3:  Medium July 11   P1   12 0.5714286
 4:  Coarse July 11   P1    0 0.0000000
 5: Bedrock July 11   P1    3 0.1428571
 6:    Floc  Aug 11   P1    7 0.3500000
 7:    Fine  Aug 11   P1    1 0.0500000
 8:  Medium  Aug 11   P1    7 0.3500000
 9:  Coarse  Aug 11   P1    1 0.0500000
10: Bedrock  Aug 11   P1    4 0.2000000

EDIT: The question now has better column names, so it's clearer what "for...period and site" means. As @DWin wrote in the comments, the answer now is:

DT[, Freq2 := Freq / sum(Freq), by = .(Time, Site)]
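
For reference, the grouped division can be tried end to end on the sample rows from the question. This is only a minimal sketch: the table is retyped by hand from the question, and the column name Freq2 follows the answer above. It assumes the data.table package is installed.

```r
library(data.table)

# Sample rows from the question, using its Substrate/Time/Site/Freq columns
DT <- data.table(
  Substrate = rep(c("Floc", "Fine", "Medium", "Coarse", "Bedrock"), times = 2),
  Time      = rep(c("July 11", "Aug 11"), each = 5),
  Site      = "P1",
  Freq      = c(4, 2, 12, 0, 3, 7, 1, 7, 1, 4)
)

# Share of each substrate within its Time/Site combination
DT[, Freq2 := Freq / sum(Freq), by = .(Time, Site)]

# Sanity check: the shares should sum to 1 within every group
totals <- DT[, .(total = sum(Freq2)), by = .(Time, Site)]
print(totals)
```

Note that a Time/Site group whose counts are all zero would produce NaN, since the divisor sum(Freq) would be 0.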

Have a look at ?ave:

df <- read.table(textConnection("
Var0 Var1       Var2 Var3 Freq
1      Floc    July 11   P1    4
2      Fine    July 11   P1    2
3    Medium    July 11   P1   12
4    Coarse    July 11   P1    0
5   Bedrock    July 11   P1    3
6      Floc     Aug 11   P1    7
7      Fine     Aug 11   P1    1
8    Medium     Aug 11   P1    7
9    Coarse     Aug 11   P1    1
10  Bedrock     Aug 11   P1    4"), header=TRUE, row.names=1)

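# The header has five names but each row has six fields, so "July 11" is split:
# Var1 holds the month and Var2 the "11". Grouping by Var1 groups by period.
df$freq <- ave(df$Freq, df$Var1, FUN = function(x) x / sum(x))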
df
#      Var0 Var1 Var2 Var3 Freq      freq
#1     Floc July   11   P1    4 0.1904762
#2     Fine July   11   P1    2 0.0952381
#3   Medium July   11   P1   12 0.5714286
#4   Coarse July   11   P1    0 0.0000000
#5  Bedrock July   11   P1    3 0.1428571
#6     Floc  Aug   11   P1    7 0.3500000
#7     Fine  Aug   11   P1    1 0.0500000
#8   Medium  Aug   11   P1    7 0.3500000
#9   Coarse  Aug   11   P1    1 0.0500000
#10 Bedrock  Aug   11   P1    4 0.2000000
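
Neither answer covers the plotting half of the question, so here is one possible sketch. It uses the question's column names (Substrate, Time, Site, Freq), passes both Time and Site to ave() as grouping vectors, and draws stacked bars with ggplot2. The column name Prop, the axis label, and the choice of a stacked bar chart faceted by site are my assumptions, not something stated in the question.

```r
library(ggplot2)

# Sample rows from the question, retyped by hand
dat <- data.frame(
  Substrate = rep(c("Floc", "Fine", "Medium", "Coarse", "Bedrock"), times = 2),
  Time      = rep(c("July 11", "Aug 11"), each = 5),
  Site      = "P1",
  Freq      = c(4, 2, 12, 0, 3, 7, 1, 7, 1, 4)
)

# ave() accepts several grouping vectors; groups are their combinations
dat$Prop <- ave(dat$Freq, dat$Time, dat$Site, FUN = function(x) x / sum(x))

# Keep the sampling periods in chronological rather than alphabetical order
dat$Time <- factor(dat$Time, levels = c("July 11", "Aug 11"))

# One stacked bar per sampling period, one panel per site
p <- ggplot(dat, aes(x = Time, y = Prop, fill = Substrate)) +
  geom_col() +
  facet_wrap(~ Site) +
  labs(y = "Proportion of samples")
print(p)
```

With the full data set, the levels of Time would need to list all seven sampling periods in order, and facet_wrap would produce one panel for each of the six sites.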
