Subset dataframe and apply function that counts frequency of each factor level
I have a df:
df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"),plot=c("1", "1", "1","2","2","2", "3","3","3","3"), interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B"))
I would like to subset the data by plot. For each plot subset I would like to count the frequency of each unique interact type. The output should look like:
df<- data.frame(region= c("1", "1", "1","1", "2","2","2"),
plot=c("1", "1", "2","2", "3","3","3"),
interact=c("A_B", "C_D", "E_F","C_D", "D_E", "C_B","A_B"),
freq= c(1,2,1,2,2,1,1))
Then I would like to make a function that calculates the following for each plot subset of the df:
sum<-sum(df$freq) # Calculate sum of `freq` for each plot subset (this calculates the total number of interactions)
prop<-unique(df$freq)/sum #Divide each level of `freq` by the sum (this finds the proportion of each interaction type to the total number of interactions)
prop2<-prop^2 # Square this proportion
D<-sum(prop2) # Find the sum of these proportion for each plot subset
simp<-1/D # Use this to calculate Simpson's diversity
The function I want to use is similar to the one explained on the following page: http://rfunctions.blogspot.com.ng/2012/02/diversity-indices-simpsons-diversity.html. However, that referenced version is performed on a wide dataset, whereas my data set will be in long format.
In the end I would have a df of values for each plot:
result<-
Plot div
1 1.8
2 1.8
3 2.6
I used dplyr; however, the result for plot 3 is different and I don't know why. Could you provide your results for each calculation, or check mine and let me know where the mistake is?
Also, if you are interested in calculating diversity indices, you can get familiar with the vegan package, and especially its diversity() function.
df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"),
plot=c("1", "1", "1","2","2","2", "3","3","3","3"),
interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B"))
library(dplyr)
df1 <- df %>% group_by(region, plot, interact) %>% summarise(freq = n())
df2 <- df1 %>% group_by(plot) %>% mutate(sum=sum(freq), prop=freq/sum, prop2 = prop^2)
df2
# A tibble: 7 x 7
# Groups: plot [3]
region plot interact freq sum prop prop2
<fctr> <fctr> <fctr> <int> <int> <dbl> <dbl>
1 1 1 A_B 1 3 0.3333333 0.1111111
2 1 1 C_D 2 3 0.6666667 0.4444444
3 1 2 C_D 2 3 0.6666667 0.4444444
4 1 2 E_F 1 3 0.3333333 0.1111111
5 1 3 D_E 2 4 0.5000000 0.2500000
6 2 3 A_B 1 4 0.2500000 0.0625000
7 2 3 C_B 1 4 0.2500000 0.0625000
df2 %>% group_by(plot) %>% summarise(D=sum(prop2), simp=1/D)
# A tibble: 3 x 3
plot D simp
<fctr> <dbl> <dbl>
1 1 0.5555556 1.800000
2 2 0.5555556 1.800000
3 3 0.3750000 2.666667
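Since the question asks for a function, the same steps can also be wrapped into one. This is only a minimal sketch in base R, assuming long-format data with the plot and interact columns from the question; the helper name inv_simpson is hypothetical and not part of any package.

```r
# Hypothetical helper: inverse Simpson index for one plot's interaction labels
inv_simpson <- function(interact) {
  freq <- table(as.character(interact))  # count of each interaction type in this subset
  prop <- freq / sum(freq)               # proportion of each type
  1 / sum(prop^2)                        # inverse of the summed squared proportions
}

df <- data.frame(region = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
                 plot = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
                 interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D",
                              "D_E", "D_E", "C_B", "A_B"))

# Apply the helper to the interact values of each plot
div <- tapply(df$interact, df$plot, inv_simpson)

# Collect the values in a data frame with one row per plot
result <- data.frame(plot = names(div), div = as.numeric(div))
result
```

For this data the div column comes out as 1.8, 1.8 and about 2.667, matching the dplyr result above. Note that the helper divides every count by the total rather than using unique(), because two interaction types with the same count must both contribute to the sum.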
And here is the approach using the diversity() function from the vegan package.
First you need to use spread() to create a "matrix" with all your interactions as separate columns:
library(vegan)
library(tidyr)
library(dplyr)
df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n())
df6 <-spread(data=df5, key = interact, value = freq, fill=0)
df6
# A tibble: 3 x 6
# Groups: plot [3]
plot A_B C_B C_D D_E E_F
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 2 0 0
2 2 0 0 2 0 1
3 3 1 1 0 2 0
Then you calculate the diversity, passing df6 without its first column (plot) as the data matrix. At the end you can add the calculated diversity as a column to df6.
simp <-diversity(x=df6[,-1], index = "invsimpson")
df6$simp <- simp
df6
# A tibble: 3 x 7
# Groups: plot [3]
plot A_B C_B C_D D_E E_F simp
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 2 0 0 1.800000
2 2 0 0 2 0 1 1.800000
3 3 1 1 0 2 0 2.666667
Or even shorter, with do() from dplyr and tidy() from the broom package:
df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n())
library(broom)
df5 %>% spread(key = interact, value = freq, fill=0) %>%
do(tidy(diversity(x=.[,-1], index = "invsimpson")))
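As a cross-check that needs neither tidyr nor vegan, the wide plot-by-interaction matrix can be built with table() and the index computed row by row. This is a sketch in base R under the same data as above, not the vegan implementation; the variable names m, prop and simp are only illustrative.

```r
df <- data.frame(region = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
                 plot = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
                 interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D",
                              "D_E", "D_E", "C_B", "A_B"))

# Contingency table: one row per plot, one column per interaction type
m <- table(plot = df$plot, interact = df$interact)

# Row-wise proportions, so that each plot's row sums to 1
prop <- prop.table(m, margin = 1)

# Inverse Simpson index per plot: 1 / sum of squared proportions
simp <- 1 / rowSums(prop^2)
simp
```

The resulting named vector holds 1.8, 1.8 and about 2.667 for plots 1, 2 and 3, the same values as in the simp column above.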