Subset a data frame and apply a function to calculate the frequency of each factor level

Question

I have a df:

df <- data.frame(region   = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
                 plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
                 interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D", "D_E", "D_E", "C_B", "A_B"))

I would like to subset the data by plot. For each plot subset, I would like to count the frequency of each unique interact type. The output should look like:

df <- data.frame(region   = c("1", "1", "1", "1", "1", "2", "2"),
                 plot     = c("1", "1", "2", "2", "3", "3", "3"),
                 interact = c("A_B", "C_D", "E_F", "C_D", "D_E", "C_B", "A_B"),
                 freq     = c(1, 2, 1, 2, 2, 1, 1))

Then I would like to write a function that calculates the following for each plot subset of df:

 total <- sum(df$freq)     # Sum of `freq` for each plot subset (the total number of interactions)
 prop  <- df$freq / total  # Divide each `freq` value by the total (the proportion of each interaction type)
 prop2 <- prop^2           # Square each proportion
 D     <- sum(prop2)       # Sum the squared proportions for each plot subset
 simp  <- 1 / D            # Use this to calculate Simpson's (inverse) diversity
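Written out as a formula, with n_i the count of interaction type i in a plot and S the number of interaction types in that plot, the quantity being computed is the inverse Simpson index. Plot 3 (D_E = 2, C_B = 1, A_B = 1) is worked through as an example:

```latex
N = \sum_{i=1}^{S} n_i, \qquad
p_i = \frac{n_i}{N}, \qquad
D = \sum_{i=1}^{S} p_i^{2}, \qquad
\mathrm{simp} = \frac{1}{D}

% plot 3: N = 2 + 1 + 1 = 4
D = \left(\tfrac{2}{4}\right)^{2} + \left(\tfrac{1}{4}\right)^{2} + \left(\tfrac{1}{4}\right)^{2}
  = 0.25 + 0.0625 + 0.0625 = 0.375, \qquad
\mathrm{simp} = \frac{1}{0.375} = \frac{8}{3} \approx 2.667
```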

The function I want to use is similar to the one explained on this page: http://rfunctions.blogspot.com.ng/2012/02/diversity-indices-simpsons-diversity.html. However, that version works on a wide-format dataset, whereas my dataset will be in long format.
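One way to keep everything in long format is sketched below in base R. It assumes the data looks like the df in the question; the helper name inv_simpson is hypothetical and not taken from the linked page. table() does the per-plot counting, so no reshaping to wide format is needed.

```r
# Hypothetical helper (not from the linked page): inverse Simpson index
# from a vector of counts, one count per interaction type
inv_simpson <- function(counts) {
  p <- counts / sum(counts)  # proportion of each interaction type
  1 / sum(p^2)               # inverse of the sum of squared proportions
}

# The long-format data from the question
df <- data.frame(
  region   = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D", "D_E", "D_E", "C_B", "A_B")
)

# Split the interaction labels by plot, count each type with table(),
# then apply the index to each plot's counts
div <- sapply(split(df$interact, df$plot), function(x) inv_simpson(table(x)))

result <- data.frame(plot = names(div), div = unname(div))
print(result)
```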

In the end I would like a data frame of values, one per plot:

  result<- 
         Plot    div
          1      1.8
          2      1.8
          3      2.6

Answer 1

I used dplyr; however, my result for plot 3 (2.666667) differs from yours (2.6), and I don't know why. Could you provide your result for each calculation, or check mine and let me know where the mistake is?

Also, if you are interested in calculating diversity indices, you may want to get familiar with the vegan package, especially its diversity() function.

df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"),
                plot=c("1", "1", "1","2","2","2", "3","3","3","3"),
                interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B"))

library(dplyr)

df1 <- df %>% group_by(region, plot, interact) %>% summarise(freq = n()) 
df2 <- df1 %>% group_by(plot) %>%  mutate(sum=sum(freq), prop=freq/sum, prop2 = prop^2)
df2

# A tibble: 7 x 7
# Groups:   plot [3]
  region   plot interact  freq   sum      prop     prop2
  <fctr> <fctr>   <fctr> <int> <int>     <dbl>     <dbl>
1      1      1      A_B     1     3 0.3333333 0.1111111
2      1      1      C_D     2     3 0.6666667 0.4444444
3      1      2      C_D     2     3 0.6666667 0.4444444
4      1      2      E_F     1     3 0.3333333 0.1111111
5      1      3      D_E     2     4 0.5000000 0.2500000
6      2      3      A_B     1     4 0.2500000 0.0625000
7      2      3      C_B     1     4 0.2500000 0.0625000
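For readers without dplyr, a base-R sketch of the same counting step (df1 and df2 above) could use aggregate() and ave(). The helper column n and the total column are illustrative additions, not part of the original data.

```r
df <- data.frame(
  region   = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D", "D_E", "D_E", "C_B", "A_B")
)

# Hypothetical helper column of 1s: summing it within a group counts the rows
df$n <- 1
freq <- aggregate(n ~ region + plot + interact, data = df, FUN = sum)
names(freq)[names(freq) == "n"] <- "freq"

# Within each plot: total interactions, proportion of each type, squared proportion
freq$total <- ave(freq$freq, freq$plot, FUN = sum)
freq$prop  <- freq$freq / freq$total
freq$prop2 <- freq$prop^2

freq <- freq[order(freq$plot, freq$interact), ]
print(freq)
```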


df2 %>% group_by(plot) %>% summarise(D=sum(prop2), simp=1/D)

# A tibble: 3 x 3
    plot         D     simp
  <fctr>     <dbl>    <dbl>
1      1 0.5555556 1.800000
2      2 0.5555556 1.800000
3      3 0.3750000 2.666667

And here is the approach using the diversity() function from the vegan package.

First, you need to use spread() to create a "matrix" with each interaction type as a separate column:

library(vegan)
library(tidyr)
library(dplyr)

df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n())
df6 <-spread(data=df5, key = interact, value = freq, fill=0)
df6

# A tibble: 3 x 6
# Groups:   plot [3]
    plot   A_B   C_B   C_D   D_E   E_F
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
1      1     1     0     2     0     0
2      2     0     0     2     0     1
3      3     1     1     0     2     0
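For comparison, a similar plot-by-interaction count matrix can be built in base R with table(), without tidyr. In this sketch the plot labels end up as row names rather than as a column, so it should be possible to pass the result to diversity() without dropping a column first.

```r
df <- data.frame(
  region   = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D", "D_E", "D_E", "C_B", "A_B")
)

# Cross-tabulate plots (rows) against interaction types (columns);
# combinations that never occur get a count of 0, like fill = 0 in spread()
m <- table(plot = df$plot, interact = df$interact)
print(m)
```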

Then you calculate the diversity, passing df6 without its first column (plot) as the data matrix. At the end, you can add the calculated diversity to df6 as a new column.

simp <-diversity(x=df6[,-1], index = "invsimpson")
df6$simp <- simp
df6

# A tibble: 3 x 7
# Groups:   plot [3]
    plot   A_B   C_B   C_D   D_E   E_F     simp
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1      1     1     0     2     0     0 1.800000
2      2     0     0     2     0     1 1.800000
3      3     1     1     0     2     0 2.666667
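vegan documents index = "invsimpson" as 1/sum(p_i^2), computed row by row. The same numbers can therefore be reproduced without vegan, which is handy for checking the values above. This is a base-R sketch rather than vegan's own implementation.

```r
df <- data.frame(
  region   = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D", "D_E", "D_E", "C_B", "A_B")
)

m <- table(df$plot, df$interact)   # plot x interaction-type counts
p <- prop.table(m, margin = 1)     # proportions within each plot (row)
simp <- 1 / rowSums(p^2)           # inverse Simpson index per plot
print(round(simp, 6))
```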

Or, even shorter, with do() from dplyr and tidy() from the broom package:

df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n())

library(broom)

df5 %>% spread(key = interact, value = freq, fill=0) %>% 
  do(tidy(diversity(x=.[,-1], index = "invsimpson")))

Accepted answer, 2017-06-22 06:37:48