Subset dataframe and apply function that counts frequency of each factor level

I have a df:

 df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"),plot=c("1", "1", "1","2","2","2", "3","3","3","3"), interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B"))

I would like to subset the data by plot. For each plot subset I would like to count the frequency of each unique interact type. The output should look like:

df<- data.frame(region= c("1", "1", "1","1", "2","2", "2"),
                plot=c("1", "1", "2","2", "3","3","3"),
                interact=c("A_B", "C_D", "E_F","C_D", "D_E", "C_B","A_B"),
                freq= c(1,2,1,2,2,1,1))

Then I would like to make a function that calculates the following for each plot subset of the df:

 total <- sum(df$freq)   # Sum of `freq` for each plot subset (the total number of interactions)
 prop <- df$freq/total   # Divide each `freq` by that sum (the proportion of each interaction type)
 prop2 <- prop^2         # Square each proportion
 D <- sum(prop2)         # Sum the squared proportions for each plot subset
 simp <- 1/D             # Use this to calculate Simpson's diversity

The function I want to use is similar to the one explained on the following page: http://rfunctions.blogspot.com.ng/2012/02/diversity-indices-simpsons-diversity.html. However, that referenced version is performed on a wide dataset, whereas my dataset is in long format.

In the end I would have a df of values for each plot:

  result<- 
         Plot    div
          1      1.8
          2      1.8
          3      2.6

I used dplyr; however, my result for plot3 is different, and I don't know why. Could you provide your results for each calculation, or check mine and let me know where the mistake is?

Also, if you are interested in calculating diversity indices, you can get familiar with the vegan package, especially its diversity() function.

df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"),
                plot=c("1", "1", "1","2","2","2", "3","3","3","3"),
                interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B"))

library(dplyr)

df1 <- df %>% group_by(region, plot, interact) %>% summarise(freq = n()) 
df2 <- df1 %>% group_by(plot) %>%  mutate(sum=sum(freq), prop=freq/sum, prop2 = prop^2)
df2

# A tibble: 7 x 7
# Groups:   plot [3]
  region   plot interact  freq   sum      prop     prop2
  <fctr> <fctr>   <fctr> <int> <int>     <dbl>     <dbl>
1      1      1      A_B     1     3 0.3333333 0.1111111
2      1      1      C_D     2     3 0.6666667 0.4444444
3      1      2      C_D     2     3 0.6666667 0.4444444
4      1      2      E_F     1     3 0.3333333 0.1111111
5      1      3      D_E     2     4 0.5000000 0.2500000
6      2      3      A_B     1     4 0.2500000 0.0625000
7      2      3      C_B     1     4 0.2500000 0.0625000


df2 %>% group_by(plot) %>% summarise(D=sum(prop2), simp=1/D)

# A tibble: 3 x 3
    plot         D     simp
  <fctr>     <dbl>    <dbl>
1      1 0.5555556 1.800000
2      2 0.5555556 1.800000
3      3 0.3750000 2.666667
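Since the question asks for a function, the steps above could also be wrapped in a small helper. The following is only a minimal base R sketch: `inv_simpson` is a hypothetical name chosen for illustration, not a function from dplyr or vegan, and the `region` column is left out because it is not needed for the index.

```r
# Inverse Simpson diversity (1 / sum(p^2)) from a vector of categories.
# `inv_simpson` is a hypothetical helper name, not part of any package.
inv_simpson <- function(x) {
  freq <- table(as.character(x))  # count each interaction type
  prop <- freq / sum(freq)        # proportion of each type
  1 / sum(prop^2)                 # inverse of the summed squared proportions
}

# Long-format data from the question (plot and interact columns only).
df <- data.frame(
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D",
               "D_E", "D_E", "C_B", "A_B")
)

# Apply the helper to each plot subset and collect one row per plot.
div <- tapply(df$interact, df$plot, inv_simpson)
result <- data.frame(plot = names(div), div = as.vector(div))
result
```

This works directly on the long data, so no reshaping to a wide layout is required.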

And here is the approach using the diversity() function from the vegan package.

First you need to use spread() to create a "matrix" with all your interactions as separate columns:

library(vegan)
library(tidyr)
library(dplyr)

df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n())
df6 <-spread(data=df5, key = interact, value = freq, fill=0)
df6

# A tibble: 3 x 6
# Groups:   plot [3]
    plot   A_B   C_B   C_D   D_E   E_F
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
1      1     1     0     2     0     0
2      2     0     0     2     0     1
3      3     1     1     0     2     0

Then you calculate the diversity, passing df6 without its first column (plot) as the data matrix. At the end you can add the calculated diversity as a column to df6.

simp <-diversity(x=df6[,-1], index = "invsimpson")
df6$simp <- simp
df6

# A tibble: 3 x 7
# Groups:   plot [3]
    plot   A_B   C_B   C_D   D_E   E_F     simp
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1      1     1     0     2     0     0 1.800000
2      2     0     0     2     0     1 1.800000
3      3     1     1     0     2     0 2.666667
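To see what the "invsimpson" index amounts to, the same numbers can be reproduced without vegan. This is a rough base R sketch, assuming only the `plot` and `interact` columns of the original df: `table()` builds the plot-by-interaction count matrix (the same layout as df6), and the index is taken as 1 over the row-wise sum of squared proportions.

```r
# Long-format data from the question (plot and interact columns only).
df <- data.frame(
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D",
               "D_E", "D_E", "C_B", "A_B")
)

# Plots as rows, interaction types as columns, counts as cells.
tab <- table(df$plot, df$interact)

# Row-wise proportions, then inverse Simpson: 1 / sum(p^2) per plot.
prop <- prop.table(tab, margin = 1)
simp <- 1 / rowSums(prop^2)
simp
```

Zero counts contribute nothing to the sum, which is why filling the missing combinations with 0 is harmless here.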

Or even shorter, using do() together with tidy() from the broom package:

df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n())

library(broom)

df5 %>% spread(key = interact, value = freq, fill=0) %>% 
  do(tidy(diversity(x=.[,-1], index = "invsimpson")))
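If only the index per plot is needed, the reshaping step can arguably be skipped altogether. The sketch below is an alternative to the vegan route rather than a replacement for it: it assumes a dplyr version in which `count()` accepts the `name` argument, and it computes the inverse Simpson index by hand as 1 over the sum of squared proportions.

```r
library(dplyr)

# Long-format data from the question (plot and interact columns only).
df <- data.frame(
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D",
               "D_E", "D_E", "C_B", "A_B")
)

res <- df %>%
  count(plot, interact, name = "freq") %>%          # frequency of each type per plot
  group_by(plot) %>%
  summarise(simp = 1 / sum((freq / sum(freq))^2))   # inverse Simpson per plot
res
```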
