How to run a multi-step process in loop form in R

Dataframe:

mydata<-structure(list(ParkName = c("SEP", "CSSP", 
                    "SEP", "ONF", "SEP", 
                    "ONF", "SEP", 
                    "CSSP", "ONF", 
                    "SEP", "CSSP", 
                    "PPRSP", "PPRSP", 
                    "SEP", "ONF", 
                    "PPRSP", "ONF", 
                    "SEP", "SEP", 
                    "ONF"), 
       Year = c(2001, 2005, 1998,2011, 1991, 1991, 1991, 1991, 1991, 1992, 1992, 1992, 1992, 1992,
                                      1992, 1992, 1992, 1993, 1994, 1994), 
       LatinName = c("Mola mola", "Clarias batrachus", "Lithobates catesbeianus", "Rana catesbeiana", "Rana catesbeiana", 
                     "Rana yellowis", "Rana catesbeiana", "Solenopsis sp1","Rana catesbeiana", "Rana catesbeiana",
                     "Pratensis", "Rana catesbeiana",  "Rana catesbeiana", "sp2", "Orchidaceae",
                     "Rana catesbeiana","Formica", "Rana catesbeiana", "Rana catesbeiana", "sp2"), 
       NumTotal = c(1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 100, 2, 1, 2)),
  .Names = c("ParkName", "Year", "LatinName", "NumTotal"),
  row.names = c(NA, -20L), class = c("tbl_df", "tbl",  "data.frame"))

This dataset represents the abundance of different species in different parks over a number of years. Keep in mind that this is just an example dataset; the real one is rather large. What I essentially want to do is build a species x park matrix for every year in which data was recorded, and then use the 'vegan' package to calculate diversity indices for each park for each year.

With some help from the community I've managed to create a list of dataframes, one per year. I then extracted one dataframe from the list, converted it into a species x park matrix, and obtained the diversity values of each park for that specific year. Below is the code I used:

library(vegan)
dfList <- split(mydata, mydata$Year) #obtain dataframes for every year 
x<-data.frame(dfList[1]) #select dataframe from certain year
x2<-xtabs(x$X1991.NumTotal~x$X1991.ParkName+x$X1991.LatinName, 
data=x)#convert selected dataframe into species X site matrix
exp(diversity(x2, index = "shannon")) #extract diversity values

How would I run a loop that does what I did for one year for all years, so that I end up with a list of diversity values for every park for every year? The problem I have when I run loops is that this is a very unbalanced dataset, so lengths don't end up matching up with one another.

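A simple lapply will do what you want. It applies the same steps to each per-year dataframe in dfList and returns a list with one named vector of diversity values per year, so it does not matter that the years contain different numbers of parks.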

result <- lapply(dfList, function(x){
    x2 <- xtabs(NumTotal ~ ParkName + LatinName, data = x)
    exp(diversity(x2, index = "shannon")) #extract diversity values
})
result
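
The list returned by lapply holds one named numeric vector per year, and the vectors differ in length because not every park was sampled every year. If a single table is easier to work with, the list can be flattened roughly as sketched below. The small result list here is a hand-written stand-in (its values are copied from the 1991 and 1993 rows of the output shown in the base R answer) so that the snippet runs without vegan; the column names year, park, and div are illustrative choices, not something lapply produces.

```r
# Stand-in for the lapply() output: one named vector of diversity values per year.
result <- list(
  "1991" = c(CSSP = 1, ONF = 2, SEP = 1),
  "1993" = c(SEP = 1)
)

# Build one small data frame per year, then stack them row-wise.
# Years with different numbers of parks simply contribute different numbers of rows.
long <- do.call(rbind, lapply(names(result), function(yr) {
  data.frame(
    year = as.integer(yr),
    park = names(result[[yr]]),
    div  = unname(result[[yr]]),
    stringsAsFactors = FALSE
  )
}))
long
```

Because each year is converted separately before stacking, the unbalanced design never forces vectors of unequal length into the same structure.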

Using base R

do.call(rbind, by(mydata, mydata$Year, function(d){
  xt <- xtabs(NumTotal ~ ParkName + LatinName, data = d)
  data.frame(year = d$Year[1], park = dimnames(xt)[[1]], div = exp(diversity(xt)))}))

#            year  park      div
# 1991.CSSP  1991  CSSP 1.000000
# 1991.ONF   1991   ONF 2.000000
# 1991.SEP   1991   SEP 1.000000
# 1992.CSSP  1992  CSSP 1.000000
# 1992.ONF   1992   ONF 1.057118
# 1992.PPRSP 1992 PPRSP 1.000000
# 1992.SEP   1992   SEP 2.000000
# 1993       1993   SEP 1.000000
# 1994.ONF   1994   ONF 1.000000
# 1994.SEP   1994   SEP 1.000000
# 1998       1998   SEP 1.000000
# 2001       2001   SEP 1.000000
# 2005       2005  CSSP 1.000000
# 2011       2011   ONF 1.000000
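
To see where a value such as 1.057118 for ONF in 1992 comes from, the exponential of the Shannon index can be computed by hand in base R. This is only a sketch for checking the numbers: exp_shannon is a hypothetical helper written for this illustration, not a vegan function, and it assumes natural logarithms, which is the default base in vegan's diversity().

```r
# Hypothetical helper: exponential of the Shannon index for one vector of counts.
exp_shannon <- function(counts) {
  p <- counts / sum(counts)   # relative abundances
  p <- p[p > 0]               # drop absent species; 0 * log(0) is treated as 0
  exp(-sum(p * log(p)))       # natural log, matching the default in vegan::diversity()
}

# ONF in 1992: one Orchidaceae record and 100 Formica records.
onf_1992 <- c(Orchidaceae = 1, Formica = 100)
onf_div <- exp_shannon(onf_1992)
onf_div

# Two equally abundant species give 2 effective species (up to floating-point error).
even_div <- exp_shannon(c(a = 1, b = 1))
even_div
```

The strongly uneven counts for ONF in 1992 are why its value sits just above 1 even though two taxa were recorded, while parks with two equally abundant taxa score 2.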

Using data.table

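library(data.table)
setDT(mydata)  # mydata is a tibble; convert it to a data.table (by reference) so the [ , j, by] syntax works
mydata[ , {xt <- xtabs(NumTotal ~ ParkName + LatinName, data = .SD)
  .(park =  dimnames(xt)[[1]], div = exp(diversity(xt)))}, by = Year]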

#     Year  park      div
#  1: 2001   SEP 1.000000
#  2: 2005  CSSP 1.000000
#  3: 1998   SEP 1.000000
#  4: 2011   ONF 1.000000
#  5: 1991  CSSP 1.000000
#  6: 1991   ONF 2.000000
#  7: 1991   SEP 1.000000
#  8: 1992  CSSP 1.000000
#  9: 1992   ONF 1.057118
# 10: 1992 PPRSP 1.000000
# 11: 1992   SEP 2.000000
# 12: 1993   SEP 1.000000
# 13: 1994   ONF 1.000000
# 14: 1994   SEP 1.000000

Note that data.table's by retains the row order within groups, as well as the order in which the groups first appear in the data. That is why the result above starts with 2001 rather than 1991.
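
If years in ascending order are preferred, data.table's keyby argument can be used in place of by. The sketch below uses a small made-up table (dt and its values are assumptions for illustration, not the question's data) to contrast the two.

```r
library(data.table)

# Toy table whose years appear out of order, as in the question's data.
dt <- data.table(Year = c(2001, 1991, 1992, 1991),
                 NumTotal = c(1, 2, 1, 1))

# by: groups are returned in order of first appearance.
by_first <- dt[, .(total = sum(NumTotal)), by = Year]
by_first$Year

# keyby: same grouping, but the result is sorted by the grouping column.
by_sorted <- dt[, .(total = sum(NumTotal)), keyby = Year]
by_sorted$Year
```

Swapping by = Year for keyby = Year in the answer's call should give the same rows, ordered by year.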
