for loop regression analysis in R

I have a dataset of fish abundance data on which I want to perform a regression analysis. However, I want to run many regressions on different subsets of the data without doing this manually, and save the coefficients and p-value in a new data frame.

The data is structured as follows (example):

site year species abund
a    2011  a      3
a    2016  b      5
b    2011  a      4
b    2015  a      9
a    2018  b      1
c    2010  a      2
b    2016  c      3
c    2012  a      1

In total I have 883 rows, 21 unique sites, 41 unique species and 8 different years.

I want a regression model for every species-site combination (every combination has at least 5 observations). The model looks like:

lm(abund ~ year)  

but one model for every species at every site: one model for species a at site a, one for species b at site a, one for species a at site b, etc.

There are several questions about this on Stack Overflow, but none seem to fit my needs. My idea was to use a for loop, but I have been tweaking it all day and can't get it to work properly.

slopes <- numeric(nrow(df))

for (i in 1:nrow(df)) {
  y <- as.numeric(df[i,4]) # row 4 is the abundancy data
  x <- df([i, 1]) # row 1 is the year data
  slopes[i] <- coef(lm(y ~ x))[2]
}

My question: how can I fit linear regression models for all unique site-species combinations and store the coefficients and p-value in a new data frame? Preferably by using an improved version of my failed attempt.

Thanks in advance.

dput of a sample of 50 random rows:

df <- structure(list(year = c(2015, 2012, 2017, 2014, 2018, 2018, 2012, 
    2018, 2013, 2013, 2012, 2018, 2012, 2018, 2016, 2013, 2018, 2019, 
    2012, 2019, 2017, 2014, 2013, 2014, 2018, 2016, 2013, 2019, 2019, 
    2018, 2019, 2014, 2012, 2018, 2017, 2016, 2017, 2015, 2017, 2019, 
    2012, 2016, 2019, 2019, 2018, 2014, 2012, 2015, 2012, 2012), 
        species = c("Aal", "Brasem", "Kolblei", "Dunlipharder", "Snoekbaars", 
        "Snoekbaars", "Paling", "Baars", "Tong", "Sprot", "Paling", 
        "Kolblei", "Baars", "Sprot", "Tong", "Baars", "Baars", "Zwartbekgrondel", 
        "Dikkopje", "Snoekbaars", "Blankvoorn", "Kolblei", "Kolblei", 
        "Baars", "Aal", "Kolblei", "Bot", "Snoekbaars", "Baars", 
        "Blankvoorn", "Zeebaars", "Snoekbaars", "Zeebaars", "Bot", 
        "Snoekbaars", "Bot", "Baars", "Baars", "Aal", "Snoekbaars", 
        "Baars", "Baars", "Bot", "Bot", "Bot", "Kleine koornaarvis", 
        "Snoekbaars", "Bot", "Blankvoorn", "Kleine koornaarvis"), 
        site = c("Amerikahaven (kop Aziëhaven)", "Het IJ (thv EyE)", 
        "Westhaven en ADM-haven (kop)", "Jan van Riebeeckhaven (thv Nuon)", 
        "Coenhaven", "Mercuriushaven", "Amerikahaven (kop Aziëhaven)", 
        "Mercuriushaven", "Amerikahaven kop Australiëhaven", "Het IJ (thv het Keerkringpark)", 
        "Amerikahaven kop Australiëhaven", "Coenhaven", "Jan van Riebeeckhaven (thv Nuon)", 
        "Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)", "Het IJ (thv het Keerkringpark)", 
        "Jan van Riebeeckhaven (kop NZK)", "Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)", 
        "Het IJ (thv EyE)", "Het IJ (thv EyE)", "Het IJ (thv het Keerkringpark)", 
        "Westhaven en ADM-haven (kop)", "Het IJ (kop Noordhollandsch kanaal)", 
        "Amerikahaven kop Australiëhaven", "Jan van Riebeeckhaven (thv Nuon)", 
        "Amerikahaven (kop Aziëhaven)", "Jan van Riebeeckhaven (kop NZK)", 
        "Westhaven en ADM-haven (kop)", "Jan van Riebeeckhaven (thv Nuon)", 
        "Amerikahaven kop Australiëhaven", "Het IJ (thv EyE)", "Amerikahaven (kop Aziëhaven)", 
        "Mercuriushaven", "Westhaven en ADM-haven (kop)", "Amerikahaven kop Australiëhaven", 
        "Minervahaven", "Westhaven en ADM-haven (kop)", "Westhaven en ADM-haven (kop)", 
        "Amerikahaven kop Australiëhaven", "Amerikahaven (kop Aziëhaven)", 
        "Amerikahaven kop Australiëhaven", "Jan van Riebeeckhaven (thv Nuon)", 
        "Westhaven en ADM-haven (kop)", "Petroleumhaven", "Westhaven en ADM-haven (kop)", 
        "Het IJ (thv EyE)", "Het IJ (kop Noordhollandsch kanaal)", 
        "Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)"), abund = c(5, 
        25, 2, 15, 3, 4, 1, 176, 4, 1, 1, 4, 55, 1, 1, 37, 75, 11, 
        1, 121, 4, 2, 2, 412, 38, 1, 5, 2, 443, 2, 6, 12, 1, 10, 
        33, 14, 120, 377, 67, 29, 43, 524, 4, 31, 18, 5, 18, 1, 9, 
        31), n = c(6L, 4L, 4L, 7L, 3L, 3L, 3L, 3L, 7L, 4L, 5L, 3L, 
        8L, 4L, 8L, 7L, 8L, 6L, 4L, 7L, 7L, 4L, 4L, 7L, 8L, 6L, 5L, 
        8L, 8L, 5L, 6L, 7L, 3L, 3L, 8L, 8L, 3L, 8L, 8L, 8L, 6L, 8L, 
        7L, 8L, 3L, 4L, 7L, 3L, 7L, 4L)), row.names = c(NA, -50L), class = c("tbl_df", 
    "tbl", "data.frame"), na.action = structure(c(`47` = 47L, `52` = 52L, 
    `60` = 60L, `88` = 88L, `128` = 128L, `401` = 401L, `488` = 488L, 
    `593` = 593L, `633` = 633L), class = "omit"))
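
all_sites <- unique(df$site)
all_species <- unique(df$species)

# One row per possible site-species pair; the slope stays NA until a model is fitted
slopes <- expand.grid(all_sites, all_species, stringsAsFactors = FALSE)
names(slopes) <- c('site', 'species')
slopes$coef <- NA_real_

for (site in all_sites) {
  for (species in all_species) {
    this_subset <- df[df$site == site & df$species == species, ]
    # A slope cannot be estimated from fewer than two observations
    if (nrow(this_subset) < 2) next
    y <- this_subset$abund
    x <- this_subset$year
    # Progress output: site, species, number of rows, and NA counts in x and y
    cat('\n', site, species, nrow(this_subset), sum(is.na(x)), sum(is.na(y)))
    # The second coefficient of lm(y ~ x) is the slope
    slopes$coef[slopes$site == site & slopes$species == species] <- coef(lm(y ~ x))[2]
  }
}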
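
The question also asks for the p-value, which the loop above does not store. One possible extension is sketched here, not as the answerer's method but as an illustration: it loops over the combinations that actually occur and reads the slope and its p-value from `summary(fit)$coefficients`. The `toy` data frame, the `results` column names and the threshold of 3 observations are assumptions made for this example.

```r
# Hypothetical toy data with the same columns as the question's df
set.seed(1)
toy <- data.frame(
  site    = rep(c("a", "b"), each = 10),
  species = rep(rep(c("x", "y"), each = 5), times = 2),
  year    = rep(2011:2015, times = 4),
  stringsAsFactors = FALSE
)
toy$abund <- 2 * (toy$year - 2010) + rnorm(nrow(toy))

# One row per site-species combination that actually occurs in the data
results <- unique(toy[, c("site", "species")])
rownames(results) <- NULL
results$n         <- NA_integer_
results$intercept <- NA_real_
results$slope     <- NA_real_
results$p_value   <- NA_real_

for (i in seq_len(nrow(results))) {
  sub <- toy[toy$site == results$site[i] & toy$species == results$species[i], ]
  results$n[i] <- nrow(sub)
  # Assumed threshold: a slope p-value needs at least 3 rows and 2 distinct years
  if (nrow(sub) < 3 || length(unique(sub$year)) < 2) next
  fit <- lm(abund ~ year, data = sub)
  # Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
  coefs <- summary(fit)$coefficients
  results$intercept[i] <- coefs["(Intercept)", "Estimate"]
  results$slope[i]     <- coefs["year", "Estimate"]
  results$p_value[i]   <- coefs["year", "Pr(>|t|)"]
}

print(results)
```

Combinations that are skipped keep `NA` in the coefficient columns, so they remain visible in the output instead of silently disappearing.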

Here's some simpler code:

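for (site in unique(df$site)) {
  for (species in unique(df$species)) {
    # Subset to one site-species combination
    sub <- df[df$site == site & df$species == species, ]
    # lm() fails on an empty subset, and a slope needs at least two observations
    if (nrow(sub) < 2) next
    # Store each model in a variable named model_<site><species>
    assign(paste0('model_', site, species), lm(abund ~ year, data = sub))
  }
}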

As mentioned in Vasily A's answer, you need to ensure you have more than one observation for each regression model being built.
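
As an alternative to creating one variable per model with `assign()`, the models can be kept in a single named list. This is a minimal sketch, assuming base R only: `split()` builds one data frame per site-species pair, groups below a minimum size are dropped, and `lapply()` fits the models. The `toy` data, `min_obs` and `summarise_fit` are hypothetical names introduced for illustration.

```r
# Hypothetical toy data with the same columns as the question's df
toy <- data.frame(
  site    = c(rep("a", 6), rep("b", 4)),
  species = c(rep("x", 5), "y", rep("x", 4)),
  year    = c(2011:2015, 2011, 2011:2014),
  abund   = c(3, 5, 4, 9, 8, 1, 2, 2, 4, 7),
  stringsAsFactors = FALSE
)

# One data frame per site-species pair; drop = TRUE discards pairs that never occur
groups <- split(toy, list(toy$site, toy$species), drop = TRUE)

# Keep only groups with enough observations (the threshold is an assumption)
min_obs <- 3
groups <- groups[vapply(groups, nrow, integer(1)) >= min_obs]

# Named list of fitted models, e.g. models[["a.x"]]
models <- lapply(groups, function(g) lm(abund ~ year, data = g))

# Extract the slope and its p-value from one fitted model
summarise_fit <- function(fit) {
  coefs <- summary(fit)$coefficients
  c(slope = coefs["year", "Estimate"], p_value = coefs["year", "Pr(>|t|)"])
}

# One row per model, with the group name as the row name
coef_table <- as.data.frame(
  t(vapply(models, summarise_fit, c(slope = 0, p_value = 0)))
)
print(coef_table)
```

A list keeps the workspace clean and lets later steps iterate over all models at once, for example with `lapply(models, summary)`.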
