简体   繁体   中英

for loop regression analysis in R

I have a dataset of fish abundance data on which i want to perform a regression analysis. However, i want to perform a lot of regressions on different subsets of the data,without having to do this manually, and save the coefs and P value in a new data frame.

Data is structured as follows (example):

site year species abund
a    2011  a      3
a    2016  b      5
b    2011  a      4
b    2015  a      9
a    2018  b      1
c    2010  a      2
b    2016  c      3
c    2012  a      1

In total i have 883 rows, 21 unique sites, 41 unique species and 8 different years.

I want a regression model for every species-site combination. (every combination has at least 5 observations) Model looks like:

lm(abund ~ year)  

but one model for every species for every site. So one model for species a in site a, one for species b in site a, one for species a in site b etc.

There are several topics on this on stack, but non seem to fit my needs. My idea was to use a for loop, but i can't get this to work properly. Have been tweeking all day, but can't get it to work.

slopes <- numeric(nrow(df))

for (i in 1:nrow(df)) {
  y <- as.numeric(df[i,4]) # row 4 is the abundancy data
  x <- df([i, 1]) # row 1 is the year data
  slopes[i] <- coef(lm(y ~ x))[2]
}

My question: How can i conduct linear regression models for all unique site-species combinations and store the coefs and P value in a new dataframe? Preferably by using an improved version of my failed attempt.

Thanks in advance

dput of sample of 50 random rows:

df <- structure(list(year = c(2015, 2012, 2017, 2014, 2018, 2018, 2012, 
    2018, 2013, 2013, 2012, 2018, 2012, 2018, 2016, 2013, 2018, 2019, 
    2012, 2019, 2017, 2014, 2013, 2014, 2018, 2016, 2013, 2019, 2019, 
    2018, 2019, 2014, 2012, 2018, 2017, 2016, 2017, 2015, 2017, 2019, 
    2012, 2016, 2019, 2019, 2018, 2014, 2012, 2015, 2012, 2012), 
        species = c("Aal", "Brasem", "Kolblei", "Dunlipharder", "Snoekbaars", 
        "Snoekbaars", "Paling", "Baars", "Tong", "Sprot", "Paling", 
        "Kolblei", "Baars", "Sprot", "Tong", "Baars", "Baars", "Zwartbekgrondel", 
        "Dikkopje", "Snoekbaars", "Blankvoorn", "Kolblei", "Kolblei", 
        "Baars", "Aal", "Kolblei", "Bot", "Snoekbaars", "Baars", 
        "Blankvoorn", "Zeebaars", "Snoekbaars", "Zeebaars", "Bot", 
        "Snoekbaars", "Bot", "Baars", "Baars", "Aal", "Snoekbaars", 
        "Baars", "Baars", "Bot", "Bot", "Bot", "Kleine koornaarvis", 
        "Snoekbaars", "Bot", "Blankvoorn", "Kleine koornaarvis"), 
        site = c("Amerikahaven (kop Aziëhaven)", "Het IJ (thv EyE)", 
        "Westhaven en ADM-haven (kop)", "Jan van Riebeeckhaven (thv Nuon)", 
        "Coenhaven", "Mercuriushaven", "Amerikahaven (kop Aziëhaven)", 
        "Mercuriushaven", "Amerikahaven kop Australiëhaven", "Het IJ (thv het Keerkringpark)", 
        "Amerikahaven kop Australiëhaven", "Coenhaven", "Jan van Riebeeckhaven (thv Nuon)", 
        "Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)", "Het IJ (thv het Keerkringpark)", 
        "Jan van Riebeeckhaven (kop NZK)", "Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)", 
        "Het IJ (thv EyE)", "Het IJ (thv EyE)", "Het IJ (thv het Keerkringpark)", 
        "Westhaven en ADM-haven (kop)", "Het IJ (kop Noordhollandsch kanaal)", 
        "Amerikahaven kop Australiëhaven", "Jan van Riebeeckhaven (thv Nuon)", 
        "Amerikahaven (kop Aziëhaven)", "Jan van Riebeeckhaven (kop NZK)", 
        "Westhaven en ADM-haven (kop)", "Jan van Riebeeckhaven (thv Nuon)", 
        "Amerikahaven kop Australiëhaven", "Het IJ (thv EyE)", "Amerikahaven (kop Aziëhaven)", 
        "Mercuriushaven", "Westhaven en ADM-haven (kop)", "Amerikahaven kop Australiëhaven", 
        "Minervahaven", "Westhaven en ADM-haven (kop)", "Westhaven en ADM-haven (kop)", 
        "Amerikahaven kop Australiëhaven", "Amerikahaven (kop Aziëhaven)", 
        "Amerikahaven kop Australiëhaven", "Jan van Riebeeckhaven (thv Nuon)", 
        "Westhaven en ADM-haven (kop)", "Petroleumhaven", "Westhaven en ADM-haven (kop)", 
        "Het IJ (thv EyE)", "Het IJ (kop Noordhollandsch kanaal)", 
        "Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)"), abund = c(5, 
        25, 2, 15, 3, 4, 1, 176, 4, 1, 1, 4, 55, 1, 1, 37, 75, 11, 
        1, 121, 4, 2, 2, 412, 38, 1, 5, 2, 443, 2, 6, 12, 1, 10, 
        33, 14, 120, 377, 67, 29, 43, 524, 4, 31, 18, 5, 18, 1, 9, 
        31), n = c(6L, 4L, 4L, 7L, 3L, 3L, 3L, 3L, 7L, 4L, 5L, 3L, 
        8L, 4L, 8L, 7L, 8L, 6L, 4L, 7L, 7L, 4L, 4L, 7L, 8L, 6L, 5L, 
        8L, 8L, 5L, 6L, 7L, 3L, 3L, 8L, 8L, 3L, 8L, 8L, 8L, 6L, 8L, 
        7L, 8L, 3L, 4L, 7L, 3L, 7L, 4L)), row.names = c(NA, -50L), class = c("tbl_df", 
    "tbl", "data.frame"), na.action = structure(c(`47` = 47L, `52` = 52L, 
    `60` = 60L, `88` = 88L, `128` = 128L, `401` = 401L, `488` = 488L, 
    `593` = 593L, `633` = 633L), class = "omit"))
all_sites <- unique(df$site)
all_species <- unique(df$species)

slopes <- expand.grid(all_sites, all_species)
names(slopes) <- c('site','species')
slopes$coef <- NA_real_
  
for (site in all_sites) {
  for (species in all_species){
    this_subset <- df[(df$site==site & df$species==species),]
    if (nrow(this_subset)<2) next;
    y <- this_subset$abund
    x <- this_subset$year
    cat('\n',site,species,nrow(this_subset),sum(is.na(x)),sum(is.na(y)))
    slopes[slopes$site==site & slopes$species==species, ]$coef <- coef(lm(y ~ x))[2]
  }
}

Here's some simpler code:

for(site in df$site){
  for(species in df$species){
    assign(paste0('model_', site, species), lm(abund~year, data=df[df$site==site & df$species==species,]))
  }
}

As mentioned in Vasily A's answer, you need to ensure you have more than one observation for each regression model being built.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM