
Using a for loop to fill out a matrix when column conditions are met in R

I am learning R for an ecological study, and I am trying to write a function to create multiple matrices.

My data frame looks like this:

df <- data.frame(Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
             Count = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
             Haul = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
             Year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001))

Printed:

   Species Count Haul Year
1        a     2    1 2000
2        b     3    1 2000
3        c     1    2 2000
4        a     3    2 2000
5        d     4    1 2000
6        a     1    3 2000
7        b     2    2 2000
8        c     1    3 2000
9        c     1    4 2000
10       a     3    1 2001
11       c     2    1 2001
12       b     4    2 2001
13       e     1    1 2001

I am looking to create a for loop that will produce matrices and store them in a list. These matrices will be based on the Haul and Species in each year.

For example, I have been trying something like this:

for (i in sort(unique(df$Year))) {
  ncol <- sort(unique(unlist(df$Species)))
  nrow <- sort(unique(unlist(subset(df, Year == i, select=c("Haul")))))
  mat <- matrix(0, length(nrow), length(ncol),
                dimnames = list(nrow, ncol))
  mat[as.matrix(df[c("Haul", "Species")])] <- df$Count
}

This has not been working.

I am looking for a result like this:

$`2000`
  a b c d e
1 2 3 0 4 0
2 3 2 1 0 0
3 1 0 1 0 0
4 0 0 1 0 0

$`2001`
  a b c d e
1 3 0 2 0 1
2 0 4 0 0 0

The goal is for the columns to be every species ever seen (across all years) and the rows to be that year's hauls. The for loop would then stack the matrices in a list.

The main thing I have tried is creating a zeroed matrix and filling it by indexing with mat[as.matrix()], but I keep getting a "subscript out of bounds" error.
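The error most likely comes from the indexing line: it uses all of df rather than the current year's rows. In 2001 the matrix only has rows for hauls 1 and 2, so the 2001 pass tries to fill hauls 3 and 4, which do not exist (hence "subscript out of bounds"), and in the 2000 pass the 2001 records silently overwrite some cells. The loop also never stores mat anywhere. Below is a minimal corrected sketch of the same zero-matrix approach; species, mat_list, sub, hauls and idx are names chosen here for illustration.

```r
# Data from the question
df <- data.frame(Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
                 Count = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
                 Haul = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
                 Year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001))

species  <- sort(unique(as.character(df$Species)))  # columns: every species ever seen
mat_list <- list()                                    # one matrix per year

for (yr in sort(unique(df$Year))) {
  sub   <- df[df$Year == yr, ]                        # only this year's records
  hauls <- as.character(sort(unique(sub$Haul)))       # rows: this year's hauls
  mat <- matrix(0, nrow = length(hauls), ncol = length(species),
                dimnames = list(hauls, species))
  # Two-column character index (haul, species), matched against the dimnames
  idx <- cbind(as.character(sub$Haul), as.character(sub$Species))
  mat[idx] <- sub$Count
  mat_list[[as.character(yr)]] <- mat                 # store under the year's name
}

mat_list
```

Building the index with cbind(as.character(...)) instead of as.matrix(df[...]) is a deliberate choice: as far as I know, as.matrix() on a mixed data frame runs numeric columns through format(), which can pad numbers to a common width (e.g. " 1" next to "10") and then fail to match the row names.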

I have tried a lot of methods, but I am only learning from what I can find online. Any help would be greatly appreciated. Thank you!

This suggestion uses tidyr::spread, though it's also feasible in base R using reshape.

out <- by(df, df$Year, function(a) tidyr::spread(a, Species, Count, fill=0))
out
# df$Year: 2000
#   Haul Year a b c d
# 1    1 2000 2 3 0 4
# 2    2 2000 3 2 1 0
# 3    3 2000 1 0 1 0
# 4    4 2000 0 0 1 0
# -------------------------------------------------------------------------------------------- 
# df$Year: 2001
#   Haul Year a b c e
# 1    1 2001 3 0 2 1
# 2    2 2001 0 4 0 0

Technically, the class of the output is

class(out)
# [1] "by"

but that's just a glorified way of providing by-like printed output; underneath, it is still a list. To verify:

str(out)
# List of 2
#  $ 2000:'data.frame': 4 obs. of  6 variables:
#   ..$ Haul: num [1:4] 1 2 3 4
#   ..$ Year: num [1:4] 2000 2000 2000 2000
#   ..$ a   : num [1:4] 2 3 1 0
#   ..$ b   : num [1:4] 3 2 0 0
#   ..$ c   : num [1:4] 0 1 1 1
#   ..$ d   : num [1:4] 4 0 0 0
#  $ 2001:'data.frame': 2 obs. of  6 variables:
#   ..$ Haul: num [1:2] 1 2
#   ..$ Year: num [1:2] 2001 2001
#   ..$ a   : num [1:2] 3 0
#   ..$ b   : num [1:2] 0 4
#   ..$ c   : num [1:2] 2 0
#   ..$ e   : num [1:2] 1 0
#  - attr(*, "dim")= int 2
#  - attr(*, "dimnames")=List of 1
#   ..$ df$Year: chr [1:2] "2000" "2001"
#  - attr(*, "call")= language by.data.frame(data = df, INDICES = df$Year, FUN = function(a) tidyr::spread(a, Species, Count, fill = 0))
#  - attr(*, "class")= chr "by"

So we can just override the class:

class(out) <- "list"
out
# $`2000`
#   Haul Year a b c d
# 1    1 2000 2 3 0 4
# 2    2 2000 3 2 1 0
# 3    3 2000 1 0 1 0
# 4    4 2000 0 0 1 0
# $`2001`
#   Haul Year a b c e
# 1    1 2001 3 0 2 1
# 2    2 2001 0 4 0 0
# attr(,"call")
# by.data.frame(data = df, INDICES = df$Year, FUN = function(a) tidyr::spread(a, 
#     Species, Count, fill = 0))

I kept Year in there for simplicity and demonstration (in case you want to keep it around for some reason), but it's just as easy to remove:

out <- by(df, df$Year, function(a) tidyr::spread(subset(a, select=-Year), Species, Count, fill=0))

(Since I've already brought in one of the tidyverse packages with tidyr, I could just as easily have used dplyr::select(a, -Year) instead of the subset call. Over to you and whichever tools you are using.)

I admit now that this is producing data.frames, not matrices. It takes a little more code to convert each result to a proper matrix:

df2m <- function(x) {
  # assumes the first column (Haul) holds the row names and Year has been dropped
  rn <- x[[1]]
  out <- as.matrix(x[-1])
  rownames(out) <- rn
  out
}
lapply(out, df2m)
# $`2000`
#   a b c d
# 1 2 3 0 4
# 2 3 2 1 0
# 3 1 0 1 0
# 4 0 0 1 0
# $`2001`
#   a b c e
# 1 3 0 2 1
# 2 0 4 0 0
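The answer above mentions that the same reshaping is feasible in base R with reshape. A minimal sketch of that route, assuming the df from the question; wide_by_year is an illustrative name. Like the spread version, each year only gets columns for the species actually seen in that year.

```r
df <- data.frame(Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
                 Count = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
                 Haul = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
                 Year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001))

wide_by_year <- lapply(split(df, df$Year), function(a) {
  # One row per Haul, one "Count.<species>" column per species seen that year
  w <- reshape(a[c("Haul", "Species", "Count")], direction = "wide",
               idvar = "Haul", timevar = "Species")
  w[is.na(w)] <- 0                               # Haul/Species pairs never observed
  names(w) <- sub("^Count\\.", "", names(w))     # "Count.a" -> "a"
  # Sort rows by Haul and species columns alphabetically
  w <- w[order(w$Haul), c("Haul", sort(setdiff(names(w), "Haul")))]
  rownames(w) <- NULL
  w
})

wide_by_year
```

The result is a list of data frames, so df2m() above can be applied to it in the same way to get matrices.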

Consider by (a function that splits a data frame by one or more factors and runs a function on each subset) and table (a function that builds a contingency table of counts from combinations of factors). The end result is a named list of matrices.

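species <- sort(unique(as.character(df$Species)))  # every species ever seen

matrix_list <- by(df, df$Year, function(sub) {
    # fixed factor levels keep every species as a column, even if absent this year
    mat <- table(sub$Haul, factor(sub$Species, levels = species))
    # replace the occurrence counts with the recorded Count values
    mat[as.matrix(sub[c("Haul", "Species")])] <- sub$Count

    return(mat)
})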

matrix_list$`2000`

#   a b c d e
# 1 2 3 0 4 0
# 2 3 2 1 0 0
# 3 1 0 1 0 0
# 4 0 0 1 0 0

matrix_list$`2001`

#   a b c d e
# 1 3 0 2 0 1
# 2 0 4 0 0 0
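A possible simplification of the by/table approach, swapping in xtabs(): with Count on the left-hand side of the formula, xtabs() sums Count over each Haul/Species combination directly, so the separate overwrite step is not needed. Note that it sums rather than overwrites, which only makes a difference if a haul lists the same species twice (it never does in the sample data). A sketch under those assumptions; xtab_list is an illustrative name.

```r
df <- data.frame(Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
                 Count = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
                 Haul = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
                 Year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001))

# Fixed factor levels, so every species ever seen is a column in every year
df$Species <- factor(df$Species, levels = sort(unique(as.character(df$Species))))

xtab_list <- lapply(split(df, df$Year), function(sub) {
  # Cells hold the summed Count for each Haul x Species pair, 0 where absent;
  # unused Species levels are kept because drop.unused.levels defaults to FALSE
  xtabs(Count ~ Haul + Species, data = sub)
})

xtab_list[["2001"]]
```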

It's not clear to me why you would want to do this as a list of matrices, especially when your original data is already tidy. If you're just looking to transform the data from long to wide by Species, this should do it:

library(tidyverse)

df %>% 
  #spread Species from long to wide data
  spread(key = Species, value = Count, fill = 0) %>% 
  #Make Year the first column
  select(Year, everything()) %>% 
  #sort by Year and Haul
  arrange(Year, Haul)

Year Haul a b c d e
2000    1 2 3 0 4 0
2000    2 3 2 1 0 0
2000    3 1 0 1 0 0
2000    4 0 0 1 0 0
2001    1 3 0 2 0 1
2001    2 0 4 0 0 0
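In current tidyr, spread() is superseded by pivot_wider(). The sketch below widens the full data once (so every species ever seen becomes a column) and then splits the result into the per-year list of matrices the question asked for. It assumes tidyr 1.1.0 or later, where, as I understand it, values_fill accepts a single value and names_sort is available; mat_list is an illustrative name.

```r
library(tidyr)

df <- data.frame(Species = c("a", "b", "c", "a", "d", "a", "b", "c", "c", "a", "c", "b", "e"),
                 Count = c(2, 3, 1, 3, 4, 1, 2, 1, 1, 3, 2, 4, 1),
                 Haul = c(1, 1, 2, 2, 1, 3, 2, 3, 4, 1, 1, 2, 1),
                 Year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001))

# Widen the full data once, so every species ever seen becomes a column
wide <- pivot_wider(df, names_from = Species, values_from = Count,
                    values_fill = 0, names_sort = TRUE)
wide <- wide[order(wide$Year, wide$Haul), ]

# One Haul x Species matrix per year, named by year
mat_list <- lapply(split(wide, wide$Year), function(w) {
  m <- as.matrix(w[setdiff(names(w), c("Haul", "Year"))])
  rownames(m) <- w$Haul
  m
})

mat_list
```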
