In R, how do I sum values in a data.table column aggregated by two character columns, with the output as a matrix whose colnames and rownames are those strings?

I have a large .csv file containing the results of recent large-scale forest surveys, in which each row contains a given individual tree's location, species identity, and measured cross-sectional area. I read this .csv into RStudio using fread() to produce a data.table. I want to collapse this large data.table into a matrix such that each row corresponds to a location, each column corresponds to a single species, and each cell contains the sum of all cross-sectional areas of that species at that location.

Below is a dummy data.table in the format of my data, as copied from the console. In the desired output matrix shown further down, the cell values are sums of the x-sect area column of raw.input.

> raw.input <- fread("raw_input.csv")
> raw.input
      site  sp x-sect area
1: hilltop sp2          10
2: hilltop sp1           3
3: hilltop sp1           5
4: hilltop sp1           4
5: hilltop sp1           3
6:  stream sp3          45
7:  stream sp3          50
8:  stream sp1           4

Below is a matrix in my desired format, generated as a .csv in MS Excel, read in using fread(), and converted to a matrix in RStudio.

> mtrx.tmp <- fread("mtrx_final.csv")
> mtrx <- as.matrix(mtrx.tmp[,2:4]) #remove character strings so matrix is numeric
> row.names(mtrx) <- mtrx.tmp$site  #mtrx.tmp$site is equivalent to mtrx.tmp[,1] in content
> mtrx
        sp1 sp2 sp3
hilltop  15  10   0
stream    4   0  95

If a data.table is an inappropriate or inefficient format in which to read in this data set, please do include that in your answer.

You can use dcast from data.table for that (and data.table is perfectly suited for this task):

library(data.table)
    
raw.input <- structure(list(site = c("hilltop", "hilltop", "hilltop", "hilltop", 
"hilltop", "stream", "stream", "stream"), sp = c("sp2", "sp1", 
"sp1", "sp1", "sp1", "sp3", "sp3", "sp1"), `x-sect area` = c(10L, 
3L, 5L, 4L, 3L, 45L, 50L, 4L)), row.names = c(NA, -8L), class = c("data.table", 
"data.frame"))

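# Sum `x-sect area` per site/species, one column per species, then
# convert to a matrix using the first column (site) as row names
dcast(raw.input, site ~ sp, value.var = "x-sect area", fun.aggregate = sum) |>
  as.matrix(rownames = 1)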
#>         sp1 sp2 sp3
#> hilltop  15  10   0
#> stream    4   0  95

Created on 2022-07-27 by the reprex package (v2.0.1)
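
For reference, the per-site, per-species totals that dcast spreads into columns can also be computed directly in long format with a grouped sum. This is only a minimal sketch on the same dummy data as the question; the names `totals` and `total` are made up for illustration and do not appear in the original answer.

```r
library(data.table)

# Same dummy data as in the question
raw.input <- data.table(
  site = c("hilltop", "hilltop", "hilltop", "hilltop",
           "hilltop", "stream", "stream", "stream"),
  sp = c("sp2", "sp1", "sp1", "sp1", "sp1", "sp3", "sp3", "sp1"),
  `x-sect area` = c(10L, 3L, 5L, 4L, 3L, 45L, 50L, 4L)
)

# Grouped sum of cross-sectional area; keyby sorts the result by site, then sp
totals <- raw.input[, .(total = sum(`x-sect area`)), keyby = .(site, sp)]
totals
```

Combinations that never occur in the data (for example sp3 on the hilltop) are simply absent from this long table; they only appear as 0 once the data is cast to wide format, as in the dcast output above.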

Partial answer: the aggregate() function performs the required location- and species-level summing of cross-sectional area.

> aggregate(raw.input$`x-sect area`,list(raw.input$site,raw.input$sp),FUN=sum)
  Group.1 Group.2  x
1 hilltop     sp1 15
2  stream     sp1  4
3 hilltop     sp2 10
4  stream     sp3 95
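
One possible way to take this partial answer the rest of the way in base R is tapply(), which returns the sums already arranged as a site-by-species matrix. This is a sketch rather than part of the original answer; it assumes the same dummy data, rebuilt here as a plain data.frame, and fills site/species combinations with no trees with 0 by hand.

```r
# Same dummy data as in the question, as a plain data.frame;
# check.names = FALSE keeps the non-syntactic column name `x-sect area`
raw.input <- data.frame(
  site = c("hilltop", "hilltop", "hilltop", "hilltop",
           "hilltop", "stream", "stream", "stream"),
  sp = c("sp2", "sp1", "sp1", "sp1", "sp1", "sp3", "sp3", "sp1"),
  `x-sect area` = c(10L, 3L, 5L, 4L, 3L, 45L, 50L, 4L),
  check.names = FALSE
)

# Sum area for every site/species pair; rows are sites, columns are species
mtrx <- tapply(raw.input$`x-sect area`,
               list(raw.input$site, raw.input$sp),
               sum)

# Pairs with no observations come back as NA; treat them as zero area
mtrx[is.na(mtrx)] <- 0
mtrx
```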
