In R, how do I sum values in a data.table column aggregated by two character columns, with the output as a matrix whose row names and column names are those strings?
I have a large .csv file containing the results of recent large-scale forest surveys, in which each row contains a given individual tree's location, species identity, and measured cross-sectional area. I read this .csv into RStudio using fread() to produce a data.table. I want to collapse this large data.table into a matrix such that each row corresponds to a location, each column corresponds to a single species, and each cell contains the sum of all cross-sectional areas of that species at that location.
Below is a dummy data.table in the format of my data, as copied from the console. The cell values in the desired matrix are sums of the x-sect area column in raw.input.
> raw.input <- fread("raw_input.csv")
> raw.input
      site  sp x-sect area
1: hilltop sp2          10
2: hilltop sp1           3
3: hilltop sp1           5
4: hilltop sp1           4
5: hilltop sp1           3
6:  stream sp3          45
7:  stream sp3          50
8:  stream sp1           4
Below is a matrix in my desired format, generated as a .csv in MS Excel, read in using fread(), and converted to a matrix in RStudio.
> mtrx.tmp <- fread("mtrx_final.csv")
> mtrx <- as.matrix(mtrx.tmp[,2:4]) #remove character strings so matrix is numeric
> row.names(mtrx) <- mtrx.tmp$site #mtrx.tmp$site is equivalent to mtrx.tmp[,1] in content
> mtrx
        sp1 sp2 sp3
hilltop  15  10   0
stream    4   0  95
If a data.table is an inappropriate or inefficient format in which to read in this data set, please include that in your answer.
You can use dcast from data.table for that (and data.table is perfectly suited for this task):
library(data.table)
raw.input <- structure(list(site = c("hilltop", "hilltop", "hilltop", "hilltop",
"hilltop", "stream", "stream", "stream"), sp = c("sp2", "sp1",
"sp1", "sp1", "sp1", "sp3", "sp3", "sp1"), `x-sect area` = c(10L,
3L, 5L, 4L, 3L, 45L, 50L, 4L)), row.names = c(NA, -8L), class = c("data.table",
"data.frame"))
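# Cast to wide format (one row per site, one column per species), summing areas,
# then convert to a matrix using the site column as row names
dcast(raw.input, site ~ sp, value.var = "x-sect area", fun.aggregate = sum) |>
  as.matrix(rownames = "site")
#>         sp1 sp2 sp3
#> hilltop  15  10   0
#> stream    4   0  95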
Created on 2022-07-27 by the reprex package (v2.0.1)
Partial answer: the function aggregate() performs the required location- and species-level summing of cross-sectional area.
> aggregate(raw.input$`x-sect area`,list(raw.input$site,raw.input$sp),FUN=sum)
  Group.1 Group.2  x
1 hilltop     sp1 15
2  stream     sp1  4
3 hilltop     sp2 10
4  stream     sp3 95
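To get from that long-format result to the requested matrix in base R, one option is tapply(), which cross-tabulates the two grouping columns directly. This is only a minimal sketch and not part of the original answer. It rebuilds the dummy data as a plain data.frame, and replacing NA with 0 assumes that a site/species combination with no trees should count as zero area.

```r
# Dummy data from the question, rebuilt as a plain data.frame
raw.input <- data.frame(
  site = c("hilltop", "hilltop", "hilltop", "hilltop",
           "hilltop", "stream", "stream", "stream"),
  sp = c("sp2", "sp1", "sp1", "sp1", "sp1", "sp3", "sp3", "sp1"),
  `x-sect area` = c(10L, 3L, 5L, 4L, 3L, 45L, 50L, 4L),
  check.names = FALSE  # keep the non-syntactic column name as-is
)

# Sum area for every combination of site (rows) and species (columns)
mtrx <- tapply(raw.input$`x-sect area`,
               list(raw.input$site, raw.input$sp),
               sum)

# Combinations with no rows come back as NA; assumed here to mean zero area
mtrx[is.na(mtrx)] <- 0L
mtrx
```

The result is a numeric matrix with site names as row names and species names as column names, holding the same sums as the desired matrix in the question. For a large file the dcast approach above is likely to be faster, since it stays within data.table.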