data.table: sum of all columns by group

Question

I have a dataframe consisting of 515 integer columns and 2 643 246 rows, from which I would like to subset an unknown number of columns and aggregate the data to a single column showing the sum, by two group-columns.

To do the first part I've used the selection-function from data.table like this, TestData[,c(Kattegori_Henter("Medicine"), "id", "year"), with = FALSE] where Kattegori_Henter is a function returning the names of the columns I would like to select, from a different dataset. From this selection I then want to do the aggregation.

I have attempted a couple of different solutions in data.table to perform this aggregation, without getting a result. Given the intro-data.table vignette I believed the solution would be to add

TestData[,c(Kattegori_Henter("Medicine"), "id", "year"), with = FALSE, lapply(.SD,sum, na.rm = 
         TRUE), by = c(id, year)]

However, this returns the error Provide either by= or keyby= but not both, which I do not understand the meaning of, and Google did not give any good results.
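A likely reading of that error, offered as a sketch rather than a definitive diagnosis: `[.data.table` takes its arguments in the order i, j, by, keyby, so a second unnamed expression after j is matched positionally to keyby (by is already taken by name), and both end up supplied. Note also that the grouping columns would normally be written as by = c("id", "year") or by = .(id, year), not by = c(id, year). One way to keep everything in a single call is to pass one j expression and restrict the columns with .SDcols. The toy table and the column names med_a, med_b and other below are hypothetical stand-ins for TestData and for the output of Kattegori_Henter("Medicine").

```r
library(data.table)

# Hypothetical toy data standing in for TestData
TestData <- data.table(
  id    = c(1L, 1L, 2L, 2L),
  year  = c(2020L, 2020L, 2020L, 2021L),
  med_a = c(1L, 2L, 3L, NA),
  med_b = c(10L, 20L, 30L, 40L),
  other = c(100L, 200L, 300L, 400L)
)

# Hypothetical stand-in for the result of Kattegori_Henter("Medicine")
med_cols <- c("med_a", "med_b")

# A single j expression; .SDcols limits .SD to the selected columns,
# and by= names the grouping columns as a character vector
per_column <- TestData[, lapply(.SD, sum, na.rm = TRUE),
                       by = c("id", "year"),
                       .SDcols = med_cols]
print(per_column)
```

This gives one row per (id, year) group and one sum per selected column; it does not yet collapse the selected columns into a single total.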

I then attempted:

TestData[,c(Kattegori_Henter("Medicine"), "id", "year"), with = FALSE, a := sum(1.ncol), by = c(id, year)]

Which didn't result in anything at all, other than returning the subsetted dataframe.

The reasoning behind doing this is that I would like to use lapply on the Kattegori_Henter function, aggregating the 525 columns into a set of categories.

Thanks in advance for all help!

Edit: Attempted

   TestData[,c(Kattegori_Henter("Medicine"), "id", "year"), with =  
   FALSE][, lapply(.SD, sum, na.rm = TRUE), by = c("id", "year")]

As mentioned in the comments. The result was the same as the 2nd code above, returning an unchanged dataframe.
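One possible explanation for the "unchanged" result, stated as an assumption since the real data is not shown: if every (id, year) pair occurs only once, a per-column sum within each group simply reproduces the original values. The sketch below uses hypothetical data (dt, med_a, med_b) to show that case, and then one way to collapse the selected columns into a single total per group.

```r
library(data.table)

# Hypothetical data in which every (id, year) pair occurs exactly once
dt <- data.table(
  id    = c(1L, 2L, 3L),
  year  = c(2020L, 2020L, 2021L),
  med_a = c(1L, 2L, 3L),
  med_b = c(10L, 20L, 30L)
)

# One row per group and one sum per column: with unique groups,
# each "sum" is just the original value, so the table looks unchanged
per_column <- dt[, lapply(.SD, sum, na.rm = TRUE), by = c("id", "year")]

# Collapsing all non-grouping columns into a single total per group
single <- dt[, .(a = sum(unlist(.SD), na.rm = TRUE)), by = c("id", "year")]
print(single)
```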

Edit 2: Removed this from the question, due to a comment on it not producing the wanted results: ", which would be equal to the tidyverse-code:

Test2 %>% 
group_by(id, year) %>% 
summarise(a = sum(1:ncol(.), na.rm = TRUE)) "

Answer 1

I think the code you're looking for is likely:

TestData[, .(a = sum(.SD)), by = .(id, year), .SDcols = Kattegori_Henter("Medicine")]
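A small worked sketch of this form on toy data, under stated assumptions: the table, the column names (med_a, med_b, food_a), the "Food" category and the body of Kattegori_Henter are all hypothetical, since the real function is not shown. na.rm = TRUE is added here because the question's own attempts used it. The second half is one possible way to repeat the aggregation over several categories, as the question mentions wanting to do with lapply.

```r
library(data.table)

# Hypothetical toy data standing in for TestData
TestData <- data.table(
  id     = c(1L, 1L, 2L, 2L),
  year   = c(2020L, 2020L, 2020L, 2021L),
  med_a  = c(1L, 2L, 3L, NA),
  med_b  = c(10L, 20L, 30L, 40L),
  food_a = c(5L, 5L, 5L, 5L)
)

# Hypothetical stand-in for Kattegori_Henter(): maps a category to column names
Kattegori_Henter <- function(category) {
  lookup <- list(Medicine = c("med_a", "med_b"), Food = "food_a")
  lookup[[category]]
}

# The answer's form, with na.rm = TRUE added so missing values are skipped
medicine <- TestData[, .(a = sum(.SD, na.rm = TRUE)),
                     by = .(id, year),
                     .SDcols = Kattegori_Henter("Medicine")]
print(medicine)

# One total per category, then merged on the grouping columns
categories <- c("Medicine", "Food")
totals <- lapply(categories, function(category) {
  res <- TestData[, .(total = sum(.SD, na.rm = TRUE)),
                  by = .(id, year),
                  .SDcols = Kattegori_Henter(category)]
  setnames(res, "total", category)  # name the total column after its category
  res
})
by_category <- Reduce(function(x, y) merge(x, y, by = c("id", "year")), totals)
print(by_category)
```

Because .SDcols restricts .SD to the chosen columns and by = handles the grouping, no separate selection step with with = FALSE is needed.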

Solution 1
1 Accepted 2021-12-15 15:15:13
