Group by all columns in a data.table
I'm working with the iris data.table in R.
As a reminder of how it looks, here are the first six rows:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1: 5.1 3.5 1.4 0.2 setosa
2: 4.9 3.0 1.4 0.2 setosa
3: 4.7 3.2 1.3 0.2 setosa
4: 4.6 3.1 1.5 0.2 setosa
5: 5.0 3.6 1.4 0.2 setosa
6: 5.4 3.9 1.7 0.4 setosa
I would like to calculate the number of rows, grouped by all columns. Of course, we could write out all the variables in by, like this:
iris[, .(Freq = .N), by = .(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species)]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Freq
1: 5.1 3.5 1.4 0.2 setosa 1
2: 4.9 3.0 1.4 0.2 setosa 1
3: 4.7 3.2 1.3 0.2 setosa 1
4: 4.6 3.1 1.5 0.2 setosa 1
5: 5.0 3.6 1.4 0.2 setosa 1
6: 5.4 3.9 1.7 0.4 setosa 1
However, I wonder if there is a way to group by all variables without typing out all the column names?
If you are looking for duplicates, uniqueN defaults to using all columns:
uniqueN(as.data.table(iris))
# [1] 149
This doesn't answer your question directly, but it might be a more direct way of accomplishing what you were trying to do in the first place.
Similarly, if you're looking for which rows are duplicated, you can use the data.table method of duplicated, which also defaults to using all columns:
iris[duplicated(iris)]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1: 5.8 2.7 5.1 1.9 virginica
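As a small self-contained sketch of the two calls above (the toy table dt and its columns are made up for illustration, not taken from the question), comparing nrow() with uniqueN() gives the number of repeated rows, and combining duplicated() with its fromLast argument shows every copy of a duplicated row rather than only the later ones:

```r
library(data.table)

# Hypothetical toy table; rows 1 and 3 are identical
dt <- data.table(a = c(1, 2, 1, 3), b = c("x", "y", "x", "z"))

# Number of rows that repeat an earlier row
n_dup <- nrow(dt) - uniqueN(dt)

# duplicated() flags only the later copies by default;
# OR-ing with fromLast = TRUE flags every copy of a repeated row
all_copies <- dt[duplicated(dt) | duplicated(dt, fromLast = TRUE)]
```

On iris, the same pattern should list both copies of the repeated row instead of just the second one.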
Here is an approach in base R:
Freq <- table(apply(iris,1,paste0, collapse=" "))
iris$Freq <- apply(iris,1, function(x) Freq[names(Freq) %in% paste0(x,collapse=" ")])
Output:
> iris
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Freq
... ... ... ... ... ... ...
140 6.9 3.1 5.4 2.1 virginica 1
141 6.7 3.1 5.6 2.4 virginica 1
142 6.9 3.1 5.1 2.3 virginica 1
143 5.8 2.7 5.1 1.9 virginica 2
144 6.8 3.2 5.9 2.3 virginica 1
145 6.7 3.3 5.7 2.5 virginica 1
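A possible variation on the same idea, shown only as a sketch on a made-up data frame df: build one key string per row with do.call(paste, ...) instead of a row-wise apply(), then look up each row's count in the frequency table by name. The separator "\r" is an arbitrary choice, assumed not to occur in the data.

```r
# Hypothetical toy data frame; rows 1 and 3 are identical
df <- data.frame(a = c(1, 2, 1), b = c("x", "y", "x"))

# One key string per row, pasted column-wise
key <- do.call(paste, c(df, sep = "\r"))

# Index the frequency table by key to get a per-row count
df$Freq <- as.vector(table(key)[key])
```

This assumes no column is itself named sep or collapse, since those names would be picked up as arguments of paste().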
We can pass the column names to by:
library(data.table)
out1 <- as.data.table(iris)[, .N, by = names(iris)]
Checking against the OP's approach:
out2 <- as.data.table(iris)[, .N, by = .(Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width, Species)]
identical(out1, out2)
#[1] TRUE
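To get the count column named Freq as in the question, or to keep one row per original row as in the base R output, something along these lines should work. This is a sketch on a hypothetical toy table dt, not on iris itself:

```r
library(data.table)

# Hypothetical toy table; rows 1 and 3 are identical
dt <- data.table(a = c(1, 2, 1), b = c("x", "y", "x"))

# One row per distinct combination, with the count named Freq
res <- dt[, .(Freq = .N), by = names(dt)]

# Alternatively, keep every row and add the count as a new column;
# copy() leaves the original dt unmodified
per_row <- copy(dt)[, Freq := .N, by = names(dt)]
```

Here res has one row per group in order of first appearance, while per_row has the same number of rows as dt.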