Finding unique rows in arma::mat
In R we can use the unique function to find unique rows:
> data <- matrix(c(1,1,0,1,1,1,0,1),ncol = 2)
> data
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    0    0
[4,]    1    1
> unique(data)
     [,1] [,2]
[1,]    1    1
[2,]    0    0
How can we do this for an arma::mat in Rcpp? Here the unique function returns unique elements, not unique rows.
I don't think there is a built-in way to do this in the Armadillo library, but here is a simple approach:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Two rows are equal if all elements agree within an absolute tolerance.
template <typename T>
inline bool rows_equal(const T& lhs, const T& rhs, double tol = 0.00000001) {
    return arma::approx_equal(lhs, rhs, "absdiff", tol);
}

// [[Rcpp::export]]
arma::mat unique_rows(const arma::mat& x) {
    unsigned int count = 1, i = 1, j = 1, nr = x.n_rows, nc = x.n_cols;
    if (nr == 0) return x;  // empty input: x.row(0) would be out of bounds

    arma::mat result(nr, nc);
    result.row(0) = x.row(0);  // the first row is always kept

    for ( ; i < nr; i++) {
        bool matched = false;
        // Skip copies of the first row.
        if (rows_equal(x.row(i), result.row(0))) continue;

        // Skip row i if an equal row appears later. Rows other than copies
        // of the first row are therefore kept at their last occurrence, so
        // the output order can differ from R's unique() when a duplicate
        // reappears after other rows.
        for (j = i + 1; j < nr; j++) {
            if (rows_equal(x.row(i), x.row(j))) {
                matched = true;
                break;
            }
        }

        if (!matched) result.row(count++) = x.row(i);
    }

    // Only the first `count` rows of result were filled.
    return result.rows(0, count - 1);
}

/*** R
data <- matrix(c(1,1,0,1,1,1,0,1), ncol = 2)
all.equal(unique(data), unique_rows(data))
#[1] TRUE

data2 <- matrix(1:9, nrow = 3)
all.equal(unique(data2), unique_rows(data2))
#[1] TRUE

data3 <- matrix(0, nrow = 3, ncol = 3)
all.equal(unique(data3), unique_rows(data3))
#[1] TRUE

data4 <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), ncol = 2)
all.equal(unique(data4), unique_rows(data4))
#[1] TRUE
*/
As suggested by mtall in the comments, rows_equal uses arma::approx_equal to test for equality, rather than operator==, to avoid some of the comparison issues inherent to floating-point numbers. The options used in this function were chosen somewhat arbitrarily and can of course be changed as needed, but the value of tol is roughly equal to the default tolerance used by R's all.equal, which is .Machine$double.eps^0.5 (~0.00000001490116 on my machine).
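To make the difference between exact and tolerance-based comparison concrete, here is a minimal sketch in plain C++ without Armadillo. The names rows_close and rows_identical are hypothetical helpers written for this illustration, not library functions. rows_close is meant to mirror my understanding of the "absdiff" mode of arma::approx_equal, where two elements match when their absolute difference is at most tol.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Armadillo function): two rows count as equal
// when every pair of elements differs by at most tol in absolute value.
bool rows_close(const std::vector<double>& a,
                const std::vector<double>& b,
                double tol = 1e-8) {
    assert(a.size() == b.size());  // rows must have the same length
    for (std::size_t k = 0; k < a.size(); ++k) {
        if (std::fabs(a[k] - b[k]) > tol) return false;
    }
    return true;
}

// Hypothetical helper: exact element-wise comparison, shown for contrast.
bool rows_identical(const std::vector<double>& a,
                    const std::vector<double>& b) {
    return a == b;
}
```

For example, on typical IEEE 754 double platforms a row holding 0.1 + 0.2 and a row holding 0.3 are not identical under the exact comparison, but they do match under the tolerance-based one.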
The same approach, inspired by @nrussell, but slightly shorter:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Two rows (or columns) are equal if all elements agree within tol.
template <typename T>
inline bool approx_equal_cpp(const T& lhs, const T& rhs, double tol = 0.00000001) {
    return arma::approx_equal(lhs, rhs, "absdiff", tol);
}

// [[Rcpp::export]]
arma::mat unique_rows(const arma::mat& m) {
    // ulmt(j) == 1 marks row j as a duplicate of an earlier row.
    arma::uvec ulmt = arma::zeros<arma::uvec>(m.n_rows);

    for (arma::uword i = 0; i < m.n_rows; i++) {
        // Mark the next row equal to row i; any further copies are marked
        // when the outer loop reaches that row.
        for (arma::uword j = i + 1; j < m.n_rows; j++) {
            if (approx_equal_cpp(m.row(i), m.row(j))) { ulmt(j) = 1; break; }
        }
    }

    // Keep the unmarked rows, i.e. the first occurrence of each row.
    return m.rows(arma::find(ulmt == 0));
}

// [[Rcpp::export]]
arma::mat unique_cols(const arma::mat& m) {
    // vlmt(j) == 1 marks column j as a duplicate of an earlier column.
    arma::uvec vlmt = arma::zeros<arma::uvec>(m.n_cols);

    for (arma::uword i = 0; i < m.n_cols; i++) {
        for (arma::uword j = i + 1; j < m.n_cols; j++) {
            if (approx_equal_cpp(m.col(i), m.col(j))) { vlmt(j) = 1; break; }
        }
    }

    return m.cols(arma::find(vlmt == 0));
}

/*** R
data <- matrix(c(1,1,0,1,1,1,0,1), ncol = 2)
all.equal(unique(data), unique_rows(data))
#[1] TRUE

data2 <- matrix(1:9, nrow = 3)
all.equal(unique(data2), unique_rows(data2))
#[1] TRUE

data3 <- matrix(0, nrow = 3, ncol = 3)
all.equal(unique(data3), unique_rows(data3))
#[1] TRUE

data4 <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), ncol = 2)
all.equal(unique(data4), unique_rows(data4))
#[1] TRUE
*/
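The pairwise scans used in this thread can take time quadratic in the number of rows. If exact comparison is acceptable (for example with integer-valued data), one alternative is to track rows already seen in a set. The following is only a sketch of that swapped-in technique in plain C++ without Armadillo. Row and first_unique_rows are hypothetical names, the comparison is exact rather than tolerance-based, and it assumes the data contains no NaN values, since those break the ordering std::set relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Row = std::vector<double>;  // hypothetical stand-in for a matrix row

// Returns the indices of the first occurrence of each distinct row,
// in input order, using exact comparison instead of a tolerance.
std::vector<std::size_t> first_unique_rows(const std::vector<Row>& rows) {
    std::set<Row> seen;             // rows encountered so far
    std::vector<std::size_t> keep;  // indices of the rows to keep
    for (std::size_t i = 0; i < rows.size(); ++i) {
        assert(rows[i].size() == rows[0].size());  // all rows share one width
        // insert() reports through .second whether the row was new.
        if (seen.insert(rows[i]).second) keep.push_back(i);
    }
    return keep;
}
```

With an arma::mat, the returned indices could presumably be copied into an arma::uvec and passed to m.rows(), as the answers above do with the result of find.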