
Apply a function to every row of a matrix or a data frame

Suppose I have an n by 2 matrix and a function that takes a 2-vector as one of its arguments. I would like to apply the function to each row of the matrix and get an n-vector. How can I do this in R?

For example, I would like to compute the density of a 2D standard Normal distribution on three points:

bivariate.density <- function(x = c(0, 0), mu = c(0, 0), sigma = c(1, 1), rho = 0) {
    exp(-1/(2*(1-rho^2))*(x[1]^2/sigma[1]^2+x[2]^2/sigma[2]^2-2*rho*x[1]*x[2]/(sigma[1]*sigma[2]))) * 1/(2*pi*sigma[1]*sigma[2]*sqrt(1-rho^2))
}

out <- rbind(c(1, 2), c(3, 4), c(5, 6))

How can I apply the function to each row of out?

How do I pass values for the other arguments, besides the points, to the function in the way you specify?

You simply use the apply() function:

R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1]  4 10 16
R> 

This takes a matrix and applies a (silly) function to each row. You pass extra arguments to the function as fourth, fifth, ... arguments to apply().
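As a small sketch of how extra arguments are forwarded, assume a hypothetical helper `weighted` (not part of the question) that takes a row plus two weights. Anything supplied after the function in the apply() call is passed on to it for every row:

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# Hypothetical helper for illustration: weighted sum of a 2-element row
weighted <- function(x, a = 1, b = 1) a * x[1] + b * x[2]

# Arguments after the function are forwarded to it for every row
res_default <- apply(M, 1, weighted)          # uses a = 1, b = 1
res_custom  <- apply(M, 1, weighted, a = 2)   # uses a = 2, b = 1
res_custom
```

With a = 2 this reproduces the 2*x[1]+x[2] computation from the session above.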

Here is a short example of applying a function to each row of a matrix. (Here, the function applied normalizes every row so that it sums to 1.)

Note: The result of apply() has to be transposed using t() to get the same layout as the input matrix A.

A <- matrix(c(
  0, 1, 1, 2,
  0, 0, 1, 3,
  0, 0, 1, 3
), nrow = 3, byrow = TRUE)

t(apply(A, 1, function(x) x / sum(x) ))

Result:

     [,1] [,2] [,3] [,4]
[1,]    0 0.25 0.25 0.50
[2,]    0 0.00 0.25 0.75
[3,]    0 0.00 0.25 0.75
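To see why the transpose is needed, the following sketch inspects the shape of the raw apply() result. The comparison with `A / rowSums(A)` is an added cross-check that works for this particular normalization only, not a general replacement for apply():

```r
A <- matrix(c(
  0, 1, 1, 2,
  0, 0, 1, 3,
  0, 0, 1, 3
), nrow = 3, byrow = TRUE)

raw <- apply(A, 1, function(x) x / sum(x))
dim(raw)      # each row's result is stored as a column: 4 x 3
dim(t(raw))   # transposing restores the 3 x 4 layout of A

# Cross-check: dividing by the row sums directly gives the same matrix
normalized <- A / rowSums(A)
all.equal(t(raw), normalized)
```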

If you want to apply common functions such as sum or mean, you should use rowSums or rowMeans, since they are faster than the apply(data, 1, sum) approach. Otherwise, stick with apply(data, 1, fun). You can pass additional arguments after the FUN argument (as Dirk already suggested):

set.seed(1)
m <- matrix(round(runif(20, 1, 5)), ncol=4)
diag(m) <- NA
m
     [,1] [,2] [,3] [,4]
[1,]   NA    5    2    3
[2,]    2   NA    2    4
[3,]    3    4   NA    5
[4,]    5    4    3   NA
[5,]    2    1    4    4

Then you can do something like this:

apply(m, 1, quantile, probs=c(.25,.5, .75), na.rm=TRUE)
    [,1] [,2] [,3] [,4] [,5]
25%  2.5    2  3.5  3.5 1.75
50%  3.0    2  4.0  4.0 3.00
75%  4.0    3  4.5  4.5 4.00
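The earlier point about rowSums and rowMeans can be checked on the same matrix m. This is only a sketch comparing results, not a timing benchmark; the variable names are chosen here for illustration:

```r
set.seed(1)
m <- matrix(round(runif(20, 1, 5)), ncol = 4)
diag(m) <- NA

# Dedicated row helpers accept na.rm directly
fast_sums  <- rowSums(m, na.rm = TRUE)
fast_means <- rowMeans(m, na.rm = TRUE)

# Equivalent apply() calls, with na.rm forwarded to sum() and mean()
slow_sums  <- apply(m, 1, sum, na.rm = TRUE)
slow_means <- apply(m, 1, mean, na.rm = TRUE)

all.equal(fast_sums, slow_sums)
all.equal(fast_means, slow_means)
```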

The first step is to create the function object, then apply it. If you want a matrix object that has the same number of rows, you can predefine it and use the object[] form as illustrated (otherwise the returned value will be simplified to a vector):

bvnormdens <- function(x=c(0,0),mu=c(0,0), sigma=c(1,1), rho=0){
     exp(-1/(2*(1-rho^2))*(x[1]^2/sigma[1]^2+
                           x[2]^2/sigma[2]^2-
                           2*rho*x[1]*x[2]/(sigma[1]*sigma[2]))) * 
     1/(2*pi*sigma[1]*sigma[2]*sqrt(1-rho^2))
     }
 out=rbind(c(1,2),c(3,4),c(5,6));

 bvout<-matrix(NA, ncol=1, nrow=3)
 bvout[] <-apply(out, 1, bvnormdens)
 bvout
             [,1]
[1,] 1.306423e-02
[2,] 5.931153e-07
[3,] 9.033134e-15

If you want to use values other than the default parameters, the call should include named arguments after the function:

bvout[] <-apply(out, 1, FUN=bvnormdens, mu=c(-1,1), rho=0.6)

apply() can also be used on higher-dimensional arrays, and the MARGIN argument can be a vector as well as a single integer.
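A minimal sketch of both cases, using a made-up 2 x 3 x 4 array (the array and variable names are assumptions for illustration):

```r
# A 2 x 3 x 4 array filled with 1..24
arr <- array(1:24, dim = c(2, 3, 4))

# MARGIN = 3: one value per slice along the third dimension
slice_sums <- apply(arr, 3, sum)

# MARGIN = c(1, 2): one value per (row, column) pair, summing over dimension 3
cell_sums <- apply(arr, c(1, 2), sum)
dim(cell_sums)   # 2 x 3
```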

apply() does the job well, but it is quite slow. sapply() and vapply() can be useful alternatives, and so can dplyr's rowwise(). Here is an example of computing the row-wise product of a data frame:

a = data.frame(t(iris[1:10,1:3]))
vapply(a, prod, 0)
sapply(a, prod)

Note that assigning the data to a variable before calling vapply/sapply/apply is good practice, as it reduces the time a lot. Here is a microbenchmark comparison:

library(dplyr)  # provides %>%, rowwise() and summarise() used below
a = data.frame(t(iris[1:10,1:3]))
b = iris[1:10,1:3]
microbenchmark::microbenchmark(
    apply(b, 1 , prod),
    vapply(a, prod, 0),
    sapply(a, prod) , 
    apply(iris[1:10,1:3], 1 , prod),
    vapply(data.frame(t(iris[1:10,1:3])), prod, 0),
    sapply(data.frame(t(iris[1:10,1:3])), prod) ,
    b %>%  rowwise() %>%
        summarise(p = prod(Sepal.Length,Sepal.Width,Petal.Length))
)

Have a careful look at how t() is being used.
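The role of t() can be sketched as follows: sapply() and vapply() iterate over the columns of a data frame, so transposing first turns each original row into a column. The check below (variable names `by_row` and `by_column` are chosen here for illustration) confirms that both routes give the same products:

```r
b <- iris[1:10, 1:3]          # 10 rows, 3 numeric columns
a <- data.frame(t(b))         # 3 rows, 10 columns: one column per original row

by_row    <- apply(b, 1, prod)     # iterates over the rows of b
by_column <- vapply(a, prod, 0)    # iterates over the columns of a

# Same values; only the element names differ
all.equal(unname(by_row), unname(by_column))
```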

Another approach, if you want to use a varying portion of the dataset instead of a single value, is to use rollapply(data, width, FUN, ...) from the zoo package. Using a vector of widths allows you to apply a function on a varying window of the dataset. I've used this to build an adaptive filtering routine, though it isn't very efficient.
