
R apply() function on specific dataframe columns

I want to use the apply function on a dataframe, but only apply the function to the last 5 columns.

B <- by(wifi, wifi$Room, FUN = function(y) apply(y, 2, A))

lapply is probably a better choice than apply here, as apply first coerces your data.frame to an array, which means all the columns must have the same type. Depending on your context, this could have unintended consequences.
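To make the coercion concrete, here is a minimal sketch with a made-up two-column data frame (df, id and x are illustrative names, not from the question). Because apply goes through as.matrix, the numeric column is seen as character, while lapply hands each column over with its own type:

```r
# Hypothetical mixed-type data frame, for illustration only
df <- data.frame(id = c("a", "b"), x = c(1.5, 2.5), stringsAsFactors = FALSE)

# apply() converts df to a character matrix first, so every column looks like character
apply_classes <- apply(df, 2, class)

# lapply()/sapply() visit each column as-is, so x is still numeric
lapply_classes <- sapply(df, class)

print(apply_classes)
print(lapply_classes)
```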

The pattern is:

df[cols] <- lapply(df[cols], FUN)

The 'cols' vector can be variable names or indices. I prefer to use names whenever possible (it's robust to column reordering). So in your case this might be:

wifi[4:9] <- lapply(wifi[4:9], A)
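Note that 4:9 spans six columns, while the question asks for the last five. If you want "the last five, whatever they are called", one possible sketch is to build cols with tail(). This assumes the 9-column example frame and the +1 function A used elsewhere in this thread as stand-ins for the real data:

```r
# Stand-ins for the real data and function (assumptions for illustration)
A <- function(x) x + 1
wifi <- data.frame(replicate(9, 1:4))

# Names of the last five columns, independent of how many columns there are
cols <- tail(names(wifi), 5)

# Same pattern as above: replace just those columns with the transformed versions
wifi[cols] <- lapply(wifi[cols], A)
print(wifi)
```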

An example of using column names:

wifi <- data.frame(A=1:4, B=runif(4), C=5:8)
wifi[c("B", "C")] <- lapply(wifi[c("B", "C")], function(x) -1 * x)

Using an example data.frame and an example function (which just adds 1 to every value):

A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))
wifi

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  1  1  1  1  1  1
#2  2  2  2  2  2  2  2  2  2
#3  3  3  3  3  3  3  3  3  3
#4  4  4  4  4  4  4  4  4  4

data.frame(wifi[1:3], apply(wifi[4:9],2, A) )
#or
cbind(wifi[1:3], apply(wifi[4:9],2, A) )

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

Or even:

data.frame(wifi[1:3], lapply(wifi[4:9], A) )
#or
cbind(wifi[1:3], lapply(wifi[4:9], A) )

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5
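For all-numeric columns like these, both routes give the same values; what differs is the intermediate object. A small check, reusing the example data and function above (m, l, via_apply and via_lapply are names made up for this sketch):

```r
A <- function(x) x + 1
wifi <- data.frame(replicate(9, 1:4))

m <- apply(wifi[4:9], 2, A)   # a numeric matrix: columns were coerced to one type
l <- lapply(wifi[4:9], A)     # a plain list: one element per column, types kept

via_apply  <- data.frame(wifi[1:3], m)
via_lapply <- data.frame(wifi[1:3], l)

print(is.matrix(m))
print(is.list(l))
```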

As mentioned, you simply want the standard R apply function applied to columns (MARGIN=2):

wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A)

Or, for short:

wifi[,4:9] <- apply(wifi[,4:9], 2, A)

This updates columns 4:9 in-place using the A() function. Now, let's assume that na.rm is an argument to A(), which it probably should be. We can pass na.rm=TRUE to remove NA values from the computation like so:

wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A, na.rm=TRUE)

The same is true for any other arguments you want to pass to your custom function.对于要传递给自定义函数的任何其他参数也是如此。
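As a runnable sketch of that argument passing, suppose A() centres a column on its mean and accepts na.rm. This definition of A and the data with an NA in it are assumptions made up for illustration, since the question does not show the real function:

```r
# Hypothetical A(): centre a column on its mean, optionally ignoring NAs
A <- function(x, na.rm = FALSE) x - mean(x, na.rm = na.rm)

# Example data with a missing value in every column
wifi <- data.frame(replicate(9, c(1, 2, NA, 4)))

# Anything after FUN is forwarded by apply() to A()
wifi[, 4:9] <- apply(wifi[, 4:9], MARGIN = 2, FUN = A, na.rm = TRUE)
print(wifi)
```

Without na.rm = TRUE the mean would be NA and every value in columns 4:9 would become NA.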

The easiest way is to use the mutate function:最简单的方法是使用 mutate 函数:

library(dplyr)
# someFunction, oldColumn and columnToUseFunctionOn are placeholders
dataFunctionUsed <- data %>%
  mutate(columnToUseFunctionOn = someFunction(oldColumn))

This task is easily achieved with the dplyr package's across functionality.

Borrowing the data structure suggested by thelatemail:

A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))
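A minimal sketch of how across could then be used on that data, assuming dplyr 1.0.0 or later is installed (result and result_by_name are names invented here):

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() was introduced

A <- function(x) x + 1
wifi <- data.frame(replicate(9, 1:4))

# Apply A to columns 4 through 9 by position; the other columns pass through unchanged
result <- wifi %>% mutate(across(4:9, A))

# The same selection written with column names
result_by_name <- wifi %>% mutate(across(X4:X9, A))

print(result)
```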

I think what you want is mapply. You could apply the function to all columns, and then just drop the columns you don't want. However, if you are applying different functions to different columns, it seems likely that what you want is mutate, from the dplyr package.
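For the "different functions for different columns" case, a base R sketch with mapply might look like the following. The funs list and the choice of columns X8 and X9 are assumptions for illustration:

```r
wifi <- data.frame(replicate(9, 1:4))

# Hypothetical per-column functions, keyed by column name
funs <- list(
  X8 = function(x) x + 1,
  X9 = function(x) x * 2
)

# Pair each function with its column; SIMPLIFY = FALSE keeps a list of columns
wifi[names(funs)] <- mapply(
  function(f, col) f(col),
  funs, wifi[names(funs)],
  SIMPLIFY = FALSE
)
print(wifi)
```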
