
rollapply function on specific column of dataframes within list

I must admit to complete lunacy when trying to understand how functions within functions are defined and passed in R. The examples always presume you understand every nuance and don't describe the process. I have yet to come across a plain-English, idiot's-guide breakdown of the process. So the first question is: do you know of one?

Now my actual problem.
I have a list of data.frames: fileData.
I want to use the rollapply() function on specific columns in each data.frame. I then want all the results (lists) combined. So, starting with one of the data.frames and using the built-in mtcars data frame as an example:

Of course I need to tell rollapply() to use the function PPI() along with the associated parameters, which are the columns.

PPI <- function(a, b){  
    value = (a + b)  
    PPI = sum(value)  
    return(PPI)  
}

I tried this:

f <- function(x) PPI(x$mpg, x$disp)
fileData<- list(mtcars, mtcars, mtcars)
df <- fileData[[1]]

and got stopped at

rollapply(df, 20, f)
Error in x$mpg : $ operator is invalid for atomic vectors  

I think this is related to zoo using matrices, but numerous other attempts couldn't resolve the rollapply issue. So, moving on to what I believe is next:

lapply(fileData, function(x) rollapply ......

Seems a mile away. Some guidance and solutions would be very welcome.
Thanks.

I will try to help you and show how you can debug the problem. One thing that is very helpful in R is learning how to debug. Generally I use the browser function.

Problem:

Here I am changing your function f by adding one line:

f <- function(x) {
  browser()
  PPI(x$changeFactor_A, x$changeFactor_B)
}

Now when you run:

rollapply(df, 1, f)

The debugger stops and you can inspect the value of the argument x:

Browse[1]> x
 [1,] 
1e+05 

As you can see, x is a scalar value, so you can't apply the $ operator to it; hence you get the error:

Error in x$changeFactor_A : $ operator is invalid for atomic vectors 
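
To see why x arrives as a plain vector, it may help to compare what rollapply hands to the function in its two modes. The sketch below assumes the zoo package and, since changeFactor_A and changeFactor_B are not available here, uses the mpg and disp columns of mtcars from the question. By default (by.column = TRUE) each column is rolled separately; with by.column = FALSE the whole window is passed as a matrix, so columns have to be picked with [, "name"] rather than $.

```r
library(zoo)

m <- as.matrix(mtcars[, c("mpg", "disp")])

# Default by.column = TRUE: FUN is called once per column and window,
# so it only ever sees a plain numeric vector of 20 values.
per_column <- rollapply(m, 20, length)

# by.column = FALSE: FUN sees a 20-row matrix holding every column,
# so two columns can be combined inside one window.
ppi_roll <- rollapply(m, 20, function(w) sum(w[, "mpg"] + w[, "disp"]),
                      by.column = FALSE)
```

Here per_column only reports the window length for each column, which is enough to show that the function never receives a data.frame.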

General guidelines

Now I will explain how you should do this.

  • Either you change your PPI function to take a single parameter, excess, so that you do the subtraction outside of it (easier).
  • Or you use mapply to get a generalized solution (harder, but more general and very useful).
  • Avoid using $ within functions. Personally, I use it only at the R console.

Complete solution:

I assume that your data.frames (zoo objects) have changeFactor_A and changeFactor_B columns.

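sapply(fileData, function(dat){
  # compute the excess column first, outside rollapply
  dat <- transform(dat, excess = changeFactor_A - changeFactor_B)
  # then roll a sum over that single column
  rollapply(dat[, 'excess'], 2, sum)
})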

Or more generally:

sapply(fileData, function(dat){
  excess <- get_excess(dat, 'changeFactor_A', 'changeFactor_B')
  rollapply(excess, 2, sum)
})

where

   get_excess <-
     function(data, colA, colB){
          ### do whatever you want here; as an example, take the
          ### difference of the two columns
          ### return a vector
          excess <- data[[colA]] - data[[colB]]
          excess
     }
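
As a runnable illustration of the pattern above, here is a minimal sketch on mtcars. It assumes the zoo package, stands in mpg and disp for changeFactor_A and changeFactor_B, and assumes that "excess" simply means the row-wise difference of the two columns; fileData here is a hypothetical list built from three copies of mtcars.

```r
library(zoo)

# Hypothetical stand-in for the real data: three copies of mtcars.
fileData <- list(A = mtcars, B = mtcars, C = mtcars)

# Assumed meaning of "excess": column A minus column B, row by row.
get_excess <- function(data, colA, colB) {
  data[[colA]] - data[[colB]]
}

# One rolling sum per data frame; sapply() binds the equal-length
# results into a matrix with one column per list element.
rolled <- sapply(fileData, function(dat) {
  rollapply(get_excess(dat, "mpg", "disp"), 3, sum)
})
```

Because every element yields a vector of the same length, sapply returns a matrix; lapply followed by do.call(cbind, ...) would give the same layout.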

Look at the "Usage" section of the help page ?rollapply. I'll admit that R help pages are not easy to parse, and I see how you got confused.

The problem is that rollapply can deal with ts, zoo or general numeric vectors, but only a single series. You are feeding it a function that takes two arguments, asset and benchmark. Granted, your f and PPI can trivially be vectorized, but rollapply simply isn't made for that.

Solution: calculate your excess outside rollapply (excess is easily calculated in a vectorized way, and it does not involve any rolling calculations), and only then rollapply your function to it:

> mtcars$excess <- mtcars$mpg-mtcars$disp
> rollapply(mtcars$excess, 3, sum)
 [1]  -363.2  -460.8  -663.1  -784.8  -893.9 ...

You may also be interested in mapply, which vectorizes a function over multiple arguments, similarly to apply and friends, which work on single arguments. However, I know of no analogue of mapply with rolling windows.
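
One possible workaround when a function really needs two columns per window is to roll over the row numbers instead of the data and do the indexing inside the function. This is only a sketch, assuming the zoo package and the PPI function and mtcars columns from the question; it is not a built-in multi-argument mode of rollapply.

```r
library(zoo)

# PPI as defined in the question: sum of the element-wise sums.
PPI <- function(a, b) sum(a + b)

df <- mtcars

# Roll over row numbers; each call receives the 20 indices of one window
# and uses them to pull matching slices from both columns.
ppi_by_index <- rollapply(seq_len(nrow(df)), 20,
                          function(i) PPI(df$mpg[i], df$disp[i]))
```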

I sweated away and took some time to slowly understand how to break down the process and protocol of calling a function with arguments from another function. A great site that helped was Advanced R from the one and only Hadley Wickham, again! The pictures showing the process breakdown are near ideal, although I still needed my thinking cap on for a few details.

Here is a complete example with notes. Hopefully someone else finds it useful.

library(zoo)

#Create a list of dataframes for the example.
listOfDataFrames<- list(mtcars, mtcars, mtcars)
#Give each element a name.
names(listOfDataFrames) <- c("A", "B", "C")

#This is a simple function just for the example!
#I want to perform this function on column 'col' of matrix 'm'.
#Of course to make the whole task worthwhile, this function is usually something more complex.
fApplyFunction <- function(m,col){
    mean(m[,col])
}

#This function is called from lapply() and does 'something' to the dataframe that is passed.
#I created this function to keep the lapply() call very simple.
#The 'something' is to apply the function fApplyFunction(), which requires an argument 'thisCol'.
fOnEachElement <- function(thisDF, thisCol){
    #Convert to matrix for zoo library.
    thisMatrix <- as.matrix(thisDF)
    rollapply(thisMatrix, 5, fApplyFunction, thisCol, partial = FALSE, by.column = FALSE)
}

#This is where the program really starts!
#
#Apply a function to each element of list.
#The list is 'listOfDataFrames', with each element being a dataframe.
#The function to apply to each element is 'fOnEachElement'
#The additional argument for 'fOnEachElement' is "vs", which is the name of the column I want the function performed on.
#lapply() returns each result as an element of a list.
listResults <- lapply(listOfDataFrames, fOnEachElement, "vs")


#Combine all elements of the list into one dataframe.
combinedResults <- do.call(cbind, listResults)

#Now that I understand the argument passing, I could call rollapply() directly from lapply()...
#Note that ONLY the additional arguments of rollapply() are passed. The primary argument is passed automatically by lapply().
listResults2 <- lapply(listOfDataFrames, rollapply, 5, fApplyFunction, "vs", partial = FALSE, by.column = FALSE)

Results:

> combinedResults
        A   B   C
 [1,] 0.4 0.4 0.4
 [2,] 0.6 0.6 0.6
 [3,] 0.6 0.6 0.6
 [4,] 0.6 0.6 0.6
 [5,] 0.6 0.6 0.6
 [6,] 0.8 0.8 0.8
 [7,] 0.8 0.8 0.8
 [8,] 0.8 0.8 0.8
 [9,] 0.6 0.6 0.6
[10,] 0.4 0.4 0.4
[11,] 0.2 0.2 0.2
[12,] 0.0 0.0 0.0
[13,] 0.0 0.0 0.0
[14,] 0.2 0.2 0.2
[15,] 0.4 0.4 0.4
[16,] 0.6 0.6 0.6
[17,] 0.8 0.8 0.8
[18,] 0.8 0.8 0.8
[19,] 0.6 0.6 0.6
[20,] 0.4 0.4 0.4
[21,] 0.2 0.2 0.2
[22,] 0.2 0.2 0.2
[23,] 0.2 0.2 0.2
[24,] 0.4 0.4 0.4
[25,] 0.4 0.4 0.4
[26,] 0.4 0.4 0.4
[27,] 0.2 0.2 0.2
[28,] 0.4 0.4 0.4
> listResults
$A
 [1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4

$B
 [1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4

$C
 [1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4

> listResults2
$A
 [1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4

$B
 [1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4

$C
 [1] 0.4 0.6 0.6 0.6 0.6 0.8 0.8 0.8 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 0.8 0.6
[20] 0.4 0.2 0.2 0.2 0.4 0.4 0.4 0.2 0.4
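
The argument passing used above can be shown in isolation with base R alone. The sketch below uses a hypothetical helper, col_mean, to illustrate that anything placed after the function name in lapply() is forwarded to that function, and that this is equivalent to writing the call out in an anonymous function.

```r
# A toy function with one main argument and one extra argument.
col_mean <- function(df, col) mean(df[[col]])

frames <- list(A = mtcars, B = mtcars)

# Extra arguments after the function name are forwarded to it unchanged...
via_dots <- lapply(frames, col_mean, "vs")

# ...which is shorthand for writing the call out in an anonymous function.
via_wrapper <- lapply(frames, function(df) col_mean(df, "vs"))
```

Both calls return the same named list; the first form is what lets rollapply() be handed to lapply() directly.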
