Reshaping an ffdf data frame in R

Question

I am using the dcast function to reshape a data frame in R. My data frame is large, so I converted it to an ffdf data frame, but now I am unable to use dcast on it. Please help if there are any alternatives. Below is the example I used for a small data frame, which is what I want to do for the ffdf data frame:

    library(reshape2)
    hhpsample <- read.csv("C:/Users/PK5016573/Desktop/hdsample.csv")
    View(hhpsample)

    hd <- dcast(hhpsample, MemberID ~ Year + Specialty + ProcedureGroup + Vendor + PlaceSvc + PCP + PrimaryConditionGroup + CharlsonIndex)

This works, but:

    library(ff)
    hhp <- read.csv.ffdf(file = "C:/Users/PK5016573/Desktop/hdsample.csv")

    hd <- dcast(hhp, MemberID ~ Year + Specialty + ProcedureGroup + Vendor + PlaceSvc + PCP + PrimaryConditionGroup + CharlsonIndex)

This gives me an error. Please help.

Thanks in advance,
Pavan Kancharala

Answer 1

I found an answer to this question, but it may not work well on data with a large number of factor levels.

    # Reshape one chunk of data: one row per member, one column per
    # Year / PrimaryConditionGroup combination
    library(reshape2)
    library(ffbase)
    reshapefunction <- function(x) {
      dcast(x, MemberID ~ Year + PrimaryConditionGroup,
            value.var = "rep.x..each...2668990.",
            fun.aggregate = sum)
    }
    # Apply the reshape function chunk by chunk, splitting on MemberID.
    # BATCHBYTES limits the size of the chunks to process.
    PrimaryConditionGroup <- ffdfdply(x = hhp, split = hhp$MemberID,
      FUN = function(x) reshapefunction(x), BATCHBYTES = 100000000, trace = TRUE)

    View(PrimaryConditionGroup)
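One plausible reason a chunked reshape struggles with many factor levels is that each chunk only sees some of the Year / PrimaryConditionGroup combinations, so dcast can return a different set of columns per chunk. The following is a minimal in-memory sketch of one possible workaround, not a tested ffdf solution: it assumes the grouping columns are factors with their full levels fixed up front, and it passes `drop = FALSE` to dcast. The toy data, the column `n`, and the helper `reshape_chunk` are hypothetical, and base `split()` stands in for the chunks that ffdfdply would pass.

```r
library(reshape2)

# Hypothetical toy data; column names follow the question, values are made up
claims <- data.frame(
  MemberID = c(1, 1, 2),
  Year = c("Y1", "Y2", "Y1"),
  PrimaryConditionGroup = c("A", "B", "A"),
  n = 1L
)

# Fix the factor levels on the full data before splitting into chunks
claims$Year <- factor(claims$Year, levels = c("Y1", "Y2"))
claims$PrimaryConditionGroup <- factor(claims$PrimaryConditionGroup,
                                       levels = c("A", "B"))

# drop = FALSE keeps a column for every combination of factor levels,
# including combinations that do not occur in this chunk
reshape_chunk <- function(x) {
  dcast(x, MemberID ~ Year + PrimaryConditionGroup,
        value.var = "n", fun.aggregate = sum, drop = FALSE)
}

# split() stands in for the per-member chunks (assumption: in-memory data)
chunks <- split(claims, claims$MemberID)
pieces <- lapply(chunks, reshape_chunk)

# Every piece now has the same columns, so the pieces can be stacked
wide <- do.call(rbind, pieces)
print(wide)
```

With many levels in several grouping columns, `drop = FALSE` produces one column per combination, so the result can become very wide.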

All the data was taken from a Kaggle competition. I added one more column, "rep.x..each...2668990.", which contains 1 in every row and is used for aggregation.
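To illustrate why a column of ones is useful here, the following small sketch runs dcast on an ordinary in-memory data frame. Summing the constant column gives the number of rows for each member in each Year / PrimaryConditionGroup combination. The toy values and the column name `n` are made up for illustration; `n` plays the role of "rep.x..each...2668990.".

```r
library(reshape2)

# Hypothetical toy data standing in for the Kaggle table
claims <- data.frame(
  MemberID = c(1, 1, 1, 2, 2),
  Year = c("Y1", "Y1", "Y2", "Y1", "Y2"),
  PrimaryConditionGroup = c("MSC2a3", "MSC2a3", "ARTHSPIN",
                            "ARTHSPIN", "ARTHSPIN"),
  stringsAsFactors = FALSE
)

# Constant column of ones, used only as the value to aggregate
claims$n <- 1L

# Summing the ones counts the rows in each cell of the wide table;
# combinations with no rows are filled with 0
wide <- dcast(claims, MemberID ~ Year + PrimaryConditionGroup,
              value.var = "n", fun.aggregate = sum)
print(wide)
```

An alternative that avoids the helper column would be `fun.aggregate = length`, which counts rows directly.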

Answer 1
0 Accepted 2014-12-22 10:05:41
