Reshape an ffdf data frame in R
I am using the dcast function to reshape a data frame in R. This works on a small data frame, but my real data is large, so I converted it to an ffdf data frame, and now I am unable to use dcast on it. Please help me if there are any alternatives. Below is the example I used for the small data frame, which is what I want to do for the ffdf data frame:
library(reshape2)
hhpsample <- read.csv("C:/Users/PK5016573/Desktop/hdsample.csv")
View(hhpsample)
hd <- dcast(hhpsample, MemberID ~ Year + Specialty + ProcedureGroup + Vendor + PlaceSvc + PCP + PrimaryConditionGroup + CharlsonIndex)
This works, but the following does not:
library(ff)
hhp <- read.csv.ffdf(file = "C:/Users/PK5016573/Desktop/hdsample.csv")
hd <- dcast(hhp, MemberID ~ Year + Specialty + ProcedureGroup + Vendor + PlaceSvc + PCP + PrimaryConditionGroup + CharlsonIndex)
This gives me an error. Please help.
Thanks in advance, Pavan Kancharala
I found an answer to this question, but it may not work for data with a large number of factor levels.
# Reshape function applied to each chunk of the data:
# one row per MemberID, one column per Year/PrimaryConditionGroup combination
library(reshape2)
library(ffbase)
reshapefunction <- function(x) {
  dcast(x, MemberID ~ Year + PrimaryConditionGroup,
        value.var = "rep.x..each...2668990.",
        fun.aggregate = sum)
}
# Reshape the ffdf chunk by chunk, splitting on MemberID so that
# all rows of one member are processed together
# BATCHBYTES sets the approximate size of each chunk in bytes
PrimaryConditionGroup <- ffdfdply(x = hhp, split = hhp$MemberID,
                                  FUN = reshapefunction, BATCHBYTES = 100000000, trace = TRUE)
View(PrimaryConditionGroup)
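One possible reason the chunked approach struggles with heavily factored data is that dcast only creates columns for the Year/PrimaryConditionGroup combinations it actually sees, so different chunks can come back with different columns. The following is a minimal sketch of one way to keep the columns aligned, assuming the grouping columns are factors with a shared set of levels and using the `drop = FALSE` argument of dcast. The toy chunks, the `ones` column, and the helper names `make_chunk` and `reshape_chunk` are hypothetical and not part of the original data.

```r
library(reshape2)

# Assumed, made-up factor levels shared by every chunk
year_levels <- c("Y1", "Y2", "Y3")
pcg_levels  <- c("ARTHSPIN", "MSC2a3")

# Hypothetical helper that builds a small in-memory chunk
make_chunk <- function(member, year, pcg) {
  data.frame(
    MemberID = member,
    Year = factor(year, levels = year_levels),
    PrimaryConditionGroup = factor(pcg, levels = pcg_levels),
    ones = 1  # constant column used only for aggregation
  )
}

chunk_a <- make_chunk(c(1, 1, 2), c("Y1", "Y1", "Y2"),
                      c("MSC2a3", "MSC2a3", "ARTHSPIN"))
chunk_b <- make_chunk(c(3, 4), c("Y3", "Y3"), c("MSC2a3", "MSC2a3"))

# drop = FALSE keeps every combination of factor levels as a column,
# including combinations that do not occur in the chunk
reshape_chunk <- function(x) {
  dcast(x, MemberID ~ Year + PrimaryConditionGroup,
        value.var = "ones", fun.aggregate = sum, drop = FALSE)
}

wide_a <- reshape_chunk(chunk_a)
wide_b <- reshape_chunk(chunk_b)

# Both chunks now expose the same column names
identical(names(wide_a), names(wide_b))
```

Whether this is enough for a real ffdf depends on the factor levels being identical across all chunks, which is an assumption of the sketch and should be checked on the actual data.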
All the data was taken from a Kaggle competition. I added one more column, "rep.x..each...2668990.", which contains 1 in every row and is used for aggregation.
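To illustrate why a column of ones is useful here, the sketch below sums such a column with dcast on a tiny in-memory data frame, which gives the number of rows per MemberID for each Year/PrimaryConditionGroup combination. The `claims` data frame and its `ones` column are hypothetical stand-ins for the real data and its constant column.

```r
library(reshape2)

# Hypothetical toy table; `ones` is 1 in every row
claims <- data.frame(
  MemberID = c(1, 1, 2, 2, 2),
  Year = c("Y1", "Y1", "Y1", "Y2", "Y2"),
  PrimaryConditionGroup = c("MSC2a3", "MSC2a3", "ARTHSPIN", "MSC2a3", "MSC2a3"),
  ones = 1
)

# Summing the constant column counts the rows in each cell;
# combinations with no rows are filled with 0
counts <- dcast(claims, MemberID ~ Year + PrimaryConditionGroup,
                value.var = "ones", fun.aggregate = sum)
print(counts)
```

Passing `fun.aggregate = length` would give the same counts without the extra column. The constant column is kept here only because the answer above uses it.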