
Using CRAN packages inside SparkR

If I wanted to use a standard R package like MXNet inside SparkR, is this possible? Can standard CRAN packages be used inside the Spark distributed environment without considering a local vs. a Spark DataFrame? Is the strategy when working with large data sets in R and Spark to use a Spark DataFrame, whittle down the DataFrame, and then convert it to a local data.frame so the standard CRAN package can be used? Is there another strategy that I'm not aware of?

Thanks

Can standard CRAN packages be used inside the Spark distributed environment without considering a local vs. a Spark DataFrame?

No, they cannot. Standard CRAN packages operate on ordinary local R objects in a single R process; they are not aware of Spark's distributed DataFrame representation.

Is the strategy when working with large data sets in R and Spark to use a Spark DataFrame, whittle down the DataFrame, and then convert it to a local data.frame?

Sadly, most of the time this is what you do: reduce the data with Spark-side operations, then collect the small result to the driver as a local data.frame.
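
A minimal SparkR sketch of that workflow, assuming an existing Spark session and a hypothetical flights table whose column names (dep_delay, carrier) are purely illustrative:

library(SparkR)
sparkR.session()

# A distributed SparkDataFrame (the table name is an assumption).
flights_sdf <- sql("SELECT * FROM flights")

# Whittle it down on the Spark side: filter, then aggregate per group.
delayed <- filter(flights_sdf, flights_sdf$dep_delay > 60)
summary_sdf <- agg(groupBy(delayed, "carrier"),
                   mean_delay = avg(delayed$dep_delay))

# Only the small, aggregated result is pulled back to the driver as a
# local data.frame, where any standard CRAN package can then be applied.
summary_local <- collect(summary_sdf)
str(summary_local)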

Is there another strategy that I'm not aware of ?

The dapply and gapply functions introduced in Spark 2.0 can apply arbitrary R code to the partitions or groups of a SparkDataFrame, for example:
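
A sketch of gapply under a few assumptions: SparkR 2.0+, the same hypothetical flights_sdf as above, and any CRAN package used inside the function being installed on every worker node, since the function runs there against an ordinary local data.frame:

library(SparkR)

# Output schema of the per-group result.
result_schema <- structType(structField("carrier", "string"),
                            structField("mean_delay", "double"))

# gapply splits the SparkDataFrame by "carrier" and runs the R function on
# each group's local data.frame on the workers; CRAN packages can be used
# inside this function if they are installed on the worker nodes.
result_sdf <- gapply(
  flights_sdf,
  "carrier",
  function(key, pdf) {
    data.frame(key, mean(pdf$dep_delay, na.rm = TRUE))
  },
  result_schema
)

head(result_sdf)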

For certain operations you can use packages that offer a uniform syntax for local R data frames and Spark data frames. For instance, with sparklyr, dplyr can push your standard data-wrangling operations down into the Spark cluster; you fetch the data only when you need it for local operations.
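
A minimal sparklyr sketch; the local connection and the nycflights13::flights data used to create the Spark table are assumptions for illustration:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# copy_to() registers a Spark DataFrame; the returned object is a dplyr
# tbl backed by Spark, not a local data frame.
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)

# These verbs are translated to Spark SQL and executed inside the cluster.
summary_tbl <- flights_tbl %>%
  filter(dep_delay > 60) %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE))

# Nothing is fetched until collect(); only the small summary comes back as
# a local data frame, ready for any standard CRAN package.
summary_local <- collect(summary_tbl)

spark_disconnect(sc)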
