I have a massive Spark DataFrame called x, on Databricks. x is billions of records long, far too large to collect onto a single machine. What do I have to do to get the following to work?
dplyr::summarize_all(x,mean)
This is the error message I currently get:
Error in UseMethod("tbl_vars") :
no applicable method for 'tbl_vars' applied to an object of class "SparkDataFrame"
Running
class(x)
returns:
[1] "SparkDataFrame"
attr(,"package")
[1] "SparkR"
The book Mastering Spark with R has an example of loading a tiny R data frame into Spark and running summarize_all on it:
cars <- copy_to(sc, mtcars)
summarize_all(cars, mean)
The above code works on my Databricks cluster and returns a nice block of text:
# Source: spark<?> [?? x 11]
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 20.1 6.19 231. 147. 3.60 3.22 17.8 0.438 0.406 3.69 2.81
The same book leads me to believe I can use this and similar dplyr functions on huge Spark DataFrames. Running
class(cars)
returns:
[1] "tbl_spark" "tbl_sql" "tbl_lazy" "tbl"
It seems obvious that I need to convert my SparkR DataFrame to a tbl_spark, tbl_sql, tbl_lazy, or tbl so that I can pass it to dplyr::summarize_all, but I have searched all over and asked experts, and I cannot figure out how to do this.
You're right that SparkR and sparklyr are different APIs that don't play well together. You can make a SparkR DataFrame usable from sparklyr by registering it as a temp table. Here's an example SparkR data frame:
sc <- sparklyr::spark_connect(method = "databricks")
x_sparkr <- SparkR::sql("SELECT 1 AS a UNION SELECT 2")
Create the temp table. (On newer Spark versions, SparkR::createOrReplaceTempView(x_sparkr, "temp_x") is the non-deprecated equivalent.)
SparkR::registerTempTable(x_sparkr, "temp_x")
Load it into sparklyr.
x_sparklyr <- dplyr::tbl(sc, "temp_x")
dplyr::summarize_all(x_sparklyr, mean)
#> # Source: spark<?> [?? x 1]
#> a
#> <dbl>
#> 1 1.5
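Applied to the original question, the same pattern looks like this. This is a sketch, assuming x is the existing SparkDataFrame from the question and that the cluster has both SparkR and sparklyr attached; the view name x_view is arbitrary:

```r
library(sparklyr)
library(dplyr)

# Connect sparklyr to the same Databricks-managed Spark session
sc <- spark_connect(method = "databricks")

# Expose the SparkR DataFrame x as a temp view that sparklyr can see
SparkR::createOrReplaceTempView(x, "x_view")

# Reference the view lazily as a tbl_spark; nothing is collected to the driver
x_tbl <- dplyr::tbl(sc, "x_view")

# summarize_all is translated to Spark SQL and runs as a distributed aggregation
dplyr::summarize_all(x_tbl, mean)
```

Because the tbl is lazy, the mean computation is pushed down to Spark SQL and only the one-row result is returned to the driver, so this works even when x has billions of rows.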