
Running Prophet with SparkDataFrame in R in Databricks

I have a Spark DataFrame in R. I retrieved the dataset from Snowflake as follows:

snowflake_tbl_name <- "xxxxxx"
                          
tickerDF <- SparkR::read.df(  
  source = "snowflake",
  sfUrl = "xxxxxx.snowflakecomputing.com",
  sfUser = "xxxxxxx",
  sfPassword = "xxxxxxxxxx",
  sfDatabase = "xxxxxxxxx",
  sfSchema = "PUBLIC",
  sfWarehouse = "COMPUTE_WH",
  dbtable = snowflake_tbl_name)

The dataset looks something like this:

'SparkDataFrame': 4 variables:
 $ ds      : Date 2022-01-05 2022-01-06 2022-01-07 2022-01-10 2022-01-11 2022-01-12
 $ TICKER  : chr "HDEF" "HDEF" "HDEF" "HDEF" "HDEF" "HDEF"
 $ y       : num 23.870001 23.9 24.200001 24.200001 24.450001 24.6
 $ RUN_DATE: Date 2022-02-26 2022-02-26 2022-02-26 2022-02-26 2022-02-26 2022-02-26

          ds TICKER     y   RUN_DATE
1 2022-01-05   HDEF 23.87 2022-02-26
2 2022-01-06   HDEF 23.90 2022-02-26
3 2022-01-07   HDEF 24.20 2022-02-26
4 2022-01-10   HDEF 24.20 2022-02-26
5 2022-01-11   HDEF 24.45 2022-02-26
6 2022-01-12   HDEF 24.60 2022-02-26

I now want to use the prophet package to predict future values for y.

When I simply run the following, I get the error shown after the code:

library(prophet)
library(dplyr)
m <- prophet::prophet(tickerDF)


Error in as.environment(where) : 
  S4 object does not extend class "environment"

Any idea why this may be the case?

There doesn't seem to be anything wrong with the code itself. I'm not familiar with Spark data frames, but prophet requires two columns, ds and y, so you could put those into a regular two-column data frame and see if that helps.
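As a rough illustration of that suggestion: a SparkDataFrame is an S4 object, which matches the wording of the error, while prophet() expects an ordinary R data.frame. The sketch below assumes SparkR is available (as it is in a Databricks R notebook) and that the rows for one ticker fit in driver memory. The six sample rows from the question stand in for the Snowflake table, and the names sample_rows, hdef_sdf and local_df are made up for this example.

library(SparkR)
library(prophet)

# In a Databricks notebook this returns the already-running session.
sparkR.session()

# Stand-in for the Snowflake table: the six sample rows from the question.
sample_rows <- data.frame(
  ds = as.Date(c("2022-01-05", "2022-01-06", "2022-01-07",
                 "2022-01-10", "2022-01-11", "2022-01-12")),
  TICKER = "HDEF",
  y = c(23.87, 23.90, 24.20, 24.20, 24.45, 24.60),
  stringsAsFactors = FALSE
)
tickerDF <- SparkR::createDataFrame(sample_rows)

# Keep one ticker and only the two columns prophet needs.
hdef_sdf <- SparkR::select(
  SparkR::filter(tickerDF, tickerDF$TICKER == "HDEF"),
  "ds", "y"
)

# collect() copies the rows to the driver as a regular R data.frame.
local_df <- SparkR::collect(hdef_sdf)
local_df <- local_df[order(local_df$ds), ]

# prophet() now receives a plain data.frame instead of an S4 SparkDataFrame.
m <- prophet::prophet(local_df)

The SparkR:: prefixes are there because dplyr, which the question also loads, exports its own select, filter and collect, so unqualified calls could dispatch to the wrong package.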

I have had issues running prophet models with outdated versions of rstan and Rcpp, so you may also want to check your dependencies.

To actually produce a forecast, build a data frame of future dates from the fitted model with make_future_dataframe and pass it to predict:

make_future_dataframe(m, periods, freq = "day", include_history = TRUE)
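Put together, the forecasting step might look like the sketch below. It works on a plain local data.frame built from the six sample rows in the question. The horizon of 5 days and the names history, future and forecast are arbitrary choices for illustration, and with so few rows prophet will turn off its seasonality components and fit only a trend.

library(prophet)

# Local history with the two columns prophet requires: ds (date) and y (value).
history <- data.frame(
  ds = as.Date(c("2022-01-05", "2022-01-06", "2022-01-07",
                 "2022-01-10", "2022-01-11", "2022-01-12")),
  y = c(23.87, 23.90, 24.20, 24.20, 24.45, 24.60)
)

# Fit the model on the local data frame.
m <- prophet(history)

# Extend the history dates by 5 daily periods (5 is an arbitrary horizon).
future <- make_future_dataframe(m, periods = 5, freq = "day",
                                include_history = TRUE)

# predict() returns one row per date in `future`, including yhat and its bounds.
forecast <- predict(m, future)
tail(forecast[, c("ds", "yhat", "yhat_lower", "yhat_upper")])

Note that freq = "day" also generates weekend dates, which a series of trading days like the one in the question does not contain. Those rows can be dropped from future before calling predict if they are not wanted.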
