Snowflake + Jupyter Notebook + Spark connection
from pyspark.sql import SparkSession

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

spark = (SparkSession.builder
         .master('local')
         .appName('test')
         .config('spark.driver.memory', '5G')
         .getOrCreate())

sfOptions = credentials  # dict of Snowflake connection options

df = (spark.read.format(SNOWFLAKE_SOURCE_NAME)
      .options(**sfOptions)
      .option("query", "select * from xyz.xpy where year(ORDERDATE)=2018 limit 100")
      .load())
# verify
df.count()
# Then I get the following error:
Py4JJavaError: An error occurred while calling o190.load.
: java.lang.ClassNotFoundException: Failed to find data source: net.snowflake.spark.snowflake. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:830)
Caused by: java.lang.ClassNotFoundException: net.snowflake.spark.snowflake.DefaultSource
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:436)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
Try this:
Step 1: Create a cluster with Spark version 2.3.0 and Scala version 2.11.
Step 2: Attach snowflake-jdbc-3.5.4.jar to the cluster. https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc/3.5.4
Step 3: Attach the spark-snowflake_2.11-2.3.2 driver to the cluster. https://mvnrepository.com/artifact/net.snowflake/spark-snowflake_2.11/2.3.2
Here is the sample code:
val SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
val sfOptions = Map(
"sfURL" -> "<snowflake_url>",
"sfAccount" -> "<your account name>",
"sfUser" -> "<your account user>",
"sfPassword" -> "<your account pwd>",
"sfDatabase" -> "<your database name>",
"sfSchema" -> "<your schema name>",
"sfWarehouse" -> "<your warehouse name>",
"sfRole" -> "<your account role>",
"region_id" -> "<your region name, if you are outside the US region>"
)
val df: DataFrame = sqlContext.read
.format(SNOWFLAKE_SOURCE_NAME)
.options(sfOptions)
.option("dbtable", "<your table>")
.load()