
snowflake + Jupyter Notebook + spark connection

I am trying to connect Spark + Python + Snowflake for faster data processing. If possible, kindly provide a solution for this.

from pyspark.sql import SparkSession

# Data source name registered by the Snowflake Spark connector
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

spark = SparkSession.builder.master('local').appName('test').config('spark.driver.memory', '5G').getOrCreate()

# Snowflake connection parameters (sfURL, sfUser, sfPassword, ...)
sfOptions = credentials

df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("query", "select * from xyz.xpy where year(ORDERDATE)=2018 limit 100") \
    .load()

# verify 
df.count()

Then I get the following error:

Py4JJavaError: An error occurred while calling o190.load.
: java.lang.ClassNotFoundException: Failed to find data source: net.snowflake.spark.snowflake. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:567)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:830)
Caused by: java.lang.ClassNotFoundException: net.snowflake.spark.snowflake.DefaultSource
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:436)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)

Try like this:

Step 1: Create a cluster with Spark version 2.3.0 and Scala version 2.11.

Step 2: Attach snowflake-jdbc-3.5.4.jar to the cluster: https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc/3.5.4

Step 3: Attach the spark-snowflake_2.11-2.3.2 connector to the cluster: https://mvnrepository.com/artifact/net.snowflake/spark-snowflake_2.11/2.3.2

Here is the sample code (Scala):

val SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

val sfOptions = Map(
    "sfURL" -> "<snowflake_url>",
    "sfAccount" -> "<your account name>",
    "sfUser" -> "<your account user>",
    "sfPassword" -> "<your account pwd>",
    "sfDatabase" -> "<your database name>",
    "sfSchema" -> "<your schema name>",
    "sfWarehouse" -> "<your warehouse name>",
    "sfRole" -> "<your account role>",
    "region_id"-> "<your region name, if you are out of us region>"
)

val df: DataFrame = sqlContext.read
    .format(SNOWFLAKE_SOURCE_NAME)
    .options(sfOptions)
    .option("dbtable", "<your table>")
    .load()
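
For a PySpark/Jupyter setup like the one in the question, an alternative to attaching the jars by hand is to let Spark resolve the connector from Maven through the spark.jars.packages setting. Below is a minimal sketch; the Maven coordinates and the placeholder connection values are assumptions and should be matched to your own Spark/Scala versions and Snowflake account:

from pyspark.sql import SparkSession

# Maven coordinates for the Snowflake Spark connector and JDBC driver.
# Example versions only -- pick the ones matching your Spark/Scala build.
packages = ",".join([
    "net.snowflake:spark-snowflake_2.11:2.3.2",
    "net.snowflake:snowflake-jdbc:3.5.4",
])

spark = (SparkSession.builder
         .master("local")
         .appName("snowflake-test")
         .config("spark.jars.packages", packages)  # jars are downloaded when the session starts
         .getOrCreate())

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Placeholder connection options -- fill in your own account details.
sfOptions = {
    "sfURL": "<snowflake_url>",
    "sfUser": "<your account user>",
    "sfPassword": "<your account pwd>",
    "sfDatabase": "<your database name>",
    "sfSchema": "<your schema name>",
    "sfWarehouse": "<your warehouse name>",
}

df = (spark.read.format(SNOWFLAKE_SOURCE_NAME)
      .options(**sfOptions)
      .option("query", "select * from xyz.xpy where year(ORDERDATE)=2018 limit 100")
      .load())

df.count()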
