I already have an ODBC connection from Python to SQL Server. I would like to use PySpark to run queries. How can I use my existing connection with PySpark?
Thanks
Your question is quite broad, but here goes. You can read from a SQL database using:
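from pyspark.sql import SparkSession

# Placeholder connection details; replace with your own.
ip, port, database = "localhost", 1433, "my_database"
username, password = "my_user", "my_password"

spark = SparkSession.builder.getOrCreate()
df = (
    spark.read.format("jdbc")
    # For SQL Server the JDBC subprotocol is "sqlserver".
    .option("url", f"jdbc:sqlserver://{ip}:{port};databaseName={database}")
    .option("dbtable", "table_name")  # table to load
    .option("user", username)
    .option("password", password)
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)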
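The important part is to use the JDBC format and specify your driver. Spark connects through JDBC rather than ODBC, so the existing ODBC connection object itself cannot be handed to PySpark, but the same host, database, and credentials can be reused. If you run into issues with this, you might need to download the specific driver JARs and make them available to Spark. Hope this helps. Please try to include a code snippet or an example of what you tried next time.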
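Since the question is about running queries rather than loading whole tables, here is a minimal sketch of how the same connection details could be wrapped in a helper that accepts either a table name or a SQL query. It assumes a Spark version whose JDBC source supports the `query` option (2.4 or later). The function names `sqlserver_jdbc_options` and `read_from_sqlserver` are made up for illustration and are not part of PySpark.

```python
def sqlserver_jdbc_options(host, port, database, user, password,
                           query=None, table=None):
    """Build the option dict for Spark's JDBC reader (hypothetical helper)."""
    # Spark's JDBC source accepts either "dbtable" or "query", not both.
    if (query is None) == (table is None):
        raise ValueError("pass exactly one of query or table")
    options = {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
    if query is not None:
        options["query"] = query      # SQL Server runs this query
    else:
        options["dbtable"] = table    # the whole table is read
    return options


def read_from_sqlserver(spark, **kwargs):
    """Return a DataFrame using an existing SparkSession (hypothetical helper)."""
    return spark.read.format("jdbc").options(**sqlserver_jdbc_options(**kwargs)).load()


# Example call, assuming `spark` is an active SparkSession and the
# connection values below are replaced with real ones:
# df = read_from_sqlserver(
#     spark, host="localhost", port=1433, database="my_database",
#     user="my_user", password="my_password",
#     query="SELECT id, name FROM table_name WHERE id > 100",
# )
```

The SQL Server JDBC driver still has to be on Spark's classpath for this to work, for example by pointing `spark.jars` at the downloaded JAR or by setting `spark.jars.packages` to the `com.microsoft.sqlserver:mssql-jdbc` Maven coordinate with a version that matches your Java runtime.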