How can I read Cassandra data using JDBC from pySpark?
I want to parallelize the read operation so that it runs on more than one executor. Rather than the following read code, I want to read with JDBC.
hosts = {"spark.cassandra.connection.host": "node1_ip,node2_ip,node3_ip",
         "table": "ex_table",
         "keyspace": "ex_keyspace"}
data_frame = sqlContext.read.format("org.apache.spark.sql.cassandra") \
    .options(**hosts).load()
DataStax provides a JDBC driver for Apache Spark which allows you to connect to Cassandra from Spark using a JDBC connection.

The JDBC driver is available to download from the DataStax Downloads site.

See the instructions for Installing the Simba JDBC driver. Additionally, there is also a User Guide for configuring the driver with some examples. Cheers!
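With the Simba JDBC driver jar on the Spark classpath, the read could then go through Spark's generic `jdbc` data source, which supports partitioned (parallel) reads via `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`. The sketch below is illustrative only: the JDBC URL scheme, port, and driver class name are assumptions about the Simba driver — confirm the exact values against the Simba User Guide.

```python
def cassandra_jdbc_options(host, keyspace, table, port=9042):
    """Build an options dict for spark.read.format("jdbc").

    The URL scheme and driver class below are hypothetical placeholders
    for the Simba Cassandra JDBC driver; check the vendor's User Guide
    for the actual connection string format and class name.
    """
    return {
        "url": f"jdbc:cassandra://{host}:{port}",        # assumed URL scheme
        "driver": "com.simba.cassandra.jdbc42.Driver",   # assumed class name
        "dbtable": f"{keyspace}.{table}",
    }

# Usage (requires a running SparkSession and the driver jar on the classpath):
# df = (spark.read.format("jdbc")
#       .options(**cassandra_jdbc_options("node1_ip", "ex_keyspace", "ex_table"))
#       .option("partitionColumn", "id")   # a numeric column in the table
#       .option("lowerBound", "0")
#       .option("upperBound", "1000000")
#       .option("numPartitions", "8")      # reads split across 8 tasks
#       .load())
```

The four partitioning options are standard Spark JDBC source options; Spark turns them into `numPartitions` range predicates so the read is spread across executors.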