Writing a DataFrame to a Cassandra table in Java
Not finding exactly what I need here. There are loads of code samples in Scala and Python. Here is what I have:
import org.apache.log4j.Logger;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class CassandraWriter {
    private transient Logger logger = Logger.getLogger(CassandraWriter.class);
    private Dataset<Row> hdfsDF;

    public CassandraWriter(Dataset<Row> dataFrame) {
        hdfsDF = dataFrame;
    }

    public void writeToCassandra(String tableName, String keyspace) {
        logger.info("Writing DataFrame to table: " + tableName);
        hdfsDF.write().format("org.apache.spark.sql.cassandra").mode("overwrite")
                .option("table", tableName)
                .option("keyspace", keyspace)
                .save();
        logger.info("Inserted DataFrame to Cassandra successfully");
    }
}
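For context, here is a minimal sketch of how this class might be invoked, assuming a SparkSession configured with `spark.cassandra.connection.host` (the app name, host, input path, table, and keyspace below are illustrative, not from the original post):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CassandraWriterExample {
    public static void main(String[] args) {
        // Hypothetical setup; the connection host must point at your Cassandra cluster.
        SparkSession spark = SparkSession.builder()
                .appName("CassandraWriterExample")
                .config("spark.cassandra.connection.host", "127.0.0.1")
                .getOrCreate();

        // Load any DataFrame, e.g. from HDFS (path is illustrative).
        Dataset<Row> df = spark.read().parquet("hdfs:///data/events.parquet");

        // Write it to Cassandra using the class above.
        new CassandraWriter(df).writeToCassandra("events", "my_keyspace");

        spark.stop();
    }
}
```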
The error I am getting when running it is:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark.apache.org/third-party-projects.html
Any ideas?
You need to make sure that the Spark Cassandra Connector is included in the resulting jar that you're submitting. One way is to build a so-called fat jar and submit that. For example (the full pom is here):
...
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.version>2.11.12</scala.version>
    <spark.version>2.4.4</spark.version>
    <spark.scala.version>2.11</spark.scala.version>
    <scc.version>2.4.1</scc.version>
    <java.version>1.8</java.version>
</properties>
<dependencies>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_${spark.scala.version}</artifactId>
        <version>${scc.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${spark.scala.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
...
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.2.0</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
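With the assembly plugin configured as above, a typical build-and-submit sequence might look like this (the jar name and main class are illustrative, not from the original pom):

```shell
# Build the fat jar; the assembly plugin emits *-jar-with-dependencies.jar
mvn clean package

# Submit it; the connector classes are already bundled inside the jar,
# so no extra --packages flag is needed.
spark-submit \
  --class com.example.CassandraWriterExample \
  target/my-app-1.0-jar-with-dependencies.jar
```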
Alternatively, you can specify the Spark Cassandra Connector as a package via --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2
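With --packages, Spark resolves and downloads the connector at submit time, so the application jar can stay thin. A sketch of the full command (the main class and jar name are illustrative):

```shell
spark-submit \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2 \
  --class com.example.CassandraWriterExample \
  target/my-app-1.0.jar
```

Note that the Scala version in the artifact suffix (here 2.11) must match the Scala version your Spark distribution was built with.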