
Writing a DataFrame to a Cassandra table in Java

I couldn't find what I need here; there is plenty of code in Scala and Python, but not in Java. This is what I have:

import org.apache.log4j.Logger;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class CassandraWriter {
    private transient Logger logger = Logger.getLogger(CassandraWriter.class);
    private Dataset<Row> hdfsDF;

    public CassandraWriter(Dataset<Row> dataFrame) {
        hdfsDF = dataFrame;
    }

    public void writeToCassandra(String tableName, String keyspace) {
        logger.info("Writing DataFrame to table: " + tableName);

        hdfsDF.write().format("org.apache.spark.sql.cassandra").mode("overwrite")
                .option("table",tableName)
                .option("keyspace",keyspace)
                .save();

        logger.info("Inserted DataFrame to Cassandra successfully");
    }
}

The error I get at runtime is:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark.apache.org/third-party-projects.html

Any ideas?

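You need to make sure that the Spark Cassandra Connector is included in the jar that you submit. The ClassNotFoundException means Spark could not find the connector's data source class on the classpath at runtime.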
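One way to confirm this diagnosis before touching the build is a small classpath probe. The sketch below is not part of the original answer: the class name `ConnectorClasspathCheck` is hypothetical, and it assumes the connector exposes `org.apache.spark.sql.cassandra.DefaultSource`, which is the class Spark tries to load when given the format name `org.apache.spark.sql.cassandra`.

```java
// Minimal sketch: checks whether a class can be loaded, without initializing it.
// ConnectorClasspathCheck is a hypothetical helper name, not part of Spark or the connector.
class ConnectorClasspathCheck {

    // Assumption: Spark resolves format("org.apache.spark.sql.cassandra")
    // by loading this class from the connector jar.
    static final String CASSANDRA_SOURCE = "org.apache.spark.sql.cassandra.DefaultSource";

    static boolean isOnClasspath(String className) {
        try {
            // Use the context class loader, since that is typically where
            // jars added at submit time end up on the driver.
            Class.forName(className, false, Thread.currentThread().getContextClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(CASSANDRA_SOURCE)) {
            System.out.println("Spark Cassandra Connector found on the classpath");
        } else {
            System.out.println("Spark Cassandra Connector is missing: "
                    + "build a fat jar or pass --packages to spark-submit");
        }
    }
}
```

Calling `ConnectorClasspathCheck.isOnClasspath(ConnectorClasspathCheck.CASSANDRA_SOURCE)` at the start of the driver's `main` would show whether the jar that was actually submitted carries the connector.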

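This can be done by building a so-called fat jar and submitting that. For example, here is an excerpt from a pom.xml (the original answer links to the full pom):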

...
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.version>2.11.12</scala.version>
    <spark.version>2.4.4</spark.version>
    <spark.scala.version>2.11</spark.scala.version>
    <scc.version>2.4.1</scc.version>
    <java.version>1.8</java.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>com.datastax.spark</groupId>
      <artifactId>spark-cassandra-connector_${spark.scala.version}</artifactId>
      <version>${scc.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${spark.scala.version}</artifactId>
      <version>${spark.version}</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
...
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.2.0</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

Alternatively, you can pull in the Spark Cassandra Connector as a package at submit time by passing --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2 to spark-submit.
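To make the two submission styles concrete, here is a rough sketch that assembles the corresponding spark-submit command line in Java. It is only an illustration: the helper `SparkSubmitCommand`, the main class `com.example.Main`, the jar path, and the host `127.0.0.1` are all made-up placeholders, while `--class`, `--packages`, `--conf`, and the `spark.cassandra.connection.host` property are existing spark-submit flags and connector settings.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: builds a spark-submit argument list.
// SparkSubmitCommand and all sample values in main() are hypothetical.
class SparkSubmitCommand {

    // Pass connectorCoordinates = null when the connector is already
    // bundled in a fat jar; otherwise it is added via --packages.
    static List<String> build(String mainClass, String appJar,
                              String connectorCoordinates, String cassandraHost) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--class");
        cmd.add(mainClass);
        if (connectorCoordinates != null) {
            cmd.add("--packages");
            cmd.add(connectorCoordinates);
        }
        cmd.add("--conf");
        cmd.add("spark.cassandra.connection.host=" + cassandraHost);
        cmd.add(appJar); // the application jar comes after all options
        return cmd;
    }

    public static void main(String[] args) {
        // Thin jar plus --packages (placeholder class name, path, and host).
        List<String> withPackages = build("com.example.Main", "target/app.jar",
                "com.datastax.spark:spark-cassandra-connector_2.11:2.4.2", "127.0.0.1");
        System.out.println(String.join(" ", withPackages));

        // Fat jar built by maven-assembly-plugin: no --packages needed.
        List<String> fatJar = build("com.example.Main",
                "target/app-jar-with-dependencies.jar", null, "127.0.0.1");
        System.out.println(String.join(" ", fatJar));
    }
}
```

Whichever route is used, the Scala suffix in the connector artifact (`_2.11` here) should match the Scala version of the Spark build the job runs on.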
