Spark Streaming - Java - Insert JSON from Kafka into Cassandra
I am writing a simple data pipeline in Spark Streaming using Java that pulls JSON data from Kafka, parses the JSON into a custom class (`Transaction`), and then inserts that data into a Cassandra table, but I cannot get the `mapToRow()` function to work.
I have seen countless examples that say all you have to do is something along these lines:
JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
        streamingContext,
        String.class,
        String.class,
        StringDecoder.class,
        StringDecoder.class,
        kafkaParams,
        topicsSet
);
JavaDStream<String> lines = stream.map(
        new Function<Tuple2<String, String>, String>() {
            @Override
            public String call(Tuple2<String, String> tuple2) {
                return tuple2._2();
            }
        }
);
javaFunctions(lines).writerBuilder("myKeyspace", "myTableName", mapToRow(Transaction.class)).saveToCassandra();
However, when I do, I get the error:
The method mapToRow(Class<Transaction>) is undefined for the type SaveTransactions
I believe all I am missing is some sort of decoration on my class, but I have not been able to figure out which one. I have tried leaving it plain, essentially making the class a property bag:
public class Transaction implements java.io.Serializable {
    public int TransactionId;
    ...
    public Transaction() {}
}
I have tried all of the DataStax mapping annotations:
@Table(keyspace = "myKeyspace", name = "myTableName",
       readConsistency = "QUORUM",
       writeConsistency = "QUORUM",
       caseSensitiveKeyspace = false,
       caseSensitiveTable = false)
public class Transaction implements java.io.Serializable {
    @PartitionKey(0)
    @Column(name = "transaction_id")
    public int TransactionId;
    ...
    public Transaction() {}
}
I have also tried making the properties private and providing public get/set methods for each of them:
public class Transaction implements java.io.Serializable {
    private int transactionId;
    ...
    public Transaction() {}

    public int getTransactionId() {
        return transactionId;
    }

    public void setTransactionId(int transactionId) {
        this.transactionId = transactionId;
    }
}
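For what it's worth, the `JavaBeanColumnMapper` that `mapToRow()` uses (it shows up in the stack trace below) appears to discover columns through standard JavaBean properties, so the getter/setter shape above is the one to aim for. A quick way to check which property names a class actually exposes, using only the JDK and no Spark at all (the `BeanCheck` wrapper class is purely illustrative):

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class BeanCheck {
    // Minimal copy of the getter/setter bean from above
    public static class Transaction implements java.io.Serializable {
        private int transactionId;

        public Transaction() {}

        public int getTransactionId() {
            return transactionId;
        }

        public void setTransactionId(int transactionId) {
            this.transactionId = transactionId;
        }
    }

    public static void main(String[] args) throws IntrospectionException {
        // Passing Object.class as the stop class excludes the built-in "class" property
        BeanInfo info = Introspector.getBeanInfo(Transaction.class, Object.class);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            // Prints the bean property name derived from getTransactionId/setTransactionId
            System.out.println(pd.getName());
        }
    }
}
```

This prints `transactionId`, which is the name the mapper has to reconcile with the Cassandra column.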
I have been able to parse the `DStream` into an `RDD` of `Transaction` objects using the following class:
public class Transaction implements java.io.Serializable {
    ...
    public static class ParseJSON implements FlatMapFunction<Iterator<String>, Transaction> {
        public Iterable<Transaction> call(Iterator<String> lines) throws Exception {
            ArrayList<Transaction> transactions = new ArrayList<Transaction>();
            ObjectMapper mapper = new ObjectMapper();
            while (lines.hasNext()) {
                String line = lines.next();
                try {
                    transactions.add(mapper.readValue(line, Transaction.class));
                } catch (Exception e) {
                    System.out.println("Skipped: " + e);
                }
            }
            return transactions;
        }
    }
}
in combination with the following code, acting on the `lines` object from above:
JavaDStream<Transaction> events = lines.mapPartitions(new Transaction.ParseJSON());
However, even once I have that, it still does not work with the `writerBuilder().saveToCassandra()` chain.
Any help is greatly appreciated.
It turns out the problem was just an import issue. I had imported `com.datastax.spark.connector.japi.CassandraStreamingJavaUtil.*`, thinking it would provide everything I needed, but I also needed to bring in `com.datastax.spark.connector.japi.CassandraJavaUtil.*` for the `.mapToRow()` function.
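Concretely, both static imports below are needed at the top of the class (a sketch, assuming static imports are used as in the examples above): `CassandraStreamingJavaUtil` supplies `javaFunctions(...)` for `JavaDStream`s, while `CassandraJavaUtil` supplies `mapToRow(...)`.

```java
// javaFunctions(...) for DStreams comes from the streaming utility class...
import static com.datastax.spark.connector.japi.CassandraStreamingJavaUtil.javaFunctions;
// ...but mapToRow(...) lives in the non-streaming utility class.
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
```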
Once I resolved that, I started getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/package$ScalaReflectionLock$
at org.apache.spark.sql.catalyst.ReflectionLock$.<init>(ReflectionLock.scala:5)
at org.apache.spark.sql.catalyst.ReflectionLock$.<clinit>(ReflectionLock.scala)
at com.datastax.spark.connector.mapper.ReflectionColumnMapper.<init>(ReflectionColumnMapper.scala:38)
at com.datastax.spark.connector.mapper.JavaBeanColumnMapper.<init>(JavaBeanColumnMapper.scala:10)
at com.datastax.spark.connector.util.JavaApiHelper$.javaBeanColumnMapper(JavaApiHelper.scala:93)
at com.datastax.spark.connector.util.JavaApiHelper.javaBeanColumnMapper(JavaApiHelper.scala)
at com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow(CassandraJavaUtil.java:1204)
at com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow(CassandraJavaUtil.java:1222)
at globalTransactions.Process.main(Process.java:77)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.package$ScalaReflectionLock$
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 9 more
That was resolved by bringing in the spark-sql project:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.2</version>
</dependency>
Hope this helps the next guy/gal.