Getting NotSerializableException - When using Spark Streaming with Kafka
I am using Spark Streaming to read data from a topic, and I am facing the following exception.
java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
	- object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = rawEventTopic, partition = 0, offset = 14098, CreateTime = 1556113016951, serialized key size = -1, serialized value size = 2916, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = {"id":null,"message":null,"eventDate":"","group":null,"category":"AD","username":null,"inboundDataSource":"AD","source":"192.168.1.14","destination":"192.168.1.15","bytesSent":"200KB","rawData":"{username:vinit}","account_name":null,"security_id":null,"account_domain":null, … "dataSourceId":2,"date":"","violation":false,"oobjectId":null,"eventCategoryName":"AD","sourceDataType":"AD"}))  [remaining null fields elided]
	- element of array (index: 0)
	- array (class [Lorg.apache.kafka.clients.consumer.ConsumerRecord;, size 1)
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_151]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_151]
2019-04-24 19:07:00.025 ERROR 21144 --- [result-getter-1] o.apache.spark.scheduler.TaskSetManager : Task 1.0 in stage 48.0 (TID 97) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord
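For context, the failure is plain Java serialization, not Kafka itself: ConsumerRecord does not implement java.io.Serializable, so any operation that makes Spark ship it between executor and driver (such as collect()) fails. A minimal stdlib sketch reproducing the same exception, using a hypothetical NonSerializableRecord class as a stand-in for ConsumerRecord:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class SerializationDemo {

    // Stand-in for ConsumerRecord: holds data but does NOT implement Serializable.
    static class NonSerializableRecord {
        final String value;
        NonSerializableRecord(String value) { this.value = value; }
    }

    // Returns true if the object survives Java serialization, false if it
    // fails the same way Spark's result serialization fails for ConsumerRecord.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        NonSerializableRecord record = new NonSerializableRecord("{\"category\":\"AD\"}");
        System.out.println(canSerialize(record));        // false: the record itself
        System.out.println(canSerialize(record.value));  // true: its String payload
    }
}
```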
The code that reads the topic data is as follows:
@Service
public class RawEventSparkConsumer {
private final Logger logger = LoggerFactory.getLogger(RawEventSparkConsumer.class);
@Autowired
private DataModelServiceImpl dataModelServiceImpl;
@Autowired
private JavaStreamingContext streamingContext;
@Autowired
private JavaInputDStream<ConsumerRecord<String, String>> messages;
@Autowired
private EnrichEventKafkaProducer enrichEventKafkaProd;
@PostConstruct
private void sparkRawEventConsumer() {
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(() -> {
messages.foreachRDD((rdd) -> {
List<ConsumerRecord<String, String>> rddList = rdd.collect();
Iterator<ConsumerRecord<String, String>> rddIterator = rddList.iterator();
while (rddIterator.hasNext()) {
ConsumerRecord<String, String> rddRecord = rddIterator.next();
if (rddRecord.topic().toString().equalsIgnoreCase("rawEventTopic")) {
ObjectMapper mapper = new ObjectMapper();
BaseDataModel csvDataModel = mapper.readValue(rddRecord.value(), BaseDataModel.class);
EnrichEventDataModel enrichEventDataModel = (EnrichEventDataModel) csvDataModel;
enrichEventKafkaProd.sendEnrichEvent(enrichEventDataModel);
} else if (rddRecord.topic().toString().equalsIgnoreCase("enrichEventTopic")) {
System.out.println("************getting enrichEventTopic data ************************");
}
}
});
streamingContext.start();
try {
streamingContext.awaitTermination();
} catch (InterruptedException e) { // TODO Auto-generated catch block
e.printStackTrace();
}
});
}
}
Here is the configuration code:
@Bean
public JavaInputDStream<ConsumerRecord<String, String>> getKafkaParam(JavaStreamingContext streamingContext) {
Map<String, Object> kafkaParams = new HashedMap();
kafkaParams.put("bootstrap.servers", "localhost:9092");
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "group1");
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("enable.auto.commit", false);
Collection<String> topics = Arrays.asList(rawEventTopic,enrichEventTopic);
return KafkaUtils.createDirectStream(
streamingContext,
LocationStrategies.PreferConsistent(),
ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
);
}
Please help. I am stuck at this point.
I found the solution to my problem at the link below:
org.apache.spark.SparkException: Task not serializable
Declare the inner class as a static variable:
static Function<Tuple2<String, String>, String> mapFunc=new Function<Tuple2<String, String>, String>() {
@Override
public String call(Tuple2<String, String> tuple2) {
return tuple2._2();
}
};
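Note that the static-field trick above fixes closure serialization, while the error in the question is about the task *result*: rdd.collect() returns ConsumerRecord objects to the driver. An alternative sketch (not the linked answer's exact fix) is to extract the serializable String payload on the executors first, e.g. rdd.map(ConsumerRecord::value).collect() instead of rdd.collect(). The shape of that transformation, mirrored with java.util.stream on a hypothetical Record stand-in class:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapBeforeCollect {

    // Stand-in for ConsumerRecord<String, String>; the real class is not Serializable.
    static class Record {
        final String topic;
        final String value;
        Record(String topic, String value) { this.topic = topic; this.value = value; }
        String value() { return value; }
    }

    // Mirror of rdd.map(ConsumerRecord::value).collect(): only the String
    // payloads (which ARE Serializable) cross the executor/driver boundary.
    static List<String> collectValues(List<Record> records) {
        return records.stream().map(Record::value).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Record> batch = Arrays.asList(
                new Record("rawEventTopic", "{\"category\":\"AD\"}"),
                new Record("enrichEventTopic", "{\"category\":\"WEB\"}"));
        System.out.println(collectValues(batch));
    }
}
```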