
Getting NotSerializableException - When using Spark Streaming with Kafka

I am using Spark Streaming to read data from a topic, and I am running into the following exception:

java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord
Serialization stack:
	- object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = rawEventTopic, partition = 0, offset = 14098, CreateTime = 1556113016951, serialized key size = -1, serialized value size = 2916, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = {"id":null,"message":null,"eventDate":"","group":null,"category":"AD","username":null,"inboundDataSource":"AD","source":"192.168.1.14","destination":"192.168.1.15","bytesSent":"200KB","rawData":"{username:vinit}", ... ,"eventCategoryName":"AD","sourceDataType":"AD"}))
	- element of array (index: 0)
	- array (class [Lorg.apache.kafka.clients.consumer.ConsumerRecord;, size 1)
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393) ~[spark-core_2.11-2.3.0.jar:2.3.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_151]
	at java.lang.Thread.run(Unknown Source) [na:1.8.0_151]

2019-04-24 19:07:00.025 ERROR 21144 --- [result-getter-1] o.apache.spark.scheduler.TaskSetManager : Task 1.0 in stage 48.0 (TID 97) had a not serializable result: org.apache.kafka.clients.consumer.ConsumerRecord

The code that reads the data from the topic is below -

 @Service
public class RawEventSparkConsumer {
    private final Logger logger = LoggerFactory.getLogger(RawEventSparkConsumer.class);

    @Autowired
    private DataModelServiceImpl dataModelServiceImpl;

    @Autowired
    private JavaStreamingContext streamingContext;

    @Autowired
    private JavaInputDStream<ConsumerRecord<String, String>> messages;

    @Autowired
    private EnrichEventKafkaProducer enrichEventKafkaProd;

    @PostConstruct
    private void sparkRawEventConsumer() {

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(() -> {

            messages.foreachRDD((rdd) -> {

                List<ConsumerRecord<String, String>> rddList = rdd.collect();
                Iterator<ConsumerRecord<String, String>> rddIterator = rddList.iterator();
                while (rddIterator.hasNext()) {
                    ConsumerRecord<String, String> rddRecord = rddIterator.next();

                    if (rddRecord.topic().toString().equalsIgnoreCase("rawEventTopic")) {
                        ObjectMapper mapper = new ObjectMapper();
                        BaseDataModel csvDataModel = mapper.readValue(rddRecord.value(), BaseDataModel.class);
                        EnrichEventDataModel enrichEventDataModel = (EnrichEventDataModel) csvDataModel;
                        enrichEventKafkaProd.sendEnrichEvent(enrichEventDataModel);

                    } else if (rddRecord.topic().toString().equalsIgnoreCase("enrichEventTopic")) {
                        System.out.println("************getting enrichEventTopic data ************************");
                    }

                }

            });

            streamingContext.start();

            try {
                streamingContext.awaitTermination();
            } catch (InterruptedException e) { // TODO Auto-generated catch block
                e.printStackTrace();
            }
        });

    }
}

Here is the configuration code.

@Bean
public JavaInputDStream<ConsumerRecord<String, String>> getKafkaParam(JavaStreamingContext streamingContext) {
    Map<String, Object> kafkaParams = new HashedMap();
    kafkaParams.put("bootstrap.servers", "localhost:9092");
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "group1");
    kafkaParams.put("auto.offset.reset", "latest");
    kafkaParams.put("enable.auto.commit", false);
    Collection<String> topics = Arrays.asList(rawEventTopic, enrichEventTopic);

    return KafkaUtils.createDirectStream(
            streamingContext,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
    );
}
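For context: ConsumerRecord does not implement Serializable, and rdd.collect() in the consumer above asks Spark to Java-serialize the task results (the ConsumerRecord objects) so they can be shipped back to the driver, which is what throws the exception. Below is a minimal sketch of one common workaround - extracting the plain String value on the executors before collecting, so only serializable Strings cross the executor/driver boundary. It assumes the same JavaInputDStream<ConsumerRecord<String, String>> bean as above; the mapping step is an illustration, not the original code.

import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.streaming.api.java.JavaInputDStream;

public class RawEventValueCollector {

    // 'messages' is assumed to be the JavaInputDStream<ConsumerRecord<String, String>> built in the @Bean above.
    public static void consume(JavaInputDStream<ConsumerRecord<String, String>> messages) {
        messages.foreachRDD(rdd -> {
            // Runs on the executors: ConsumerRecord -> its String value
            // (the topic name could be carried along in a Tuple2 if needed).
            List<String> values = rdd.map(ConsumerRecord::value).collect();

            // Only serializable Strings reach the driver, so no NotSerializableException.
            for (String json : values) {
                System.out.println(json);
            }
        });
    }
}

Another option is to process records with rdd.foreach(...) directly on the executors instead of collecting, but then everything referenced inside that closure must itself be serializable.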

Please help. I am stuck at this point.

Found the solution to my problem at the link below -

org.apache.spark.SparkException: Task not serializable

Declare the inner class as a static variable:

// Static, so the anonymous Function does not capture the enclosing (non-serializable) class instance.
static Function<Tuple2<String, String>, String> mapFunc = new Function<Tuple2<String, String>, String>() {
    @Override
    public String call(Tuple2<String, String> tuple2) {
        return tuple2._2();
    }
};
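Why this helps: a non-static anonymous inner class (or a lambda that reads instance fields) keeps an implicit reference to its enclosing object, so when Spark serializes the function it tries to serialize the whole outer class and fails. Declaring the function static, and keeping it free of instance state, removes that capture. A hypothetical usage sketch follows - 'pairs' is an assumed JavaPairDStream<String, String>, not something from the original post:

import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

public class MapFuncUsage {

    // Same idea as above: static and self-contained, with no reference to any enclosing instance.
    static Function<Tuple2<String, String>, String> mapFunc = tuple2 -> tuple2._2();

    public static JavaDStream<String> values(JavaPairDStream<String, String> pairs) {
        // Only the small, serializable function is shipped with the task.
        return pairs.map(mapFunc);
    }
}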
