
Unable to push JSON to a Kafka topic

I am trying to push data in JSON format to a Kafka topic, but without success.

I used the following Avro schema:

{"schemaType":"AVRO","schema":"{\"title\":\"json pipeline\",\"name\":\"MyClass\",\"type\":\"record\",\"namespace\":\"com.acme.avro\",\"fields\":[{\"name\":\"web\",\"type\":{\"name\":\"web\",\"type\":\"record\",\"fields\":[{\"name\":\"test\",\"type\":{\"name\":\"test\",\"type\":\"record\",\"fields\":[{\"name\":\"createdDate\",\"type\":\"string\"},{\"name\":\"modifiedDate\",\"type\":\"string\"},{\"name\":\"createdBy\",\"type\":\"string\"},{\"name\":\"modifiedBy\",\"type\":\"string\"},{\"name\":\"enabled\",\"type\":\"int\"},{\"name\":\"savedEvent\",\"type\":\"int\"},{\"name\":\"testId\",\"type\":\"int\"},{\"name\":\"testName\",\"type\":\"string\"},{\"name\":\"type\",\"type\":\"string\"},{\"name\":\"interval\",\"type\":\"int\"},{\"name\":\"httpInterval\",\"type\":\"int\"},{\"name\":\"url\",\"type\":\"string\"},{\"name\":\"protocol\",\"type\":\"string\"},{\"name\":\"networkMeasurements\",\"type\":\"int\"},{\"name\":\"mtuMeasurements\",\"type\":\"int\"},{\"name\":\"bandwidthMeasurements\",\"type\":\"int\"},{\"name\":\"bgpMeasurements\",\"type\":\"int\"},{\"name\":\"usePublicBgp\",\"type\":\"int\"},{\"name\":\"alertsEnabled\",\"type\":\"int\"},{\"name\":\"liveShare\",\"type\":\"int\"},{\"name\":\"httpTimeLimit\",\"type\":\"int\"},{\"name\":\"httpTargetTime\",\"type\":\"int\"},{\"name\":\"httpVersion\",\"type\":\"int\"},{\"name\":\"pageLoadTimeLimit\",\"type\":\"int\"},{\"name\":\"pageLoadTargetTime\",\"type\":\"int\"},{\"name\":\"followRedirects\",\"type\":\"int\"},{\"name\":\"includeHeaders\",\"type\":\"int\"},{\"name\":\"sslVersionId\",\"type\":\"int\"},{\"name\":\"verifyCertificate\",\"type\":\"int\"},{\"name\":\"useNtlm\",\"type\":\"int\"},{\"name\":\"authType\",\"type\":\"string\"},{\"name\":\"contentRegex\",\"type\":\"string\"},{\"name\":\"identifyAgentTrafficWithUserAgent\",\"type\":\"int\"},{\"name\":\"probeMode\",\"type\":\"string\"},{\"name\":\"pathTraceMode\",\"type\":\"string\"},{\"name\":\"description\",\"type\":\"string\"},{\"name\":\"numPathTraces\",\"type\":\"int\"},{\"name\":\"apiLinks\",\"type\":{\"type\":\"array\",\"items\":{\"name\":\"apiLinks_record\",\"type\":\"record\",\"fields\":[{\"name\":\"rel\",\"type\":\"string\"},{\"name\":\"href\",\"type\":\"string\"}]}}},{\"name\":\"sslVersion\",\"type\":\"string\"}]}},{\"name\":\"pageLoad\",\"type\":{\"type\":\"array\",\"items\":{\"name\":\"pageLoad_record\",\"type\":\"record\",\"fields\":[{\"name\":\"agentName\",\"type\":\"string\"},{\"name\":\"countryId\",\"type\":\"string\"},{\"name\":\"date\",\"type\":\"string\"},{\"name\":\"agentId\",\"type\":\"int\"},{\"name\":\"roundId\",\"type\":\"int\"},{\"name\":\"responseTime\",\"type\":\"int\"},{\"name\":\"totalSize\",\"type\":\"int\"},{\"name\":\"numObjects\",\"type\":\"int\"},{\"name\":\"numErrors\",\"type\":\"int\"},{\"name\":\"domLoadTime\",\"type\":\"int\"},{\"name\":\"pageLoadTime\",\"type\":\"int\"},{\"name\":\"permalink\",\"type\":\"string\"}]}}}]}},{\"name\":\"pages\",\"type\":{\"name\":\"pages\",\"type\":\"record\",\"fields\":[{\"name\":\"current\",\"type\":\"int\"}]}}]}"

This Avro schema is successfully pushed to my Schema Registry.
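(For reference, a minimal sketch of how such a schema can be registered programmatically with confluent-kafka's SchemaRegistryClient. The registry URL is taken from the producer code below, and the subject name assumes the default "<topic>-value" TopicNameStrategy; both are assumptions, adjust to your setup.)

from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

# Assumption: same registry URL as in the producer below, and the value
# subject for topic "my_topic_in" is "my_topic_in-value".
sr = SchemaRegistryClient({"url": "http://json_schema-registry:8083"})

# The unescaped "schema" string from the payload above, saved to a file.
with open("my_schema.avsc") as f:
    schema_str = f.read()

schema_id = sr.register_schema("my_topic_in-value",
                               Schema(schema_str, schema_type="AVRO"))
print("Registered schema id:", schema_id)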

Then in my producer I used the AvroSerializer:

import os
import time
import json
import sys
import requests

from confluent_kafka import Producer
from confluent_kafka import SerializingProducer
from confluent_kafka.serialization import StringSerializer
from confluent_kafka.schema_registry.schema_registry_client import SchemaRegistryClient
from confluent_kafka.schema_registry.json_schema import JSONSerializer
from  confluent_kafka.schema_registry.avro import AvroSerializer

from utils import set_logger

from confluent_kafka.admin import AdminClient, NewTopic

TOPIC = os.environ.get("MY_TOPIC_IN")
WEB_PAGE_LOAD_URL = os.environ.get("URL")
ACCOUNT_GROUP_ID_1000EYES = os.environ.get("ACCOUNT_ID")
TE_BEARER = os.environ.get("TE_BEARER")
LOGGER = set_logger("producer_logger")

def metrics(test_id):
    res = {}
    #url= WEB_PAGE_LOAD_URL + '{}.json?aid{}'.format(test_id, ACCOUNT_GROUP_ID_1000EYES)
    url= "{}{}.json?aid{}".format(WEB_PAGE_LOAD_URL, test_id, ACCOUNT_GROUP_ID_1000EYES)
    session = requests.session()
    headers = {'Authorization': TE_BEARER}
    rep=session.get(url, headers=headers)
    res = rep.json()
    print(res)
    return res

if __name__ == "__main__":

    conf={"bootstrap.servers":"json_kafka:29094"}
    admin_client = AdminClient(conf)
    topic_list = [NewTopic("my_topic_in", 1, 1)]
    admin_client.create_topics(new_topics=topic_list)

    if sys.argv[1] == "json" :

        schema_registry_url = {"url": "http://json_schema-registry:8083"}
        sr = SchemaRegistryClient(schema_registry_url)
        subjects = sr.get_subjects()
        '''retrieve the JSON schema from the schema registry'''
        for subject in subjects:
            #print(subject)
            schema = sr.get_latest_version(subject)
            print(schema.subject)
            if schema.subject == "{}-value".format(TOPIC) :
                my_schema=schema.schema.schema_str
                json_serializer = JSONSerializer(my_schema,sr,to_dict=None,conf=None)
                '''create json producer'''
                json_producer_conf = {'bootstrap.servers':'json_kafka:29094' ,
                                      'key.serializer': StringSerializer('utf_8'),
                                      'value.serializer': AvroSerializer}
                                      
                producer = SerializingProducer(json_producer_conf)

    elif sys.argv[1]=="string":
        string_producer_conf = {'bootstrap.servers':'json_kafka:29094',
            'enable.idempotence': 'true'}
        '''create string producer '''
        producer = Producer(string_producer_conf)

    while True:

        response_json=metrics(1136837) #300

        raw_json = json.dumps(response_json,indent=4)

        print(raw_json)

        try:
            producer.produce(topic=TOPIC, value=raw_json)
            producer.poll(1)

        except Exception as e:
            LOGGER.error("There is a problem with the topic {}\n".format(TOPIC))
            LOGGER.error("The problem is: {}!".format(e))

        LOGGER.info("Produced into Kafka topic: {}.".format(TOPIC))
        LOGGER.info("Waiting for the next round...")
        time.sleep(300)

Then, when I launch my producer, I get the following error:

ERROR The problem is: KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="'SerializationContext' object has no attribute 'strip'"}!

Remark: when I use "string" as the argument, my producer works well.

I have tried many things without success and really don't understand the error message. Any help would be appreciated, thanks.

The problem with your code is that you passed the actual class AvroSerializer to the value.serializer property, not an instance of one. The producer calls whatever is set as value.serializer with (value, SerializationContext), so here that call lands in the AvroSerializer constructor, which most likely ends up treating the SerializationContext as a schema string; hence the complaint that it has no strip attribute.

As shown in the example code, you need to create an instance with the schema, URL, and a serializer function-handle. You then return a dict from that serializer function-handle rather than producing a string from json.dumps... If you want to send an actual JSON string, you don't need AvroSerializer at all, as it would send binary Avro data.

Reducing the code to the important parts...

class User:
    def __init__(self, ...):
        pass

def user_to_dict(user, ctx):
    return dict(...)


schema_registry_conf = {'url': 'http://...'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

avro_serializer = AvroSerializer(schema_str,
                                 schema_registry_client,
                                 user_to_dict)  # object serializer function defined here

producer_conf = {'bootstrap.servers': '...',
                 'key.serializer': StringSerializer('utf_8'),
                 'value.serializer': avro_serializer}

producer = SerializingProducer(producer_conf)

...
while True:
    # Serve on_delivery callbacks from previous calls to produce()
    producer.poll(0.0)
    try:
        # ... get fields 
        user = User(...)  # create an object
        producer.produce(topic=topic, key='...', value=user,  # sending the object
                         on_delivery=delivery_report)
    ...
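Applied to the question's code, a minimal sketch could look like the following. Since metrics() already returns a dict, the to_dict handle can simply pass the value through. This is a sketch, not a drop-in fix: note that the AvroSerializer argument order has changed between confluent-kafka versions (newer releases take the registry client first, i.e. AvroSerializer(schema_registry_client, schema_str, to_dict)), so check the signature of the version you have installed.

from confluent_kafka import SerializingProducer
from confluent_kafka.serialization import StringSerializer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

def delivery_report(err, msg):
    # Reports per-message delivery success or failure.
    if err is not None:
        print("Delivery failed: {}".format(err))

sr = SchemaRegistryClient({"url": "http://json_schema-registry:8083"})
my_schema = sr.get_latest_version("my_topic_in-value").schema.schema_str

# metrics() already returns a dict, so the serializer handle is a pass-through.
avro_serializer = AvroSerializer(my_schema, sr, lambda obj, ctx: obj)

producer = SerializingProducer({
    "bootstrap.servers": "json_kafka:29094",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": avro_serializer,  # an instance, not the class
})

response_json = metrics(1136837)  # the question's helper, which returns a dict
producer.produce(topic="my_topic_in", value=response_json,
                 on_delivery=delivery_report)
producer.flush()

Also note that none of the fields in the schema above are declared nullable (no ["null", ...] unions), so the dict returned by the API has to contain every field with the expected type; otherwise serialization will fail again, just with a different error.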

