Unable to push JSON in Kafka topic
I am trying to push data in JSON format to a Kafka topic, without success.
I used the following Avro schema:
{"schemaType":"AVRO","schema":"{\"title\":\"json pipeline\",\"name\":\"MyClass\",\"type\":\"record\",\"namespace\":\"com.acme.avro\",\"fields\":[{\"name\":\"web\",\"type\":{\"name\":\"web\",\"type\":\"record\",\"fields\":[{\"name\":\"test\",\"type\":{\"name\":\"test\",\"type\":\"record\",\"fields\":[{\"name\":\"createdDate\",\"type\":\"string\"},{\"name\":\"modifiedDate\",\"type\":\"string\"},{\"name\":\"createdBy\",\"type\":\"string\"},{\"name\":\"modifiedBy\",\"type\":\"string\"},{\"name\":\"enabled\",\"type\":\"int\"},{\"name\":\"savedEvent\",\"type\":\"int\"},{\"name\":\"testId\",\"type\":\"int\"},{\"name\":\"testName\",\"type\":\"string\"},{\"name\":\"type\",\"type\":\"string\"},{\"name\":\"interval\",\"type\":\"int\"},{\"name\":\"httpInterval\",\"type\":\"int\"},{\"name\":\"url\",\"type\":\"string\"},{\"name\":\"protocol\",\"type\":\"string\"},{\"name\":\"networkMeasurements\",\"type\":\"int\"},{\"name\":\"mtuMeasurements\",\"type\":\"int\"},{\"name\":\"bandwidthMeasurements\",\"type\":\"int\"},{\"name\":\"bgpMeasurements\",\"type\":\"int\"},{\"name\":\"usePublicBgp\",\"type\":\"int\"},{\"name\":\"alertsEnabled\",\"type\":\"int\"},{\"name\":\"liveShare\",\"type\":\"int\"},{\"name\":\"httpTimeLimit\",\"type\":\"int\"},{\"name\":\"httpTargetTime\",\"type\":\"int\"},{\"name\":\"httpVersion\",\"type\":\"int\"},{\"name\":\"pageLoadTimeLimit\",\"type\":\"int\"},{\"name\":\"pageLoadTargetTime\",\"type\":\"int\"},{\"name\":\"followRedirects\",\"type\":\"int\"},{\"name\":\"includeHeaders\",\"type\":\"int\"},{\"name\":\"sslVersionId\",\"type\":\"int\"},{\"name\":\"verifyCertificate\",\"type\":\"int\"},{\"name\":\"useNtlm\",\"type\":\"int\"},{\"name\":\"authType\",\"type\":\"string\"},{\"name\":\"contentRegex\",\"type\":\"string\"},{\"name\":\"identifyAgentTrafficWithUserAgent\",\"type\":\"int\"},{\"name\":\"probeMode\",\"type\":\"string\"},{\"name\":\"pathTraceMode\",\"type\":\"string\"},{\"name\":\"description\",\"type\":\"string\"},{\"name\":\"numPathTraces\",\"type\":\"int\"},{\"name\":\"apiLinks\",\"type\":{\"type\":\"array\",\"items\":{\"name\":\"apiLinks_record\",\"type\":\"record\",\"fields\":[{\"name\":\"rel\",\"type\":\"string\"},{\"name\":\"href\",\"type\":\"string\"}]}}},{\"name\":\"sslVersion\",\"type\":\"string\"}]}},{\"name\":\"pageLoad\",\"type\":{\"type\":\"array\",\"items\":{\"name\":\"pageLoad_record\",\"type\":\"record\",\"fields\":[{\"name\":\"agentName\",\"type\":\"string\"},{\"name\":\"countryId\",\"type\":\"string\"},{\"name\":\"date\",\"type\":\"string\"},{\"name\":\"agentId\",\"type\":\"int\"},{\"name\":\"roundId\",\"type\":\"int\"},{\"name\":\"responseTime\",\"type\":\"int\"},{\"name\":\"totalSize\",\"type\":\"int\"},{\"name\":\"numObjects\",\"type\":\"int\"},{\"name\":\"numErrors\",\"type\":\"int\"},{\"name\":\"domLoadTime\",\"type\":\"int\"},{\"name\":\"pageLoadTime\",\"type\":\"int\"},{\"name\":\"permalink\",\"type\":\"string\"}]}}}]}},{\"name\":\"pages\",\"type\":{\"name\":\"pages\",\"type\":\"record\",\"fields\":[{\"name\":\"current\",\"type\":\"int\"}]}}]}"}
This Avro schema was pushed to my Schema Registry successfully.
Then, in my producer, I used the AvroSerializer:
import os
import time
import json
import sys
import requests
from confluent_kafka import Producer
from confluent_kafka import SerializingProducer
from confluent_kafka.serialization import StringSerializer
from confluent_kafka.schema_registry.schema_registry_client import SchemaRegistryClient
from confluent_kafka.schema_registry.json_schema import JSONSerializer
from confluent_kafka.schema_registry.avro import AvroSerializer
from utils import set_logger
from confluent_kafka.admin import AdminClient, NewTopic

TOPIC = os.environ.get("MY_TOPIC_IN")
WEB_PAGE_LOAD_URL = os.environ.get("URL")
ACCOUNT_GROUP_ID_1000EYES = os.environ.get("ACCOUNT_ID")
TE_BEARER = os.environ.get("TE_BEARER")
LOGGER = set_logger("producer_logger")

def metrics(test_id):
    res = {}
    #url= WEB_PAGE_LOAD_URL + '{}.json?aid{}'.format(test_id, ACCOUNT_GROUP_ID_1000EYES)
    url= "{}{}.json?aid{}".format(WEB_PAGE_LOAD_URL, test_id, ACCOUNT_GROUP_ID_1000EYES)
    session = requests.session()
    headers = {'Authorization': TE_BEARER}
    rep=session.get(url, headers=headers)
    res = rep.json()
    print(res)
    return res

if __name__ == "__main__":
    conf={"bootstrap.servers":"json_kafka:29094"}
    admin_client = AdminClient(conf)
    topic_list = [NewTopic("my_topic_in", 1, 1)]
    admin_client.create_topics(new_topics=topic_list)
    if sys.argv[1] == "json" :
        schema_registry_url = {"url": "http://json_schema-registry:8083"}
        sr = SchemaRegistryClient(schema_registry_url)
        subjects = sr.get_subjects()
        # retrieve json schema in schema registry
        for subject in subjects:
            #print(subject)
            schema = sr.get_latest_version(subject)
            print(schema.subject)
            if schema.subject == "{}-value".format(TOPIC) :
                my_schema=schema.schema.schema_str
                json_serializer = JSONSerializer(my_schema,sr,to_dict=None,conf=None)
        # create json producer
        json_producer_conf = {'bootstrap.servers':'json_kafka:29094' ,
                              'key.serializer': StringSerializer('utf_8'),
                              'value.serializer': AvroSerializer}
        producer = SerializingProducer(json_producer_conf)
    elif sys.argv[1]=="string":
        string_producer_conf = {'bootstrap.servers':'json_kafka:29094',
                                'enable.idempotence': 'true'}
        # create string producer
        producer = Producer(string_producer_conf)
    while True:
        response_json=metrics(1136837) #300
        raw_json = json.dumps(response_json,indent=4)
        print(raw_json)
        try:
            #producer.produce(topic=TOPIC, value=raw_json)
            producer.produce(topic=TOPIC, value=raw_json)
            producer.poll(1)
        except Exception as e:
            LOGGER.error("There is a problem with the topic {}\n".format(TOPIC))
            LOGGER.error("The problem is: {}!".format(e))
        LOGGER.info("Produced into Kafka topic: {}.".format(TOPIC))
        LOGGER.info("Waiting for the next round...")
        time.sleep(300)
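Then, when I start my producer, I get the following error:
ERROR The problem is: KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="'SerializationContext' object has no attribute 'strip'"}!
Note: when I pass "string" as the argument to my producer, it works fine.
I have tried many things without success and really don't understand the error message. Any help would be greatly appreciated, thanks.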
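The problem with your code is that you are passing the AvroSerializer class itself to the value.serializer property, rather than an instance of it.
As the example code shows, you need to create an instance with the schema, the Schema Registry client (configured with its URL), and a serializer function handle. That function handle should then return a dict, instead of you producing a string with json.dumps. If you want to send an actual JSON string, you don't need AvroSerializer at all, because it sends binary Avro data.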
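To see why passing the class instead of an instance could produce this particular message, here is a small stand-alone sketch. It does not use confluent_kafka: `ToySerializer`, the stand-in `SerializationContext`, and `produce` are all hypothetical, and the constructor argument order is an assumption chosen so that the failure mirrors the reported error. The idea is that the producer calls the configured serializer as `serializer(value, ctx)`; if the serializer is a class, that call runs the constructor, which then treats the context object as if it were a schema string.

```python
class SerializationContext:
    """Stand-in for the context object a producer hands to serializers."""

    def __init__(self, topic):
        self.topic = topic


class ToySerializer:
    """Hypothetical serializer: configured once, then called per message."""

    def __init__(self, registry_client, schema_str):
        # Assumed clean-up step; it fails if schema_str is not a string.
        self.schema_str = schema_str.strip()
        self.registry_client = registry_client

    def __call__(self, value, ctx):
        # Placeholder encoding, not real Avro.
        return repr(value).encode("utf-8")


def produce(value_serializer, value, topic):
    """Toy producer: invokes the configured serializer as serializer(value, ctx)."""
    ctx = SerializationContext(topic)
    return value_serializer(value, ctx)


# Correct: an instance is configured first, so the call reaches __call__.
instance = ToySerializer(registry_client=None, schema_str=' {"type": "string"} ')
payload = produce(instance, {"a": 1}, "my_topic_in")

# Wrong: with the class, the same call becomes ToySerializer(value, ctx),
# so the context object lands in the schema_str parameter.
try:
    produce(ToySerializer, '{"a": 1}', "my_topic_in")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

In the toy version the error names the context object and the missing `strip` attribute, which is the same shape as the message in the question.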
Reducing the code to the important parts...
class User:
    def __init__(self, ...):
        pass

def user_to_dict(user, ctx):
    return dict(...)

schema_registry_conf = {'url': 'http://...'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)
avro_serializer = AvroSerializer(schema_str,
                                 schema_registry_client,
                                 user_to_dict)  # object serializer function defined here
producer_conf = {'bootstrap.servers': '...',
                 'key.serializer': StringSerializer('utf_8'),
                 'value.serializer': avro_serializer}
producer = SerializingProducer(producer_conf)
...
while True:
    # Serve on_delivery callbacks from previous calls to produce()
    producer.poll(0.0)
    try:
        # ... get fields
        user = User(...)  # create an object
        producer.produce(topic=topic, key='...', value=user,  # sending the object
                         on_delivery=delivery_report)
    ...
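For the question's case, the value is already a parsed JSON response, so the function handle can be small. The following is a minimal sketch, assuming the API response carries the top-level `web` and `pages` keys named in the schema. `response_to_dict` and the sample payload are hypothetical and heavily trimmed, and `ctx` is unused here.

```python
def response_to_dict(response, ctx):
    """Return only the fields declared at the top level of the Avro schema.

    The signature follows the (object, ctx) -> dict shape of user_to_dict.
    """
    return {"web": response["web"], "pages": response["pages"]}


# Hypothetical payload for illustration only, not real API output.
sample_response = {
    "web": {"test": {"testId": 1}, "pageLoad": []},
    "pages": {"current": 1},
    "extra": "dropped because the schema declares no such field",
}

value = response_to_dict(sample_response, ctx=None)
print(sorted(value))
```

With a function like this registered on the serializer instance, the parsed response itself (not the string from json.dumps) would be what gets passed as value= to producer.produce.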