How to combine a Kafka Python consumer and ThreadPoolExecutor?
I have a Kafka consumer client written in Python, for example:
from kafka import KafkaConsumer, KafkaProducer

def main():
    producer = KafkaProducer(
        bootstrap_servers=kafka_setting['bootstrap_servers'],
        api_version=(0, 10),
        retries=5)
    consumer = KafkaConsumer(
        bootstrap_servers=kafka_setting['bootstrap_servers'],
        group_id=kafka_setting['consumer_id'],
        api_version=(0, 10),
        session_timeout_ms=25000,
        max_poll_records=100,
        fetch_max_bytes=1 * 1024 * 1024)
    consumer.subscribe((kafka_setting['fetch_url_topic'], ))
    msg_cnt = 0
    for message in consumer:
        msg_cnt = msg_cnt + 1
        vid_url = message.value.decode("utf-8")
        # runs synchronously: the next message is not read until this returns
        post_processing_url(vid_url, producer)

def post_processing_url(vid_url, producer):
    ...  # long time to process
    ...  # send the process result to another Kafka topic
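In my case, fetching messages from the Kafka topic is very fast, but post_processing_url can take about 10 seconds per message.
After reading "How to use ThreadPoolExecutor in Python 3", I'm wondering whether we can use ThreadPoolExecutor to run post_processing_url in another thread so that Kafka consumption becomes faster.
If so, how?
For now I'm using plain threads, but I'm not sure whether this will run out of memory (OOM) in the long run: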
import threading
...
for message in consumer:
    msg_cnt = msg_cnt + 1
    vid_url = message.value.decode("utf-8")
    # one new thread per message, with no upper bound on how many run at once
    t = threading.Thread(target=post_processing_url,
                         args=(vid_url, producer))
    t.start()
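Starting one Thread per message puts no limit on how many threads exist at the same time: if messages arrive faster than the roughly 10 seconds each one takes, threads (and their stacks) keep piling up, which is where the OOM risk comes from. A minimal sketch of the ThreadPoolExecutor approach follows, assuming messages expose a bytes .value as in the question. The names consume_with_pool, MAX_WORKERS and MAX_IN_FLIGHT are hypothetical. ThreadPoolExecutor caps the number of threads, but its internal work queue is unbounded, so the sketch also uses a semaphore to make the consumer loop wait when too many tasks are pending.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8      # hypothetical: concurrent post_processing_url calls
MAX_IN_FLIGHT = 16   # hypothetical: cap on submitted-but-unfinished tasks


def consume_with_pool(messages, handle, max_workers=MAX_WORKERS,
                      max_in_flight=MAX_IN_FLIGHT):
    """Call handle(decoded_value) for each message on a bounded thread pool.

    `messages` is any iterable of objects with a bytes `.value`
    (e.g. a KafkaConsumer); `handle` stands in for post_processing_url.
    """
    # ThreadPoolExecutor queues submitted work without limit, so this
    # semaphore blocks the reading loop once max_in_flight tasks are pending.
    slots = threading.BoundedSemaphore(max_in_flight)

    def run(value):
        try:
            # Exceptions end up in the discarded Future, so real code
            # should log failures inside handle().
            handle(value)
        finally:
            slots.release()  # free the slot even if handle() raises

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for message in messages:
            slots.acquire()  # waits while max_in_flight tasks are pending
            pool.submit(run, message.value.decode("utf-8"))
    # leaving the with-block waits for all submitted tasks to finish
```

Inside main() this would be called as consume_with_pool(consumer, lambda url: post_processing_url(url, producer)). The kafka-python documentation describes KafkaProducer as thread-safe, so sharing one producer across workers is fine, while KafkaConsumer is not thread-safe, which is why the consumer is only iterated in the calling thread here.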
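You can use Python's thread-safe queue (the queue module) to achieve what you want. The basic idea is that the Kafka loop only puts each message on a queue, and worker threads take messages off the queue and do the slow processing.
The functions you will mostly need are Queue.put(), Queue.get(), Queue.task_done() and Queue.join().
Edit: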
import threading, queue

q = queue.Queue()

def worker():
    while True:
        item = q.get()
        print(f'Working on {item}')
        # Do the processing on messages, e.g. post_processing_url(...)
        print(f'Finished {item}')
        q.task_done()

# spawn some threads to run worker (4 is an arbitrary example count)
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

# function to read from Kafka (consumer is the KafkaConsumer from the question)
def f():
    for message in consumer:
        q.put(message)

# Either run the function f directly or allocate some thread to run it
f()
# Alternatively: threading.Thread(target=f, daemon=True).start()

print('All task requests sent\n', end='')
# block until all tasks are done; only reached once iterating the consumer
# stops (by default a KafkaConsumer iterates forever)
q.join()
print('All work completed')
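The queue above is unbounded, so if processing falls behind, messages accumulate in memory, which is the same OOM concern raised in the question. A minimal sketch of a bounded variant follows, assuming the same "reader puts, workers get" design. The names run_pipeline, NUM_WORKERS, QUEUE_MAXSIZE and _STOP are hypothetical. queue.Queue(maxsize=...) makes put() block when the queue is full, and one sentinel per worker lets the pipeline shut down cleanly once the input ends (for a KafkaConsumer, iteration only ends if something like consumer_timeout_ms is set).

```python
import queue
import threading

NUM_WORKERS = 4       # hypothetical worker count
QUEUE_MAXSIZE = 100   # hypothetical bound; put() blocks when the queue is full
_STOP = object()      # sentinel telling a worker to exit


def run_pipeline(messages, handle, num_workers=NUM_WORKERS,
                 maxsize=QUEUE_MAXSIZE):
    """Feed `messages` through a bounded queue to `num_workers` threads."""
    q = queue.Queue(maxsize=maxsize)

    def worker():
        while True:
            item = q.get()
            try:
                if item is _STOP:
                    return  # the finally clause still calls task_done()
                handle(item)
            except Exception as exc:
                # keep the worker alive; in real code, log the failure
                print(f"failed to process {item!r}: {exc}")
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()

    for message in messages:  # e.g. a KafkaConsumer
        q.put(message)        # blocks while the queue is full
    for _ in threads:
        q.put(_STOP)          # one sentinel per worker
    q.join()                  # wait for every item, sentinels included
    for t in threads:
        t.join()
```

Because put() blocks, a slow worker pool naturally slows down the reading loop instead of letting the queue grow without limit; the handler and any output topic stay the same as in the answer's worker.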