[英]Parallel Cassandra requests using Python multiprocessing library errors on wait()
I wrote some multiprocessing code. I connect to Cassandra and run 32 queries there to fetch data, and I tried to parallelize the extraction using Python's multiprocessing library. The code looks like this:
```python
import json
import multiprocessing as mp
from functools import reduce

from cassandra.cluster import Cluster

cluster = Cluster(['xyz'])
session = cluster.connect()
query = session.prepare('SELECT stuff')
session.default_timeout = 600000
session.default_fetch_size = 100

queries = [
    session.execute_async(query, ['2021-10-19'] + [i])
    for i in range(32)
]

pool = mp.Pool(32)
inter_obj = pool.map_async(compute, queries)
inter_obj.wait()
res = inter_obj.get()
pool.close()
pool.join()

final_response = reduce(aggregate, res)
resp = json.dumps(final_response, sort_keys=True, indent=4).encode("utf-8")
print("RESPONSE", resp)
```
When I run the program, it errors on wait():
```
Traceback (most recent call last):
  File "/usr/local/bin/date-run", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/sc_eol/run_stuff.py", line 75, in main
    res = inter_obj.get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 768, in get
    raise self._value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 537, in _handle_tasks
    put(task)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle '_thread.RLock' object
```
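The root cause is that `pool.map_async` must pickle each task to send it to a worker process, and a `ResponseFuture` carries internal thread locks, which cannot be pickled. A minimal stdlib reproduction of the same failure (`FakeResponseFuture` is a hypothetical stand-in, not the driver class):

```python
import pickle
import threading

class FakeResponseFuture:
    """Stand-in for the driver's ResponseFuture: it holds a lock internally."""
    def __init__(self):
        self._lock = threading.RLock()

# Trying to pickle the object fails exactly like the traceback above.
try:
    pickle.dumps(FakeResponseFuture())
    picklable = True
    error_message = ""
except TypeError as e:
    picklable = False
    error_message = str(e)

print(picklable)       # False
print(error_message)   # cannot pickle '_thread.RLock' object
```

This is why anything passed to a `multiprocessing.Pool` must be picklable: plain parameters are fine, live driver objects are not.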
execute_async() returns a ResponseFuture object. You're better off building the list of futures with something like:
```python
futures = []
query = ...
for ... :
    futures.append(session.execute_async(query, ...))
```
This approach runs the queries concurrently. You can then iterate over the results with:
```python
for future in futures:
    rows = future.result()
    # insert processing here
```
The call to result() blocks until the request returns a result or an error.
For details, see the Getting Started guide for the Cassandra Python driver. Cheers!
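The submit-all-then-collect pattern above can be sketched with the standard library alone; `fetch` here is a hypothetical stand-in for the driver call, standing in for `session.execute_async(query, [day, i]).result()`:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(day, shard):
    # Hypothetical stand-in for one Cassandra query; returns that shard's rows.
    return [(day, shard)]

with ThreadPoolExecutor(max_workers=32) as ex:
    # Submit all 32 requests up front, like building the futures list above.
    futures = [ex.submit(fetch, '2021-10-19', i) for i in range(32)]
    # Each .result() blocks until that one request has completed.
    results = [f.result() for f in futures]

print(len(results))  # 32
```

The key point is that the futures all stay in one process, so nothing needs to be pickled; the driver already runs the requests concurrently on its own I/O threads.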