Sharing an Oracle database connection between simultaneous Celery tasks
I'm using Python 2.7, Celery, and cx_Oracle to access an Oracle database.

I create many tasks. Each task runs a query through cx_Oracle, and many tasks run simultaneously. All tasks should share the same database connection.

If I launch only one task, the query runs correctly. However, if I launch several at once, I start getting the following error:
[2016-04-04 17:12:43,846: ERROR/MainProcess] Task tasks.run_query[574a6e7f-f58e-4b74-bc84-af4555af97d6] raised unexpected: DatabaseError('<cx_Oracle._Error object at 0x7f9976635580>',)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/home/ric/workspace/dbw_celery/tasks.py", line 39, in run_query
column_names = get_column_names(oracle_conn, table_info["table_name"])
File "/home/ric/workspace/dbw_celery/utilities.py", line 87, in get_column_names
cursor.execute(query_str)
DatabaseError: <cx_Oracle._Error object at 0x7f9976635580>
Now let's look at my code. This is my tasks.py file, where I create the Oracle database connection and the Celery instance, and define the task that uses that shared connection:
    # tasks.py
    from datetime import timedelta

    import celeryconfig
    from celery import Celery
    from utilities import (connect_to_db, get_column_names, get_new_rows,
                           write_result_to_output_file)

    # Define a Celery instance
    dbwapp = Celery('tasks')
    dbwapp.config_from_object(celeryconfig)
    dbwapp.conf["CELERYBEAT_SCHEDULE"] = {}

    # Define an Oracle connection as a global variable to be used by all tasks
    oracle_conn = connect_to_db(db_user, db_pass, db_host, db_port, db_name)

    # Define the task function that each Celery worker will run
    @dbwapp.task()
    def run_query(table_info, output_description):
        """Run a query on a given table. Writes found rows to output file."""
        global oracle_conn
        column_names = get_column_names(oracle_conn, table_info["table_name"])
        new_rows, last_check_timestamp = get_new_rows(oracle_conn, table_info)
        write_result_to_output_file(output_file, new_rows)

    def load_celerybeat_schedule(table_config, output_description):
        """Loads the CELERYBEAT_SCHEDULE dictionary with the tasks to run."""
        new_task_dict = {
            "task": "tasks.run_query",
            "schedule": timedelta(seconds=table_config["check_interval"]),
            "args": (table_config, output_description),
        }
        new_task_name = "task-" + table_config["table_name"]
        dbwapp.conf["CELERYBEAT_SCHEDULE"][new_task_name] = new_task_dict
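The question doesn't show how load_celerybeat_schedule is driven. A minimal, self-contained sketch of the same schedule-building logic might look like this (the table names, check intervals, and output description below are made-up examples, not from the question):

```python
# Sketch of filling a CELERYBEAT_SCHEDULE-style dict with one periodic
# run_query entry per monitored table. Table configs here are hypothetical.
from datetime import timedelta

CELERYBEAT_SCHEDULE = {}

def add_table_task(table_config, output_description):
    """Register one periodic run_query task for a single table."""
    entry = {
        "task": "tasks.run_query",
        "schedule": timedelta(seconds=table_config["check_interval"]),
        "args": (table_config, output_description),
    }
    CELERYBEAT_SCHEDULE["task-" + table_config["table_name"]] = entry

# Build one schedule entry per table to watch
for cfg in [{"table_name": "ORDERS", "check_interval": 30},
            {"table_name": "CUSTOMERS", "check_interval": 60}]:
    add_table_task(cfg, {"output_file": "rows.csv"})

print(sorted(CELERYBEAT_SCHEDULE))  # ['task-CUSTOMERS', 'task-ORDERS']
```

Each table ends up with its own beat entry, so celery beat fires run_query independently per table, which is why several tasks can overlap in time.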
This is how I connect to the database in my utilities.py file:
    # utilities.py
    import logging

    import cx_Oracle

    logger = logging.getLogger(__name__)

    def connect_to_db(db_user, db_password, db_host, db_port, db_name):
        """Connect to the DB; return the connection, or None on failure."""
        connection_str = "%s/%s@%s:%s/%s" % (db_user, db_password, db_host, db_port, db_name)
        try:
            db_connection = cx_Oracle.connect(connection_str)
        except cx_Oracle.DatabaseError:
            logger.error("Couldn't connect to DB %s" % db_name)
            return None
        logger.info("Successfully connected to the DB: %s" % db_name)
        return db_connection
And this is the get_new_rows function, also defined in utilities.py, where the query is actually run:
    # utilities.py
    def get_new_rows(db_connection, table_info):
        """Return new rows inserted in a given table since last check."""
        cursor = db_connection.cursor()
        query_str = "SELECT * FROM {0}".format(table_info["table_name"])
        cursor.execute(query_str)
        new_rows = cursor.fetchall()
        cursor.close()
        return new_rows
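Since cx_Oracle follows the Python DB-API, the same cursor-per-call pattern can be exercised with the stdlib sqlite3 module as a stand-in (the events table below is a made-up example, not from the question):

```python
# Same pattern as the question's get_new_rows, demonstrated against an
# in-memory SQLite database instead of cx_Oracle (both are DB-API drivers).
import sqlite3

def get_new_rows(db_connection, table_info):
    """Return all rows of the given table (simplified, as in the question)."""
    cursor = db_connection.cursor()
    query_str = "SELECT * FROM {0}".format(table_info["table_name"])
    cursor.execute(query_str)
    new_rows = cursor.fetchall()
    cursor.close()
    return new_rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b")])
rows = get_new_rows(conn, {"table_name": "events"})
print(rows)  # [(1, 'a'), (2, 'b')]
```

Run sequentially from a single thread this is fine; the trouble in the question starts only when several tasks drive cursors on the same connection at once.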
I run my code like this: celery -A tasks worker -B

I have simplified my code here to make it easier to follow.

I'm afraid the errors I'm getting are caused by different tasks running simultaneously while sharing the same database connection: their concurrent executions get "mixed up" somehow.

What is the correct way to share a database connection between different Celery tasks?

Does anyone know what I'm doing wrong?
If you want multiple threads to share the same connection, you need to enable threaded mode, like this:

    conn = cx_Oracle.connect(connection_str, threaded=True)

If you don't, you can run into some interesting problems!
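An Oracle server isn't needed to see the idea: the stdlib sqlite3 module has an analogous restriction, and its check_same_thread=False flag (plus a lock to serialize access) plays roughly the role that threaded=True plays for cx_Oracle. This is an illustrative analogue, not cx_Oracle code:

```python
# sqlite3 also rejects cross-thread use of a connection by default;
# check_same_thread=False opts in, and a lock serializes the shared handle.
import sqlite3
import threading

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE t (n INTEGER)")
conn_lock = threading.Lock()
results = []

def worker(n):
    with conn_lock:  # one thread at a time on the shared connection
        conn.execute("INSERT INTO t VALUES (?)", (n,))
        results.append(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [1, 2, 3, 4, 5]
```

One caveat worth noting: Celery's default prefork pool runs tasks in separate worker processes, not threads, and a connection created at module import time is inherited across fork; that is another common cause of errors like the one in the question, and is usually solved by opening one connection per worker process rather than one global one.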