pool.apply_async with multiple parameters

Question

The code below should call two databases at the same time. I tried to do it with ThreadPool but ran into some difficulties. pool.apply_async doesn't seem to allow multiple parameters, so I put them into a tuple and then try to unpack them. Is this the right approach, or is there a better solution?

The list of tuples is defined in params=..., and each tuple has 3 entries. I would expect the function to be called twice, each time with 3 parameters.
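As a minimal sketch of how apply_async handles several parameters: its args argument is a tuple that the pool unpacks into positional arguments, so a function with three named parameters receives them directly (kwds does the same for keyword arguments). The function fake_query and the connection strings are hypothetical stand-ins for get_sql and the real database handles, so the example runs without a database.

```python
from multiprocessing.pool import ThreadPool


def fake_query(sql, schema, db):
    """Hypothetical stand-in for get_sql: echoes what it received."""
    return "{0} on {1} via {2}".format(sql, schema, db)


def run_two_queries():
    pool = ThreadPool(processes=2)
    try:
        # The args tuple is unpacked into fake_query(sql, schema, db).
        result_prod = pool.apply_async(
            fake_query, ("SELECT 1", "DF_RISK_PRD_OWNER", "prod_conn")
        )
        # kwds= passes the same values as keyword arguments instead.
        result_uat = pool.apply_async(
            fake_query,
            kwds={"sql": "SELECT 1", "schema": "DF_RISK_CUAT_OWNER", "db": "uat_conn"},
        )
        # get() blocks until each task has finished.
        return result_prod.get(), result_uat.get()
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker threads to exit


prod, uat = run_two_queries()
print(prod)  # SELECT 1 on DF_RISK_PRD_OWNER via prod_conn
print(uat)   # SELECT 1 on DF_RISK_CUAT_OWNER via uat_conn
```

With named parameters in the worker, neither *params nor manual indexing is needed.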

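from multiprocessing.pool import ThreadPool

import pandas as pd


def get_sql(self, *params):  # run with risk
    # apply_async unpacks the args tuple; *params collects the three values again
    self.logger.info(len(params))
    sql, schema, db = params
    self.logger.info("Running SQL with schema: {0}".format(schema))
    df = pd.read_sql(sql, db)
    return df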

def compare_prod_uat(self):
    self.connect_dbrs_prod_db()
    self.connect_dbrs_uat_db()
    self.logger.info("connected to UAT and PROD database")

    sql = """ SELECT * FROM TABLE """

    params = [(sql, "DF_RISK_PRD_OWNER", self.db_dbrs_prod), (sql, "DF_RISK_CUAT_OWNER", self.db_dbrs_uat)]
    pool = ThreadPool(processes=2)
    self.logger.info("Calling Pool")
    # each args tuple is unpacked into get_sql's positional parameters
    result_prod = pool.apply_async(self.get_sql, (sql, "DF_RISK_PRD_OWNER", self.db_dbrs_prod))
    result_uat = pool.apply_async(self.get_sql, (sql, "DF_RISK_CUAT_OWNER", self.db_dbrs_uat))

    # sequential equivalent:
    # df_prod = self.get_sql(sql, "DF_RISK_PRD_OWNER", self.db_dbrs_prod)
    # df_cuat = self.get_sql(sql, "DF_RISK_CUAT_OWNER", self.db_dbrs_uat)

    self.logger.info("Get return from uat")
    df1 = result_uat.get()  # blocks until the UAT query returns

    self.logger.info("Get return from prod")
    df2 = result_prod.get()  # blocks until the PROD query returns

    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker threads to exit

    return df1, df2
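
Because params already holds one 3-tuple per database, ThreadPool.starmap (Python 3.3+) could consume that list directly: it unpacks each tuple into the function's arguments and returns the results in input order. This is a sketch under assumptions: fake_read_sql is a hypothetical stand-in for pd.read_sql that sleeps to simulate a slow query, and the connection strings are placeholders.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_read_sql(sql, schema, db):
    """Hypothetical stand-in for pd.read_sql: simulates a slow query."""
    time.sleep(0.2)  # pretend the database takes a while
    return (schema, db, sql)


sql = "SELECT * FROM TABLE"
params = [
    (sql, "DF_RISK_PRD_OWNER", "prod_conn"),
    (sql, "DF_RISK_CUAT_OWNER", "uat_conn"),
]

start = time.perf_counter()
# Leaving the with-block terminates the pool; starmap has already
# returned every result by then, so nothing is lost.
with ThreadPool(processes=len(params)) as pool:
    df_prod, df_uat = pool.starmap(fake_read_sql, params)
elapsed = time.perf_counter() - start

print(df_prod[0], df_uat[0])  # DF_RISK_PRD_OWNER DF_RISK_CUAT_OWNER
# The two sleeps overlap, so elapsed is roughly 0.2 s rather than 0.4 s.
print("both queries took {0:.2f}s".format(elapsed))
```

starmap blocks until all tasks finish, which fits here because both results are needed before the comparison anyway.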

Answer 1

There may be many things wrong, but if you add

print(params)

as the first line of your get_sql, you'll see that you send in a tuple (sql, [(sql, "DF_RISK_PRD_OWNER", self.db_dbrs_prod), (sql, .....)])

So yes, the length of params is always two: the first parameter is "sql", whatever that is in your implementation, and the second is a list of tuples of length three. I don't understand why you are sending (sql, params) instead of just (params,), as "sql" already seems to be in the list elements. If it needs to be there, your list is in params[1].
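
The diagnosis can be reproduced without a database. This sketch assumes, as the answer describes, that the call passed (sql, params) as the args tuple; show_params is a hypothetical stand-in for the get_sql worker that just reports what it received.

```python
from multiprocessing.pool import ThreadPool


def show_params(*params):
    """Hypothetical stand-in for get_sql: report what arrived."""
    return len(params), params


sql = "SELECT * FROM TABLE"
params = [(sql, "DF_RISK_PRD_OWNER", "prod_conn"), (sql, "DF_RISK_CUAT_OWNER", "uat_conn")]

pool = ThreadPool(processes=2)
# Wrong: the whole list travels as ONE positional argument after sql.
bad_len, bad_params = pool.apply_async(show_params, (sql, params)).get()
# Right: one 3-tuple as args, unpacked into three positional arguments.
good_len, good_params = pool.apply_async(show_params, params[0]).get()
pool.close()
pool.join()

print(bad_len)   # 2
print(good_len)  # 3
```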

However, I don't understand how your worker function would traverse this list. It seems to be built to execute only one SQL statement, as it doesn't have a for loop. Maybe you intended to do the for loop in your compare_prod_uat function and spawn as many workers as you have elements in your list? I don't know, but as written it doesn't make much sense.
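
If the intent is the one guessed above (a loop in the caller that submits one task per tuple), the pattern looks roughly like this. It is a sketch under assumptions: get_sql here is a hypothetical module-level worker that returns a string instead of a DataFrame, and the pool is sized to the number of tuples.

```python
from multiprocessing.pool import ThreadPool


def get_sql(sql, schema, db):
    """Hypothetical worker: handles exactly one statement per call."""
    return "rows from {0} ({1})".format(schema, db)


def run_all(params):
    pool = ThreadPool(processes=len(params))
    try:
        # One task per tuple; each tuple is unpacked into get_sql's arguments.
        pending = [pool.apply_async(get_sql, args) for args in params]
        # get() blocks per task; results are collected in submission order.
        return [result.get() for result in pending]
    finally:
        pool.close()
        pool.join()


sql = "SELECT * FROM TABLE"
params = [(sql, "DF_RISK_PRD_OWNER", "prod_conn"), (sql, "DF_RISK_CUAT_OWNER", "uat_conn")]
print(run_all(params))
# ['rows from DF_RISK_PRD_OWNER (prod_conn)', 'rows from DF_RISK_CUAT_OWNER (uat_conn)']
```

Keeping the loop in the caller lets the worker stay a simple one-statement function, which is how get_sql is already written.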

The parameter issue can be fixed by this, though. 但是，可以通过此方法解决参数问题。

Answer 1 (score 0, accepted), 2016-10-31 15:40:09
