How to wrap list functions in Python?

I could not capture this problem accurately in the title. I want to use a list, func(*args), and Pool.map together without errors. Please see below.

▼Code

import psutil
from multiprocessing import Pool
from pandas import DataFrame

def df_parallelize_run(func, arguments):
    p = Pool(psutil.cpu_count())
    df = p.map(func, arguments)
    p.close()
    p.join()
    return df

def make_lag(df: DataFrame, LAG_DAY: list):
    for l in LAG_DAY:
        df[f'lag{l}d'] = df.groupby(['id'])['target'].transform(lambda x: x.shift(l))

    return df

def wrap_make_lag(args):
    return make_lag(*args)

Given the three functions above, I want to do the following:

# df: DataFrame
arguments = (df, [1, 3, 7, 13, 16])
df = df_parallelize_run(wrap_make_lag, arguments)

▼Error

in df_parallelize_run(func, arguments)
----> 7     df = pool.map(func, arguments)

in ..../python3.7/multiprocessing/pool.py in map(self, func, iterable, chunksize)
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()

in ..../python3.7/multiprocessing/pool.py in get(self, timeout)
--> 657             raise self._value

TypeError: make_lag() takes 2 positional arguments but 5 were given

I know the cause of this mismatch: the list [1, 3, 7, 13, 16] is unpacked into 5 arguments. How should I do this properly? If possible, I want to pass the list as a single positional argument. If that is nearly impossible with a list and Pool.map, what is a more appropriate, easy, and flexible way?
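The count of 5 can be reproduced without a pool, because Pool.map hands each element of its iterable to the function as a single argument, just like the built-in map. The sketch below uses a stub make_lag and a placeholder string instead of a real DataFrame (both are assumptions, made only to keep it dependency-free). Wrapping the tuple in a one-element list turns it into a single work item.

```python
def make_lag(df, lag_days):
    # Stub standing in for the real function: just echo the arguments.
    return df, lag_days


def wrap_make_lag(args):
    return make_lag(*args)


# "df-placeholder" stands in for the DataFrame (assumption for illustration).
arguments = ("df-placeholder", [1, 3, 7, 13, 16])

# map() iterates over the tuple, so the list arrives at the wrapper on its
# own and is unpacked into 5 positional arguments.
_, lag_list = arguments
message = None
try:
    wrap_make_lag(lag_list)
except TypeError as exc:
    message = str(exc)
print(message)

# Wrapping the tuple in a list makes it one work item with two arguments.
results = list(map(wrap_make_lag, [arguments]))
print(results)
```

With the real Pool.map, the same change would be passing [arguments] instead of arguments, although that yields only one task and therefore no parallelism.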

Use pool.starmap. Build a list of tuples, one per call, holding the arguments to your function. Here, it looks like df is the same each time and arg is each element in arguments.

import multiprocessing

arglist = [(df, arg) for arg in arguments]
with multiprocessing.Pool(multiprocessing.cpu_count()) as p:
    results = p.starmap(make_lag, arglist)
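As a dependency-free sketch of how starmap unpacks each tuple, the example below replaces the DataFrame with a plain list of numbers and uses a hypothetical shift_values helper in place of make_lag; both are assumptions for illustration. The `if __name__ == "__main__":` guard matters on platforms where worker processes are started by re-importing the main module.

```python
import multiprocessing


def shift_values(values, lag):
    # Hypothetical stand-in for make_lag: shift a list right by `lag`
    # (assumed non-negative), padding the front with None.
    return ([None] * lag + list(values))[:len(values)]


def build_arglist(values, lags):
    # One (values, lag) tuple per task; starmap unpacks each tuple into
    # the two positional parameters of shift_values.
    return [(values, lag) for lag in lags]


def run_parallel(values, lags):
    arglist = build_arglist(values, lags)
    with multiprocessing.Pool(processes=2) as pool:
        return pool.starmap(shift_values, arglist)


if __name__ == "__main__":
    print(run_parallel([10, 20, 30, 40], [1, 3]))
```

Each tuple in the list becomes one call, so the number of tuples, not the length of any list inside them, determines how many tasks the pool receives.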

Solved. I rewrote it in the following way.

▼Functions

import pandas as pd
import psutil
from multiprocessing import Pool

def df_parallelize_run(func, arglist):
    with Pool(psutil.cpu_count()) as p:
        # each call returns a DataFrame; join them column-wise
        results = pd.concat(p.starmap(func, arglist), axis=1)
    return results

def make_lag(df, lag):
    if not isinstance(lag, list):
        lag = [lag]

    # the loop is not needed when each task handles a single lag
    col_names = []
    for l in lag:
        col_name = f'lag{l}d'
        df[col_name] = df.groupby(['id'])['target'].transform(lambda x: x.shift(l))
        col_names.append(col_name)

    # return every lag column created above, not only the last one
    return df[col_names]

Other function

def make_lag_roll(df, lag, roll):
    col_name = f'lag{lag}_roll_mean_{roll}'
    df[col_name] = df.groupby(['id'])['target'].transform(lambda x: x.shift(lag).rolling(roll).mean())

    return df[[col_name]]
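To make the shift(lag).rolling(roll).mean() chain concrete, here is a small worked example on a toy frame (the data and the name toy are made up for illustration). Each group is first shifted by lag rows and then averaged over a window of roll rows, so the first lag + roll - 1 rows of every group are NaN.

```python
import pandas as pd

# Made-up data: two ids with five observations each.
toy = pd.DataFrame({
    "id": ["a"] * 5 + ["b"] * 5,
    "target": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})

lag, roll = 1, 2
col_name = f"lag{lag}_roll_mean_{roll}"
toy[col_name] = toy.groupby(["id"])["target"].transform(
    lambda x: x.shift(lag).rolling(roll).mean()
)

# Group "a": the shifted values are NaN, 1, 2, 3, 4, so the rolling means
# over 2 rows are NaN, NaN, 1.5, 2.5, 3.5.
print(toy)
```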

▼How to use

arglist = [(df[['id', 'target']], arg) for arg in range(1, 36)]

lag_df = df_parallelize_run(make_lag, arglist)
arglist_roll = [(df[['id', 'target']], lag, roll)
               for lag in range(1, 36)
               for roll in [7, 14, 28]]

lag_roll_df = df_parallelize_run(make_lag_roll, arglist_roll)
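For anyone who wants to try the pattern end to end, the following is a minimal self-contained sketch on toy data. It makes a few assumptions that differ from the code above: os.cpu_count() replaces psutil to avoid the extra dependency, groupby(...).shift(lag) is used in place of transform(lambda x: x.shift(l)) (the two should give the same result here), and make_toy_frame is a hypothetical helper with made-up data.

```python
import os
from multiprocessing import Pool

import pandas as pd


def make_lag(df, lag):
    # Return a one-column frame so results can be concatenated column-wise.
    # The input frame is not modified.
    shifted = df.groupby("id")["target"].shift(lag)
    return shifted.to_frame(name=f"lag{lag}d")


def df_parallelize_run(func, arglist, processes=None):
    # os.cpu_count() is an assumption here; the original uses psutil.
    with Pool(processes or os.cpu_count()) as p:
        return pd.concat(p.starmap(func, arglist), axis=1)


def make_toy_frame():
    # Hypothetical data: two ids with four observations each.
    return pd.DataFrame({
        "id": ["a"] * 4 + ["b"] * 4,
        "target": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    })


if __name__ == "__main__":
    df = make_toy_frame()
    arglist = [(df[["id", "target"]], lag) for lag in range(1, 4)]
    lag_df = df_parallelize_run(make_lag, arglist, processes=2)
    print(df.join(lag_df))
```

Note that every task pickles its own copy of the frame to send to a worker, so for a large df the transfer cost can outweigh the gain from running the lags in parallel.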
