Fastest way to apply an async function to a pandas DataFrame

A pandas DataFrame has an apply method that lets you apply a synchronous function, for example:

import numpy as np
import pandas as pd

def fun(x):
    return x * 2

df = pd.DataFrame(np.arange(10), columns=['old'])

df['new'] = df['old'].apply(fun)

If an asynchronous function fun2 has to be applied instead, what is the fastest way to do something similar to the following?

import asyncio
import numpy as np
import pandas as pd

async def fun2(x):
    return x * 2

async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
    df['new'] = 0
    for i in range(len(df)):
        # Assign through df.loc; chained df['new'].iloc[i] = ... may write to a copy
        df.loc[df.index[i], 'new'] = await fun2(df['old'].iloc[i])
    print(df)

asyncio.run(main())

Try using asyncio.gather and overwriting the whole column once all the calls have completed:

import asyncio
import numpy as np
import pandas as pd


async def fun2(x):
    return x * 2


async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
    df['new'] = await asyncio.gather(*[fun2(v) for v in df['old']])
    print(df)


asyncio.run(main())

Output:

   old  new
0    0    0
1    1    2
2    2    4
3    3    6
4    4    8
5    5   10
6    6   12
7    7   14
8    8   16
9    9   18
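
To illustrate why the asyncio.gather version is usually faster than awaiting inside a loop, here is a minimal timing sketch. It assumes the real function spends its time waiting on I/O; slow_double, its delay parameter, apply_sequential, apply_concurrent and compare are hypothetical names introduced only for this illustration, and asyncio.sleep stands in for the actual wait.

```python
import asyncio
import time

import numpy as np
import pandas as pd


async def slow_double(x, delay=0.05):
    # Hypothetical stand-in for fun2: the sleep simulates an I/O wait.
    await asyncio.sleep(delay)
    return x * 2


async def apply_sequential(series):
    # Await one call at a time, like the loop in the question.
    return [await slow_double(v) for v in series]


async def apply_concurrent(series):
    # Schedule every call at once; gather returns results in input order.
    return await asyncio.gather(*(slow_double(v) for v in series))


async def compare():
    df = pd.DataFrame(np.arange(10), columns=['old'])

    start = time.perf_counter()
    sequential = await apply_sequential(df['old'])
    sequential_time = time.perf_counter() - start

    start = time.perf_counter()
    df['new'] = await apply_concurrent(df['old'])
    concurrent_time = time.perf_counter() - start

    return df, sequential, sequential_time, concurrent_time


df, sequential, sequential_time, concurrent_time = asyncio.run(compare())
print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

In the sequential version the waits add up across rows, while with gather they overlap, so the total is roughly one wait. If the coroutine never actually waits on anything (as with the bare x * 2 in fun2), there is nothing to overlap and gather should not be expected to bring a speedup.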
