Fastest way to apply an async function to a pandas DataFrame
A pandas DataFrame has an apply method that lets you apply a synchronous function, for example:
import numpy as np
import pandas as pd

def fun(x):
    return x * 2

df = pd.DataFrame(np.arange(10), columns=['old'])
df['new'] = df['old'].apply(fun)
If you need to apply an asynchronous function such as fun2 instead, what is the fastest way to do something like this:
import asyncio
import numpy as np
import pandas as pd

async def fun2(x):
    return x * 2

async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
    df['new'] = 0
    for i in range(len(df)):
        # Awaits one call at a time; .loc avoids chained assignment
        df.loc[i, 'new'] = await fun2(df.loc[i, 'old'])
    print(df)

asyncio.run(main())
Try using asyncio.gather to run the calls concurrently, then overwrite the whole column once they have all completed:
import asyncio
import numpy as np
import pandas as pd

async def fun2(x):
    return x * 2

async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
    # Run all coroutines concurrently; gather returns results in input order
    df['new'] = await asyncio.gather(*[fun2(v) for v in df['old']])
    print(df)

asyncio.run(main())
Output:
   old  new
0    0    0
1    1    2
2    2    4
3    3    6
4    4    8
5    5   10
6    6   12
7    7   14
8    8   16
9    9   18
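The speedup from asyncio.gather only shows up when the coroutine actually waits on something. The fun2 above never yields, so both versions take about the same time. The sketch below is one way to see the difference, under stated assumptions: slow_double is a hypothetical stand-in for real async I/O, using asyncio.sleep to simulate a wait, and apply_sequential and apply_concurrent are illustrative helper names, not part of pandas or asyncio.

```python
import asyncio
import time

import numpy as np
import pandas as pd


async def slow_double(x, delay=0.05):
    # Hypothetical stand-in for real async I/O (e.g. a network call).
    await asyncio.sleep(delay)
    return x * 2


async def apply_sequential(series, func):
    # Awaits one call at a time, like the loop in the question.
    return pd.Series([await func(v) for v in series], index=series.index)


async def apply_concurrent(series, func):
    # Schedules every call at once; gather returns results in input order.
    results = await asyncio.gather(*(func(v) for v in series))
    return pd.Series(results, index=series.index)


async def compare():
    df = pd.DataFrame(np.arange(10), columns=['old'])

    start = time.perf_counter()
    seq = await apply_sequential(df['old'], slow_double)
    t_seq = time.perf_counter() - start

    start = time.perf_counter()
    df['new'] = await apply_concurrent(df['old'], slow_double)
    t_con = time.perf_counter() - start

    return df, seq, t_seq, t_con


df, seq, t_seq, t_con = asyncio.run(compare())
print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")
```

With ten rows and a simulated 0.05 s wait per call, the sequential version should take roughly the sum of the waits, while the concurrent version should take roughly one wait. Building the result as a Series with the original index keeps the assignment aligned even when the DataFrame index is not 0..n-1.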