Faster way to perform a function on each row with every other row in a DataFrame?
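I want to perform an operation on every row of a DataFrame, pairing it with every other row. The obvious approach is a nested for loop, which I expect to be very slow.
I am looking for suggestions on a faster, better way to get the same result.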
This is the DataFrame: each row is a user vector, and the index holds the usernames. In practice there can be hundreds of usernames.
import pandas as pd
df1 = pd.DataFrame({"A":[11,2,3], "B":[4,5,6], "C":[7,8,9]}, index=["U1","U2", "U3"])
Nested Loop Method
import numpy as np

def some_func(u1_vec, u2_vec):
    # this could be any function using the above 2 user vectors
    return np.minimum(u1_vec, u2_vec).sum() / np.maximum(u1_vec, u2_vec).sum()

index_list = list(df1.index)      # contains usernames
vector_cols = list(df1.columns)   # contains colnames
min_max_all = {}                  # will be used to store the vector interaction
for index_u1 in index_list:
    u1_vec = df1.loc[index_u1, vector_cols]
    min_max_all[index_u1] = {}
    for index_u2 in index_list:
        u2_vec = df1.loc[index_u2, vector_cols]
        min_max_all[index_u1][index_u2] = some_func(u1_vec, u2_vec)
Result - min_max_all
{
'U1': {'U1': 1.0, 'U2': 0.5416666666666666, 'U3': 0.5384615384615384},
'U2': {'U1': 0.5416666666666666, 'U2': 1.0, 'U3': 0.8333333333333334},
'U3': {'U1': 0.5384615384615384, 'U2': 0.8333333333333334, 'U3': 1.0}
}
I think the best approach is to use NumPy and write vectorized code for this specific function.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({"A":[11,2,3], "B":[4,5,6], "C":[7,8,9]}, index=["U1","U2", "U3"])
df1_np = df1.to_numpy()
x = np.minimum(df1_np[:, np.newaxis], df1_np).sum(axis=2)
y = np.maximum(df1_np[:, np.newaxis], df1_np).sum(axis=2)
print(repr(x / y))  # repr() gives the array(...) form shown below
array([[1.        , 0.54166667, 0.53846154],
       [0.54166667, 1.        , 0.83333333],
       [0.53846154, 0.83333333, 1.        ]])
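To make the broadcasting step easier to follow, here is a minimal sketch that spells out the intermediate shapes and cross-checks the result against the question's `some_func`. The names `arr`, `left`, `pair_min`, `pair_max`, `ratio` and `expected` are introduced here for illustration only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [11, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                   index=["U1", "U2", "U3"])
arr = df1.to_numpy()                  # shape (3, 3): (users, features)

# Adding an axis gives shape (3, 1, 3); combining it with the (3, 3) array
# broadcasts to (3, 3, 3), i.e. one feature vector per (user_i, user_j) pair.
left = arr[:, np.newaxis]
pair_min = np.minimum(left, arr)      # element-wise minimum for every pair
pair_max = np.maximum(left, arr)      # element-wise maximum for every pair

# Summing over the feature axis leaves one number per pair: shape (3, 3).
ratio = pair_min.sum(axis=2) / pair_max.sum(axis=2)

# Cross-check against the pairwise function from the question.
def some_func(u1_vec, u2_vec):
    return np.minimum(u1_vec, u2_vec).sum() / np.maximum(u1_vec, u2_vec).sum()

expected = np.array([[some_func(a, b) for b in arr] for a in arr])
assert np.allclose(ratio, expected)
```

Note that the intermediate arrays hold n × n × d values (users × users × features), so memory use grows quadratically with the number of users. That should be comfortable for hundreds of users but may need chunking for much larger inputs.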
To build a dictionary like the one in your question:
z = x / y
# key by the usernames in df1.index (df1.columns would give "A", "B", "C")
{ui: {uj: z[i][j] for j, uj in enumerate(df1.index)}
 for i, ui in enumerate(df1.index)}
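As an alternative sketch, assuming a labelled table is also acceptable as output, the ratio matrix can be wrapped in a DataFrame indexed by username on both axes and converted with `DataFrame.to_dict(orient="index")`. The names `arr` and `result` are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [11, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                   index=["U1", "U2", "U3"])
arr = df1.to_numpy()

# Pairwise sum(min) / sum(max) ratio via broadcasting, as in the answer above.
z = (np.minimum(arr[:, np.newaxis], arr).sum(axis=2)
     / np.maximum(arr[:, np.newaxis], arr).sum(axis=2))

# Label both axes with the usernames so lookups work by name.
result = pd.DataFrame(z, index=df1.index, columns=df1.index)

# orient="index" yields {row_label: {column_label: value}}.
min_max_all = result.to_dict(orient="index")
```

Keeping the intermediate `result` DataFrame also allows label-based lookups such as `result.loc["U1", "U2"]` without building the nested dictionary.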