
Faster way to perform a function on each row with every other row in a DataFrame?

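I want to perform an operation on each row of a DataFrame paired with every other row. The obvious way is a nested for loop, which I expect to be very slow.

I'm looking for suggestions on a faster, better way to achieve the same result.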

This is the DataFrame, where each row is a user vector and the index is set to the usernames. In practice there can be hundreds of usernames.

import pandas as pd
df1 = pd.DataFrame({"A":[11,2,3], "B":[4,5,6], "C":[7,8,9]}, index=["U1","U2", "U3"])

Nested Loop Method

import numpy as np
def some_func(u1_vec,u2_vec):
    # this could be any function using above 2 user vectors
    return np.minimum(u1_vec, u2_vec).sum()/np.maximum(u1_vec, u2_vec).sum()


index_list = list(df1.index) # contains usernames
vector_cols = list(df1.columns) # contains colnames

min_max_all = {} # will be used to store the vector interaction 
for index_u1 in index_list:
    u1_vec = df1.loc[index_u1, vector_cols]
    min_max_all[index_u1] = {}
    for index_u2 in index_list:
        u2_vec = df1.loc[index_u2, vector_cols]
        min_max_all[index_u1][index_u2] = some_func(u1_vec, u2_vec)

Result - min_max_all

{
'U1': {'U1': 1.0, 'U2': 0.5416666666666666, 'U3': 0.5384615384615384},
'U2': {'U1': 0.5416666666666666, 'U2': 1.0, 'U3': 0.8333333333333334},
'U3': {'U1': 0.5384615384615384, 'U2': 0.8333333333333334, 'U3': 1.0}
}

I think the best way is to use NumPy and write code specific to this one function.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({"A":[11,2,3], "B":[4,5,6], "C":[7,8,9]}, index=["U1","U2", "U3"])
df1_np = df1.to_numpy()

x = np.minimum(df1_np[:, np.newaxis], df1_np).sum(axis=2)
y = np.maximum(df1_np[:, np.newaxis], df1_np).sum(axis=2)

print(x/y)
[[1.         0.54166667 0.53846154]
 [0.54166667 1.         0.83333333]
 [0.53846154 0.83333333 1.        ]]
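
To make the broadcasting step above easier to follow, here is a minimal sketch that spells out the intermediate shapes and cross-checks the result against the nested loop from the question. The helper name `pairwise_min_max` is hypothetical and not part of the original answer. Note that the intermediate arrays have shape (n, n, m), so memory grows with the square of the number of users; for a few hundred users that should be fine.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [11, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                   index=["U1", "U2", "U3"])

def pairwise_min_max(arr):
    # Hypothetical helper: sum of element-wise minima over sum of
    # element-wise maxima, for every pair of rows in arr.
    a = arr[:, np.newaxis, :]            # shape (n, 1, m)
    b = arr[np.newaxis, :, :]            # shape (1, n, m)
    num = np.minimum(a, b).sum(axis=2)   # broadcast to (n, n, m), reduce to (n, n)
    den = np.maximum(a, b).sum(axis=2)   # same shapes as above
    return num / den

z = pairwise_min_max(df1.to_numpy())

# Cross-check every entry against the per-pair computation from the question.
for i, u1 in enumerate(df1.index):
    for j, u2 in enumerate(df1.index):
        v1, v2 = df1.loc[u1], df1.loc[u2]
        expected = np.minimum(v1, v2).sum() / np.maximum(v1, v2).sum()
        assert np.isclose(z[i, j], expected)
```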

To build a dictionary like the one in your question:

z = x/y
# key by the row labels (usernames): z has one row and one column per user
{ui: {uj: z[i][j] for j, uj in enumerate(df1.index)}
    for i, ui in enumerate(df1.index)}
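
As an alternative to the comprehension above, the pairwise matrix can be wrapped in a DataFrame labelled by username on both axes and then converted with `to_dict(orient="index")`. This is only a sketch; the names `result_df` and `result` are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [11, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                   index=["U1", "U2", "U3"])
arr = df1.to_numpy()

# Same broadcasting computation as in the answer.
z = (np.minimum(arr[:, np.newaxis], arr).sum(axis=2)
     / np.maximum(arr[:, np.newaxis], arr).sum(axis=2))

# Label rows and columns with the usernames, then convert to a nested dict
# of the form {row_user: {col_user: value}}.
result_df = pd.DataFrame(z, index=df1.index, columns=df1.index)
result = result_df.to_dict(orient="index")
```

Keeping `result_df` around may also be handy on its own, since it allows lookups such as `result_df.loc["U1", "U2"]` without building the dictionary.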
