
Calculate the score with respect to the group it was assigned or other groups

I'm a beginner in Python and have the two dataframes below. The first dataframe holds each user with its vector and group number.

    df1 = pd.DataFrame({'user': ['user 1', 'user 2', 'user 3', 'user 4', 'user 5'], 'x1': [[0.2, 0.3, 0.5],[0.3, 0.3, 0.4],[0.4, 0.4, 0.2],[0.2, 0.1, 0.7],[0.5,0.3,0.2]],'group': [1, 0, 0, 2, 1]})

df1

Output:

     user               x1  group
0  user 1  [0.2, 0.3, 0.5]      1
1  user 2  [0.3, 0.3, 0.4]      0
2  user 3  [0.4, 0.4, 0.2]      0
3  user 4  [0.2, 0.1, 0.7]      2
4  user 5  [0.5, 0.3, 0.2]      1

The second dataframe holds each group number with its vector, a variable (p2), and its threshold:

df2 = pd.DataFrame({'group': [0, 1, 2],
                   'x2': [[0.4, 0.2, 0.4],[0.5, 0.1, 0.4], [0.5, 0.1, 0.4]],
                   'p2': [0.231, 0.342, 0.411],
                   'threshold': [0.9, 0.6, 0.8]})
df2

Output:

   group               x2     p2  threshold
0      0  [0.4, 0.2, 0.4]  0.231        0.9
1      1  [0.5, 0.1, 0.4]  0.342        0.6
2      2  [0.5, 0.1, 0.4]  0.411        0.8

I am trying to calculate, for each user, the score (S) with respect to the group it was assigned to, using:

S = 1/k + (x2 - x1)^T (x2 - x1) / p2

where k is the group size and T denotes the transpose of (x2 - x1).

Then check:

1- If the score is below the threshold of its group, the user stays in its group.

2- If the score is higher than the threshold, we calculate the score for the other groups and assign the user to the group for which the score is below the threshold. If this holds for more than one group, we assign the user to the group with the lowest score.

3- If the score is above the threshold for all the groups, this user starts a new group.

For example, take user 1, who belongs to group 1:

x2 =(0.5, 0.1, 0.4)
x1 =(0.2, 0.3, 0.5)
So x2 -x1= (0.3, -0.2, -0.1)

The transpose of this vector is the column

(0.3,
-0.2,
-0.1)

so multiplying the transpose of this vector by (x2 - x1) gives:

 (0.09 + 0.04 + 0.01) = 0.14

k = 2
p2 of its group = 0.342

The score (S) for user 1:

= 1/2 + (0.14/0.342) = 0.5 + 0.4094 = 0.91
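As a quick check of the arithmetic above, the same calculation can be written with NumPy. This is only a sketch for user 1 and group 1; the variable names are illustrative, not part of my actual code.

```python
import numpy as np

x1 = np.array([0.2, 0.3, 0.5])   # user 1's vector
x2 = np.array([0.5, 0.1, 0.4])   # group 1's vector
k = 2                            # members of group 1 (user 1 and user 5)
p2 = 0.342                       # p2 of group 1

diff = x2 - x1
quad = diff @ diff               # (x2 - x1)^T (x2 - x1), a plain dot product
score = 1 / k + quad / p2

print(round(quad, 2), round(score, 4))
```

The dot product comes out at about 0.14 and the score at about 0.909, which is above the group 1 threshold of 0.6.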

The score of user 1 is higher than its group's threshold (0.6), so we need to calculate the score for the other groups (0 and 2) and assign the user to the group for which the score is below the threshold. How could I do that for all users?

First, count the members of each group to get the k term:

df2['count'] = df1.groupby('group')['user'].count()
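Note that this assignment works through index alignment: the groupby result is indexed by group number, and df2's default row index 0, 1, 2 happens to coincide with its group column. If that might not hold for your real data, a variant that looks the counts up by group number could be sketched as follows (the frames are trimmed copies of the ones in the question, and `counts` is just an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({'user': ['user 1', 'user 2', 'user 3', 'user 4', 'user 5'],
                    'group': [1, 0, 0, 2, 1]})
df2 = pd.DataFrame({'group': [0, 1, 2],
                    'p2': [0.231, 0.342, 0.411],
                    'threshold': [0.9, 0.6, 0.8]})

# Members per group, indexed by group number
counts = df1.groupby('group')['user'].count()

# Look each group number up in counts instead of relying on df2's row index
df2['count'] = df2['group'].map(counts)
print(df2)
```

The join in the next step also matches df1's group column against df2's index, so the same caveat applies there; calling `df2.set_index('group')` before joining would make that dependency explicit.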

Then merge df1 and df2 so that we have a frame with all the necessary parameters for each user in a single row:

joined = df1.join(df2[['x2', 'p2', 'threshold', 'count']], on='group')
print(joined)

     user               x1  group               x2     p2  threshold  count
0  user 1  [0.2, 0.3, 0.5]      1  [0.5, 0.1, 0.4]  0.342        0.6      2
1  user 2  [0.3, 0.3, 0.4]      0  [0.4, 0.2, 0.4]  0.231        0.9      2
2  user 3  [0.4, 0.4, 0.2]      0  [0.4, 0.2, 0.4]  0.231        0.9      2
3  user 4  [0.2, 0.1, 0.7]      2  [0.5, 0.1, 0.4]  0.411        0.8      1
4  user 5  [0.5, 0.3, 0.2]      1  [0.5, 0.1, 0.4]  0.342        0.6      2

Now define functions to calculate the S score:

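def l_delta(z1, z2):
    # element-wise difference z1 - z2 of two equal-length lists
    return [a1 - a2 for (a1, a2) in zip(z1, z2)]

def inner(z1, z2):
    # inner (dot) product of two equal-length lists
    return sum(a1 * a2 for (a1, a2) in zip(z1, z2))

def s_score(row):
    # S = 1/k + (x2 - x1)^T (x2 - x1) / p2 for one row of the joined frame
    delta = l_delta(row['x2'], row['x1'])
    num = inner(delta, delta)
    return 1/row['count'] + num / row['p2']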

Finally, apply these functions to each row of the joined frame:

joined['s_score'] = joined.apply(s_score, axis=1)
print(joined[['user', 's_score']])

Result:

     user   s_score
0  user 1  0.909357
1  user 2  0.586580
2  user 3  0.846320
3  user 4  1.437956
4  user 5  0.733918

This is similar to @The Photon's answer: we (1) merge df1 and df2, (2) calculate k with groupby, and (3) take the inner product of (x2 - x1) with itself.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'user': ['user 1', 'user 2', 'user 3', 'user 4', 'user 5'],
                    'x1': [[0.2, 0.3, 0.5],[0.3, 0.3, 0.4],[0.4, 0.4, 0.2],[0.2, 0.1, 0.7],[0.5,0.3,0.2]],
                    'group': [1, 0, 0, 2, 1]})

df2 = pd.DataFrame({'group': [0, 1, 2],
                    'x2': [[0.4, 0.2, 0.4],[0.5, 0.1, 0.4], [0.5, 0.1, 0.4]],
                    'p2': [0.231, 0.342, 0.411],
                    'threshold': [0.9, 0.6, 0.8]})

#merge df1 and df2 into a single table
merged_df = df1.merge(df2)

#calculate the number of unique users per group (k)
merged_df['k'] = merged_df.groupby('group')['user'].transform('nunique')

#calculate x2-x1 for each user (convert to numpy array for vectorized subtraction)
x2_sub_x1 = merged_df['x2'].apply(np.array)-merged_df['x1'].apply(np.array)

#calculate (x2-x1)T(x2-x1) for each user (same as squaring each term and summing)
numerator = x2_sub_x1.pow(2).apply(sum)

#calculate S from your formula and add it as a column to the merged table
merged_df['S'] = (1/merged_df['k'])+(numerator/merged_df['p2'])

Final merged table:

     user               x1  group               x2     p2  threshold  k         S
0  user 1  [0.2, 0.3, 0.5]      1  [0.5, 0.1, 0.4]  0.342        0.6  2  0.909357
1  user 5  [0.5, 0.3, 0.2]      1  [0.5, 0.1, 0.4]  0.342        0.6  2  0.733918
2  user 2  [0.3, 0.3, 0.4]      0  [0.4, 0.2, 0.4]  0.231        0.9  2  0.586580
3  user 3  [0.4, 0.4, 0.2]      0  [0.4, 0.2, 0.4]  0.231        0.9  2  0.846320
4  user 4  [0.2, 0.1, 0.7]      2  [0.5, 0.1, 0.4]  0.411        0.8  1  1.437956
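The table above only scores each user against its own group; the reassignment rules (1 to 3) from the question still need a score against every group. A minimal sketch of that step follows, under assumptions the question leaves open: k for a candidate group is its current member count (not incremented for the incoming user), "below" means strictly less than the threshold, every decision is made against the original assignment rather than updated sequentially, and the marker `NEW_GROUP = -1` is a hypothetical placeholder for "start a new group".

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'user': ['user 1', 'user 2', 'user 3', 'user 4', 'user 5'],
                    'x1': [[0.2, 0.3, 0.5], [0.3, 0.3, 0.4], [0.4, 0.4, 0.2],
                           [0.2, 0.1, 0.7], [0.5, 0.3, 0.2]],
                    'group': [1, 0, 0, 2, 1]})
df2 = pd.DataFrame({'group': [0, 1, 2],
                    'x2': [[0.4, 0.2, 0.4], [0.5, 0.1, 0.4], [0.5, 0.1, 0.4]],
                    'p2': [0.231, 0.342, 0.411],
                    'threshold': [0.9, 0.6, 0.8]})

NEW_GROUP = -1  # hypothetical marker: user starts a new group

groups = df2['group'].tolist()
group_pos = {g: j for j, g in enumerate(groups)}   # group number -> row in df2
thresholds = df2['threshold'].to_numpy()

# Assumption: k is the current size of each group in df1
k = df2['group'].map(df1['group'].value_counts()).to_numpy()

X1 = np.array(df1['x1'].tolist())                  # shape (users, dims)
X2 = np.array(df2['x2'].tolist())                  # shape (groups, dims)
diff = X2[None, :, :] - X1[:, None, :]             # shape (users, groups, dims)
quad = (diff ** 2).sum(axis=2)                     # (x2 - x1)^T (x2 - x1)
scores = 1 / k + quad / df2['p2'].to_numpy()       # shape (users, groups)

new_group = []
for i, g in enumerate(df1['group']):
    j = group_pos[g]
    if scores[i, j] < thresholds[j]:               # rule 1: stay in own group
        new_group.append(g)
        continue
    ok = scores[i] < thresholds                    # groups whose threshold is met
    ok[j] = False                                  # own group already failed
    if ok.any():                                   # rule 2: lowest qualifying score
        best = np.where(ok, scores[i], np.inf).argmin()
        new_group.append(groups[best])
    else:                                          # rule 3: start a new group
        new_group.append(NEW_GROUP)

df1['new_group'] = new_group
print(df1[['user', 'group', 'new_group']])
```

With the sample data, users 2 and 3 stay in group 0, users 1 and 5 move to group 0, and user 4 is marked as starting a new group. If k should instead count the incoming user, or if moves should update group sizes as they happen, the `k` line and the loop would need to change accordingly.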
