An efficient way to calculate the mean of each column or row of non-zero elements

I have a numpy array of ratings given by users to movies. A rating is between 1 and 5, and 0 means that the user has not rated the movie. I want to calculate the average rating of each movie and the average rating of each user. In other words, I want the mean of the non-zero elements in each column and in each row.

Is there an efficient numpy array function to handle this case? I know that manually iterating over the ratings by column or row would solve the problem.

Thanks in advance!

Since the values to discard are 0, you can compute the mean manually by taking the sum along an axis and then dividing by the number of non-zero elements along the same axis:

import numpy as np
a = np.array([[8., 9, 7, 0], [0, 0, 5, 6]])
a.sum(axis=1) / (a != 0).sum(axis=1)  # row sums divided by per-row non-zero counts

results in:

array([ 8. ,  5.5])

As you can see, the zeros are not counted in the mean.
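One caveat with the sum-and-count approach is that a row or column containing no ratings at all leads to a division by zero. The following is a minimal sketch of one way to guard against that, assuming NaN is an acceptable result for such rows or columns. The helper name `nonzero_mean` and the sample matrix (rows as users, columns as movies) are illustrative assumptions, not part of the answer above.

```python
import numpy as np

def nonzero_mean(a, axis):
    """Mean of the non-zero entries along `axis`; NaN where there are none.

    Hypothetical helper written for illustration only.
    """
    a = np.asarray(a, dtype=float)
    totals = a.sum(axis=axis)            # zeros add nothing to the sum
    counts = (a != 0).sum(axis=axis)     # number of actual ratings
    out = np.full(totals.shape, np.nan)  # default for slices with no ratings
    # Divide only where at least one rating exists; other entries stay NaN.
    np.divide(totals, counts, out=out, where=counts > 0)
    return out

# Sample data: the last movie (column) has not been rated by anyone.
ratings = np.array([[1, 4, 5, 0],
                    [2, 0, 3, 0],
                    [4, 0, 0, 0]])

user_means = nonzero_mean(ratings, axis=1)   # one value per row
movie_means = nonzero_mean(ratings, axis=0)  # one value per column
```

Using the `where=` and `out=` arguments of `np.divide` avoids the division warning entirely instead of silencing it after the fact.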

You could make use of np.nanmean after converting all 0 values to np.nan. Note that np.nanmean is only available in NumPy 1.8 and later.

import numpy as np

ratings = np.array([[1,4,5,0],
                    [2,0,3,0],
                    [4,0,0,0]], dtype=float)


def get_means(ratings):
    # Replace zeros with NaN in a new array so the caller's data is not modified.
    ratings = np.where(ratings == 0, np.nan, ratings)

    user_means = np.nanmean(ratings, axis=1)
    movie_means = np.nanmean(ratings, axis=0)

    return {'user_means' : user_means, 'movie_means' : movie_means}

Result:

>>> get_means(ratings)
{'movie_means': array([ 2.33333333,  4.        ,  4.        ,         nan]),
 'user_means': array([ 3.33333333,  2.5       ,  4.        ])}
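The nan in the last position of movie_means comes from a movie that nobody rated. For such an all-NaN column, np.nanmean also emits a RuntimeWarning. The sketch below shows one possible way to keep the NaN result while suppressing that warning locally, assuming the warning is expected and not useful in your setting. The variable name `masked` is an illustrative choice.

```python
import warnings

import numpy as np

ratings = np.array([[1, 4, 5, 0],
                    [2, 0, 3, 0],
                    [4, 0, 0, 0]], dtype=float)

# Build a NaN-marked copy; the original ratings array is left untouched.
masked = np.where(ratings == 0, np.nan, ratings)

with warnings.catch_warnings():
    # The unrated movie triggers a RuntimeWarning about an empty slice.
    warnings.simplefilter("ignore", category=RuntimeWarning)
    movie_means = np.nanmean(masked, axis=0)

# Every user has at least one rating here, so no warning is expected.
user_means = np.nanmean(masked, axis=1)
```

Restricting the filter to the `with` block keeps other RuntimeWarnings elsewhere in the program visible.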

Another alternative is to use a masked array, with the 0 values masked. For example (using @Akavali's sample data):

In [30]: ratings = np.array([[1,4,5,0],
   ....:                     [2,0,3,0],
   ....:                     [4,0,0,0]], dtype=float)

Create the masked version of ratings, using ratings==0 as the mask:

In [31]: mratings = np.ma.masked_array(ratings, mask=ratings==0)

In [32]: mratings
Out[32]: 
masked_array(data =
 [[1.0 4.0 5.0 --]
 [2.0 -- 3.0 --]
 [4.0 -- -- --]],
             mask =
 [[False False False  True]
 [False  True False  True]
 [False  True  True  True]],
       fill_value = 1e+20)

Now compute the mean along each axis:

In [33]: mratings.mean(axis=0)
Out[33]: 
masked_array(data = [2.3333333333333335 4.0 4.0 --],
             mask = [False False False  True],
       fill_value = 1e+20)

In [34]: mratings.mean(axis=1)
Out[34]: 
masked_array(data = [3.3333333333333335 2.5 4.0],
             mask = [False False False],
       fill_value = 1e+20)
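If a plain ndarray is more convenient downstream than a masked result, the masked means can be converted with `filled`. The sketch below is one way to do that, assuming NaN is a suitable placeholder for a movie with no ratings. It uses `np.ma.masked_equal` as a shorthand for building the mask, which is a substitution for the explicit `mask=ratings==0` form shown above.

```python
import numpy as np

ratings = np.array([[1, 4, 5, 0],
                    [2, 0, 3, 0],
                    [4, 0, 0, 0]], dtype=float)

# Mask every entry equal to 0, i.e. every missing rating.
mratings = np.ma.masked_equal(ratings, 0)

# Means ignore masked entries; fully masked slices come back masked,
# and filled(np.nan) turns those into NaN in an ordinary ndarray.
movie_means = mratings.mean(axis=0).filled(np.nan)
user_means = mratings.mean(axis=1).filled(np.nan)
```

Unlike the NaN-based approach, the masked array keeps the original values in place and records which entries to ignore separately.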
