
How to sum all values in each column and divide each column by the summed value

Basically, I have a 10000x10000 matrix named M whose columns contain only 1s and 0s. I'm trying to count the number of 1s in every column and then divide every element in that column by this number.

This is what I have tried:

outbound_links = M[M == 1].count()

mat = [[1] * 10000] * 10000
n = 10000
#len(mat)

# for each column
for col_index in range(0, n):

    # count the number of 1s
    for row_index in range(0, n):

        if M[row_index][col_index] == 1:
            mat[row_index][col_index] = 1 / outbound_links[col_index]
        else:
            mat[row_index][col_index] = 0

print(mat)

But the code fails to run, seemingly because the matrix is too big. What alternatives could I use?

As suggested in the comments, you should use numpy for this. I think this will do:

import numpy as np

m = np.random.randint(0, 2, (4, 4))

# array([[0, 1, 1, 0],
#        [0, 1, 0, 1],
#        [0, 1, 0, 1],
#        [1, 1, 1, 0]])

m / np.sum(m, axis=0)[np.newaxis, :]

# array([[0.  , 0.25, 0.5 , 0.  ],
#        [0.  , 0.25, 0.  , 0.5 ],
#        [0.  , 0.25, 0.  , 0.5 ],
#        [1.  , 0.25, 0.5 , 0.  ]])
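One caveat with the division above: a column that contains no 1s has a sum of 0, so NumPy emits a runtime warning and fills that column with nan. Here is a minimal sketch of one way to guard against that, assuming all-zero columns should simply stay zero. The function name normalize_columns is hypothetical, not something from the question.

```python
import numpy as np

def normalize_columns(m):
    """Divide every column by its sum; all-zero columns are left as zeros."""
    m = np.asarray(m, dtype=float)
    col_sums = m.sum(axis=0, keepdims=True)  # shape (1, n_cols), broadcasts over rows
    out = np.zeros_like(m)
    # Divide only where the column sum is non-zero; other cells keep the 0 from `out`.
    np.divide(m, col_sums, out=out, where=col_sums != 0)
    return out

# The last column has no 1s (assumed example data).
m = np.array([[0, 1, 1, 0],
              [0, 1, 0, 0],
              [0, 1, 0, 0],
              [1, 1, 1, 0]])
result = normalize_columns(m)
print(result)
```

The `where` argument skips the division for the masked cells, which is why `out` is pre-filled with zeros.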

A non-NumPy way: simply iterate over all columns, find the number of ones in each, and then divide each cell by that count:

from random import randint

n = 4
mat = [[randint(0,1) for _ in range(n)] for _ in range(n)]

print(*mat, sep='\n')

for col in range(n):
    # count the number of 1s
    ones = sum(mat[row][col] for row in range(n))

    if ones:  # Avoid dividing by zero
        for row in range(n):
            mat[row][col] /= ones

print('\n', *mat, sep='\n')

An example run (output rounded to two decimals):

[1, 0, 0, 1]
[0, 1, 1, 0]
[0, 0, 0, 1]
[1, 1, 1, 1]


[0.5, 0.0, 0.0, 0.33]
[0.0, 0.5, 0.5, 0.0]
[0.0, 0.0, 0.0, 0.33]
[0.5, 0.5, 0.5, 0.33]
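If you would rather not modify the matrix in place, the same idea can be sketched with zip(*mat), which yields the columns of a list of lists. This is only an illustration: the function name normalize_columns_py is made up, and it assumes a column with no 1s should stay all zeros.

```python
def normalize_columns_py(mat):
    """Return a new matrix in which each column is divided by its sum."""
    # zip(*mat) transposes the list of rows, so each item is one column.
    col_sums = [sum(col) for col in zip(*mat)]
    return [
        [cell / s if s else 0.0 for cell, s in zip(row, col_sums)]
        for row in mat
    ]

mat = [[1, 0, 0, 1],
       [0, 1, 1, 0],
       [0, 0, 0, 1],
       [1, 1, 1, 1]]
normalized = normalize_columns_py(mat)
print(*normalized, sep='\n')
```

The input list is left untouched, at the cost of building a second matrix of the same size.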

You can try this:

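import numpy as np

# Use a float dtype so the fractions are not truncated back to integers
mat = np.array(M, dtype=float)
for i in range(mat.shape[1]):
    col_sum = np.sum(mat[:, i])
    if col_sum:  # NumPy only warns on division by zero, so check explicitly
        mat[:, i] = mat[:, i] / col_sum
    else:
        print("no ones in that column")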
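On the size concern from the question: a 10000x10000 array of float64 values takes about 800 MB, and float32 halves that to about 400 MB. Below is a rough sketch of dividing in place so that no second full-size array is allocated. It uses a smaller n and randomly generated data purely for demonstration, and it assumes all-zero columns should stay zero.

```python
import numpy as np

n = 1000  # assumption: small size for the demo; the question uses n = 10000
rng = np.random.default_rng(0)
# Random 0/1 matrix stored as float32 to halve memory compared with float64
m = rng.integers(0, 2, size=(n, n)).astype(np.float32)

col_sums = m.sum(axis=0, keepdims=True)  # shape (1, n)
col_sums[col_sums == 0] = 1  # all-zero columns stay zero instead of becoming nan
m /= col_sums  # in-place division, no second n x n array is created
```

Whether float32 precision is sufficient depends on what the normalized matrix is used for afterwards.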


 