
Python: Fast way to create image from a list of tuples

I am doing the following.

import numpy as np
import pylab

.....

x = np.zeros([250,200])
for tup in tups:
    x[tup[1],tup[0]] = x[tup[1],tup[0]] + 1
pylab.imshow(x)

Where

tups = [(x1,y1),(x2,y2),....]

and xi, yi are integers.

This is fine when tups holds a small number of points. For a large number of points (~10^6) it takes hours.

Can you think of a faster way of doing this?

One small improvement I can easily see: instead of

for tup in tups:
    x[tup[1],tup[0]] = x[tup[1],tup[0]] + 1

try doing

for tup in tups:
    x[tup[1],tup[0]] += 1

This updates the existing value in place, instead of computing 'old value + 1' as a new value and then storing it back. (Note: this will probably give only a marginal speedup in this case. But if you use the same trick, A += B instead of C = A + B, where A and B are numpy ndarrays of a GB each or so, it avoids allocating a whole new array and can be a massive speedup.)
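To make the in-place point concrete for whole arrays, here is a minimal sketch. The names A, B, C and the array size are illustrative assumptions, not taken from the question. It only checks which operation reuses the existing buffer; it does not measure timings.

```python
import numpy as np

# Illustrative arrays; names and sizes are assumptions for this sketch.
A = np.ones((1000, 1000))
B = np.ones((1000, 1000))

# Address of A's data buffer before the update.
addr_before = A.__array_interface__["data"][0]

A += B  # in place: the result is written into A's existing buffer
same_buffer = A.__array_interface__["data"][0] == addr_before

C = A + B  # out of place: a new array is allocated for the result
c_shares_a = np.shares_memory(A, C)

print(same_buffer, c_shares_a)
```

For a single array element, as in the loop above, the difference is much smaller, because most of the time goes into the Python-level loop itself.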

Why do you read the data in as tuples? Shouldn't you try to read it in as a numpy ndarray in the first place, instead of reading it in as a list of tuples and then converting to a numpy array? Where do you create that big list of tuples? If it can be avoided, it would be much better to skip the list of tuples entirely than to create it and switch to a numpy solution later.

Edit: I just wanted to point out the speedup you can get from +=, and at the same time ask why you have a big list of tuples, but that is too long to fit in a comment.

Another question: am I right in assuming your tuples can contain repeats? Like

tups = [(1,0), (2,4), (1,0), (1,2), ..., (999, 999), (992, 999)]

so that values other than 0 and 1 will appear in your end result? Or does your resulting array contain only ones and zeros?
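The question about repeats matters for any vectorized rewrite. As a small sketch (the sample list below is made up for illustration), plain fancy-index assignment increments each repeated position only once, while np.add.at accumulates every occurrence:

```python
import numpy as np

# Hypothetical sample with the point (1, 0) repeated; tuples are (x, y) as in the question.
tups = [(1, 0), (2, 4), (1, 0), (1, 2)]
xs, ys = np.array(tups).T

naive = np.zeros((250, 200))
naive[ys, xs] += 1  # buffered update: a repeated index is incremented only once

counted = np.zeros((250, 200))
np.add.at(counted, (ys, xs), 1)  # unbuffered update: repeats accumulate

print(naive[0, 1], counted[0, 1])
```

So if repeats are possible and must be counted, naive[ys, xs] += 1 is not a drop-in replacement for the loop.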

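Using numpy you could convert your pairs of indices into a flat index and bincount it. (Note that in this example the tuples are (row, col), whereas the question indexes as x[tup[1], tup[0]].)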

import numpy as np
import random

rows, cols = 250, 200
n = 1000

tups = [(random.randint(0, rows-1),
         random.randint(0, cols-1)) for _ in range(n)]

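# Reference result from the plain Python loop
x = np.zeros((rows, cols))
for tup in tups:
    x[tup[0],tup[1]] += 1

# Vectorized version: one flat index per (row, col) pair, then count occurrences.
# tuple(...) is needed on Python 3, where zip returns an iterator.
flat_idx = np.ravel_multi_index(tuple(zip(*tups)), (rows, cols))
y = np.bincount(flat_idx, minlength=rows*cols).reshape(rows, cols)

np.testing.assert_equal(x, y)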

It will be much faster than any looping solution.
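Adapted to the question's layout, where each tuple is (x, y) and the image is indexed as [y, x], the same idea could look roughly like this. The helper name count_points and the sample points are hypothetical; the default shape (250, 200) comes from the question.

```python
import numpy as np

def count_points(tups, shape=(250, 200)):
    """Return an array where result[y, x] is the number of times (x, y) occurs in tups."""
    pts = np.asarray(tups, dtype=np.intp)
    xs, ys = pts[:, 0], pts[:, 1]
    # Row index is y, column index is x, matching x[tup[1], tup[0]] in the question.
    flat_idx = np.ravel_multi_index((ys, xs), shape)
    counts = np.bincount(flat_idx, minlength=shape[0] * shape[1])
    return counts.reshape(shape)

# Made-up sample with one repeated point.
img = count_points([(1, 0), (2, 4), (1, 0), (1, 2)])
print(img[0, 1], img[4, 2], img[2, 1])
```

The result can then be passed to pylab.imshow(img) as in the question. This sketch assumes tups is non-empty and that every point lies inside the given shape; np.ravel_multi_index raises a ValueError for out-of-range indices by default.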
