I have a simple problem: multiply a matrix by a vector. However, the implementation of the multiplication is complicated because the matrix is 18 gb (3000^2 by 500).
Some info:
Right now I have a single-thread implementation where the python code reads in the matrix a few thousand lines at a time and performs the multiplication. However, this is a significant bottleneck for my program since it takes so darn long. I'd like to multithread it to speed it up a bit.
I'm trying to get an idea of whether it would be faster (computation-time-wise, not implementation time) for python to handle the multithreading and to continue to use numpy operations to do the multiplication, or to code an MV multiplication function with multithreading in C and bind it with ctypes.
I will likely do both and time them since shaving time off of an extremely long running program is important. I was wondering if anyone had encountered this situation before, though, and had any insight (or perhaps other suggestions?)
As a side question, I can only find algorithmic improvements for nxn matrices for mv multiplication. Does anyone know of one that can be used on an mxn matrix?
As Sven Marnach wrote in the comments, your problem is most likely I/O bound since disk access is orders of magnitude slower than RAM access.
So the fastest way is probably to have a machine with enough memory to keep the whole matrix multiplication and the result in RAM. It would save lots of time if you read the matrix only once.
Replacing the harddisk with an SSD would also help, because that can read and write a lot faster.
Barring that, for speeding up reads from disk, you could use the mmap
module. This should help, especially once the OS figures out you're reading pieces of the same file over and over and starts to keep it in the cache.
Since the calculation can be done by row, you might benefit from using numpy
in combination with a multiprocessing.Pool
for that calculation. But only really if a single process cannot use all available disk read bandwith.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.