
Why is it faster sending data as encoded string than sending it as bytes?

I'm playing with pyzmq for inter-process transfer of 4k HDR image data and noticed that this:

byt = np.array2string(np.random.randn(3840,2160,3)).encode()
while True:
   socket.send(byt)

is much much faster than:

byt = np.random.randn(3840,2160,3).asbytes()
while True:
   socket.send(byt)

Can someone explain why? I can't seem to wrap my head around it.

Q: Why is it faster sending...? Can someone explain why?

A:
+1 for having asked WHY -
people who want to understand WHY are the ones who strive to learn the roots of a problem, so as to truly understand its core reasons, and can thus design better systems next time, knowing the very WHY ( taking no shortcuts in mimicking, emulating, or copy/paste-following someone else )

So, let's start:

HDR is not the SDR -
we will have "a lot of DATA" here to acquire - store - process - send.


Inventory of facts
- in this order: DATA, process, .send(), who gets faster & WHY

DATA:
were defined to be a 4K-HDR sized array of triple data-values of a numpy-provided default dtype , where ITU-R Recommendation BT.2100 HDR colourspace requires at least 10-bit storage per channel for the increased colour dynamic range

The as-is code delivers numpy.random.randn( c4K, r4K, 3 ) 's default dtype of np.float64 . Just for the sake of a proper, right-sized system design, HDR ( which extends a plain 8-bit sRGB triple-byte colourspace ) shall always prefer int{10|12|16|32|...} -based storage, so as not to skew any numerical image post-processing in the pipeline's later phase(s).
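As a sketch of that right-sizing point ( the uint16 dtype here is an illustrative assumption for holding 10/12-bit HDR samples, not something from the original question ):

```python
import numpy as np

# Default dtype of randn() is float64: 8 bytes per sample.
frame_f64 = np.random.randn(3840, 2160, 3)

# A right-sized integer dtype: 10/12-bit HDR samples fit into uint16.
frame_u16 = np.zeros((3840, 2160, 3), dtype=np.uint16)

print(frame_f64.nbytes / 1E6)   # 199.0656 [MB]
print(frame_u16.nbytes / 1E6)   # 49.7664  [MB] ... a 4x smaller payload
```

The same pixels in a 16-bit integer dtype occupy a quarter of the RAM footprint, and a quarter of the bytes ever hit the wire.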

process:
The actual message-payload generating processes were defined to be:

Case-A ) np.array2string() followed by an .encode() method

Case-B ) a numpy.ndarray -native (sic) .asbytes() -method

.send() :
a ZeroMQ Scalable Formal Communication Archetype pattern ( of an unspecified type ) finally receives the process-generated message-payload, via a ( blocking form of the ) .send() -method


Solution of WHY & tips for HOW:

The core difference is hidden in the fact that we are trying to compare apples to oranges.

>>> c4K, r4K = 3840, 2160
>>> len(                  np.random.randn( c4K, r4K, 3 ).tobytes() ) / 1E6
199.0656 [MB]

>>> len( np.array2string( np.random.randn( c4K, r4K, 3 ) )         ) / 1E6
0.001493 [MB] ... Q.E.D.

While the (sic) .asbytes() -method produces a full copy ( incl. RAM-allocation + RAM-I/O-traffic, i.e. both [SPACE]- and [TIME] -domain costs ), spending some extra us before ZeroMQ starts its .send() -method ZeroCopy magicks:

 >>> print( np.random.randn( c4K, r4K, 3 ).tobytes.__doc__ )
 a.tobytes(order='C')

     Construct Python bytes containing the raw data bytes in the array.

     Constructs Python bytes showing a copy of the raw contents of
     data memory. The bytes object is produced in C-order by default.
     This behavior is controlled by the ``order`` parameter.

     .. versionadded:: 1.9.0


the other case, Case-A , first throws away (!) a lot (!) of the original 4K-HDR DATA - how much depends on the actual numpy matrix-presentation configuration settings - even before moving it into the .encode() -phase:

 >>> print( np.array2string( np.random.randn( c4K, r4K, 3 ) ) )
 [[[ 1.54482944 -0.23189048 -0.67866246]
   ...
   [ 0.13461456  1.47855833 -1.68885902]]

  [[-0.18963557 -1.1869201   1.34843493]
   ...
   [-0.3022641  -0.44158803  0.75750368]]

  [[-1.05737969  0.864752    0.36359686]
   ...
   [ 1.70240612 -0.12574642 -1.03325878]]

  ...

  [[ 0.41776933  1.73473723  0.28723299]
   ...
   [-0.47635911  0.15901325 -0.56407537]]

  [[-1.41571874  1.66735309  0.6259928 ]
   ...
   [-0.93164127  0.95708002  1.3470873 ]]

  [[ 0.16426176 -0.00317156  0.77522962]
   ...
   [ 0.32960196 -1.74369368 -0.34177759]]]

So, sending less-DATA means taking less time to move them.
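Why is the Case-A string so small? np.array2string() summarises any array larger than its threshold print-option ( 1000 elements by default ), replacing the bulk of the values with a literal "..." - a minimal demonstration, reusing the same 4K shape:

```python
import numpy as np

a = np.random.randn(3840, 2160, 3)   # ~24.9 million float64 values

s = np.array2string(a)               # default threshold=1000 -> summarised
print('...' in s)                    # True: most values were thrown away
print(len(s))                        # ~1.5 thousand characters, not ~199 MB
```

Forcing a full text dump ( e.g. passing threshold=a.size ) would produce hundreds of MB of text and be far slower than .tobytes() - the speed "win" of Case-A exists only because the data is discarded.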

Tips HOW:

  1. ZeroMQ methods & the overall performance will benefit from using the zmq.DONTWAIT flag when passing a reference to the .send() -method

  2. try to re-use as much of the great numpy -tooling as possible, to minimise repetitive RAM-allocation(s) ( we may pre-allocate once & re-use the allocated variable )

  3. try to use as compact a DATA-representation as possible, if hunting for maximum performance with minimum latency - redundancy-avoided, compact formats matching the CPU-cache-lines' hierarchy & associativity will always win the race for ultimate performance ( using a view of the internal numpy -storage area, ie without any mediating methods to read-access the actual block of 4K-HDR data, may help the whole pipeline become ZeroCopy, down to the ZeroMQ .send() -pushing the DATA-references only - ie without copying or moving a single byte of DATA from / into RAM, up until loading it onto the wire... which is the coolest performance result of our design efforts here, isn't it? )

  4. in any case, in all critical sections, avoid blocking the flow: gc.disable() will at least defer a potential .collect() so that it does not happen "here"
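Putting the tips together, a minimal sketch of such a sender loop - the PUSH socket, the endpoint, and the loop count are illustrative assumptions, not part of the original question:

```python
import gc

import numpy as np
import zmq

ctx = zmq.Context()
socket = ctx.socket(zmq.PUSH)              # assumed archetype; question left it unspecified
socket.bind("tcp://127.0.0.1:5555")        # hypothetical endpoint

# Tip 2 + 3: one pre-allocated, compact, re-used buffer -- no per-frame allocation.
frame = np.zeros((3840, 2160, 3), dtype=np.uint16)

gc.disable()                               # Tip 4: defer collection out of the hot loop
try:
    for _ in range(100):
        # ... refresh frame in place here, without allocating a new array ...
        try:
            # Tip 1 + 3: zmq.DONTWAIT avoids blocking; copy=False lets pyzmq
            # send a zero-copy view of the buffer instead of duplicating it.
            socket.send(frame, flags=zmq.DONTWAIT, copy=False)
        except zmq.Again:
            pass                           # receiver not ready; drop or retry the frame
finally:
    gc.enable()
```

Note that with copy=False the buffer must not be overwritten until ZeroMQ has finished sending it; pyzmq's track=True option can be used to wait for that, at a small bookkeeping cost.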
