Unexpected results with bit operations

Question

Alright, I need to do some bit operations on a list of integers. The lists can be very long (256-4096 integers). The catch: I need to read the bits in groups of a variable length (e.g. 7 bits) until I hit the end.

I figured out that I have 2 options:

1. Convert every integer to 8 bytes. Concatenate all bytes. Then iterate through chunks whose size in bits is the lowest common multiple of the variable bit length and 8, until I hit the end.
   Example 1: 7 bits (variable) and 8 bits (a byte) give 56 bits = 7 bytes.
   Example 2: 3 bits (variable) and 8 bits (a byte) give 24 bits = 3 bytes.
2. Skip the conversion to bytes. Iterate through the list, shift a temporary integer 64 bits to the left and insert the next integer into it with a bitwise OR. This creates one massive integer, which I can walk through with a bitwise AND to extract the variable-length groups. I read that Python 3 integers have unlimited size, so overflows can't happen.
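The arithmetic behind both options can be sketched as follows. The helper names `chunk_size` and `split_bits` are made up for illustration, and `split_bits` assumes the packed value is non-negative and that any leftover low bits that do not fill a whole group can be ignored.

```python
from math import gcd


def chunk_size(group_bits, byte_bits=8):
    """Smallest chunk, as (bits, bytes), that holds whole groups and whole bytes."""
    bits = group_bits * byte_bits // gcd(group_bits, byte_bits)  # lowest common multiple
    return bits, bits // byte_bits


def split_bits(value, total_bits, group_bits):
    """Return the group_bits-wide groups of a non-negative int, most significant first."""
    mask = (1 << group_bits) - 1
    return [(value >> shift) & mask
            for shift in range(total_bits - group_bits, -1, -group_bits)]


print(chunk_size(7))  # (56, 7)
print(chunk_size(3))  # (24, 3)
print(split_bits(0b1010101_0000001, 14, 7))  # [85, 1]
```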

I need to take the fastest approach, so I started writing both scripts to time them, but noticed that the second approach returns completely unexpected results.

In the following script blocks, you find four examples.

Example 1 shows the conversion part, as explained in option 1 above. The only difference between 1a and 1b is that I reversed long_list, so the negative integer comes first.

Example 2 shows the bit shifting necessary for option 2. The difference between 2a and 2b is the same as between 1a and 1b. What is weird is that the bit representation of 2b matches 1b (which is what I would expect), yet the integer differs from 1b.

So why are the results different?

import numpy as np

long_list = [145249953336295681, -4503032244740276095]
long_list_reversed = [-4503032244740276095, 145249953336295681]

############ EXAMPLE 1a ###########

# The following code creates exactly what I am expecting:
trueresult = long_list[0].to_bytes(8, "big", signed=True) + long_list[1].to_bytes(8, "big", signed=True)
trueint = int.from_bytes(trueresult, "big")
truebitstring = np.binary_repr(int.from_bytes(trueresult, "big"), width=128)
print("Int", trueint)  # output as expected
print("Bits", truebitstring)  # output as expected
assert trueint & 0b11111111 == 0b10000001  # True as expected

############ EXAMPLE 1b ###########

# The following code creates exactly what I am expecting:
# The same as the above, but the integers are switched, so the first 64 bits appear last.
trueresult_reversed = long_list_reversed[0].to_bytes(8, "big", signed=True) + long_list_reversed[1].to_bytes(8, "big", signed=True)
trueint_reversed = int.from_bytes(trueresult_reversed, "big")
truebitstring_reversed = np.binary_repr(int.from_bytes(trueresult_reversed, "big"), width=128)
print("Int", trueint_reversed)  # output as expected
print("Bits", truebitstring_reversed)  # output as expected
assert trueint_reversed & 0b11111111 == 0b00000001  # True as expected

assert truebitstring == truebitstring_reversed[64:] + truebitstring_reversed[:64]  # True as expected

############ EXAMPLE 2a ###########

# The following code creates completely unexpected output. Should do the same as the first code block.
shiftint = long_list[0] << 64 | long_list[1]
shiftbitstring = np.binary_repr(shiftint, width=128)
print("Int", shiftint)  # output unexpected. should be same as 'trueint'
print("Bits", shiftbitstring)  # output unexpected, should be same as 'truebitstring'

############ EXAMPLE 2b ###########

# The following code creates completely unexpected output. It should do the same as the second code block.
# On top of that, it does not relate to the third block the way the second relates to the first (swapped integers).
shiftint_reversed = long_list_reversed[0] << 64 | long_list_reversed[1]
shiftbitstring_reversed = np.binary_repr(shiftint_reversed, width=128)
print("Int", shiftint_reversed)  # output unexpected, should be same as 'trueint_reversed'
print("Bits", shiftbitstring_reversed)  # output as expected, same as Example 1b! However, the integer above is NOT like in 1b!

This is the output of the script:

Int 2679388715912901282319653733876646017
Bits 00000010000001000000100000010000001000000100000010000001000000011100000110000010000001000000100000010000001000000100000010000001
Int 257216083546552756177539452046611087617
Bits 11000001100000100000010000001000000100000010000001000000100000010000001000000100000010000001000000100000010000001000000100000001
Int -4503032244740276095
Bits 11111111111111111111111111111111111111111111111111111111111111111100000110000010000001000000100000010000001000000100000010000001
Int -83066283374385707285835155385157123839
Bits 11000001100000100000010000001000000100000010000001000000100000010000001000000100000010000001000000100000010000001000000100000001

If I were to try to make a visual representation of what I am trying to achieve with the bit-shifting, it would look like this. Imagine the lines with a colon as the integer in memory. The rest are operations on the memory.

short_list = 00000010, 10000000

: 0
  << 8
: 00000000
  | 00000010
: 00000010
  << 8
: 00000010 00000000
  | 10000000
: 00000010 10000000

Answer 1

Your 2a/2b code snippets can't decide whether they think Python ints are 64-bit or arbitrary-precision. You're shifting numbers by 64 bits, as if the next number to OR in is exactly 64 bits, but it's not.

Python ints simulate an infinite-bit two's complement representation, and in infinite-bit two's complement, -6 is

...11111111111111111111111111111111111111111111111111111111111111111111111111111111111010

with an infinite trail of leading 1s going off to the left. It's kind of like the 2-adic integers. This infinite trail of 1s is responsible for the big block of 1s you see in the third bit string.
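A quick way to see this behaviour is to mask or shift a small negative number. This is only an illustrative sketch; the variable names are arbitrary.

```python
# Masking keeps only the low bits of the conceptually infinite representation.
low8 = -6 & 0xFF
low16 = -6 & 0xFFFF
print(format(low8, "08b"))    # 11111010
print(format(low16, "016b"))  # 1111111111111010

# A right shift never runs out of sign bits: the result stays all 1s, i.e. -1.
shifted = -6 >> 100
print(shifted)  # -1
```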

So, again, Python ints are conceptually infinite-bit, but the representations you get with int.to_bytes and numpy.binary_repr are not. That's why those functions take width arguments.

some_int.to_bytes(8, 'big', signed=True) produces a 64-bit (8-byte) two's complement representation of an int. Since your to_bytes call produces a 64-bit bytestring, the bytestring concatenation produces the results you expected.
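As a small check of that statement, the sketch below round-trips the negative value from the question through `to_bytes` and reads the bytes back as unsigned. The variable names are chosen here for illustration.

```python
negative = -4503032244740276095

# Exactly 8 bytes, holding the 64-bit two's complement pattern.
as_bytes = negative.to_bytes(8, "big", signed=True)
print(len(as_bytes))  # 8

# Read back as unsigned, the pattern equals the value wrapped modulo 2**64,
# which is the same thing as masking the int to its low 64 bits.
unsigned = int.from_bytes(as_bytes, "big")
print(unsigned == negative + 2**64)        # True
print(unsigned == negative & (2**64 - 1))  # True
```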

numpy.binary_repr(some_int, width=128) produces a 128-bit representation of an int. For a negative input, it uses two's complement, but it will also have no problem with producing an output with a leading 1 for a positive input, even if that 1 would cause the output to be treated as negative in two's complement.
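One possible way to make the shifting approach agree with the bytes approach, offered as a sketch rather than the only fix, is to truncate every integer to its low 64 bits before ORing it in. The names `MASK64`, `pack64` and `pack64_via_bytes` are hypothetical helpers, not part of the question's code.

```python
long_list = [145249953336295681, -4503032244740276095]
long_list_reversed = long_list[::-1]

MASK64 = (1 << 64) - 1


def pack64(values):
    """Concatenate ints as 64-bit two's complement fields, first value most significant."""
    acc = 0
    for v in values:
        # Masking cuts off the infinite run of sign bits of a negative value.
        acc = (acc << 64) | (v & MASK64)
    return acc


def pack64_via_bytes(values):
    """Reference result using the to_bytes route from Example 1."""
    raw = b"".join(v.to_bytes(8, "big", signed=True) for v in values)
    return int.from_bytes(raw, "big")


packed = pack64(long_list)
packed_reversed = pack64(long_list_reversed)
print(packed == pack64_via_bytes(long_list))                    # True
print(packed_reversed == pack64_via_bytes(long_list_reversed))  # True

# Example 2b without masking is negative; its low 128 bits still match,
# which is why its width=128 bit string looked right while the integer did not.
unmasked_2b = long_list_reversed[0] << 64 | long_list_reversed[1]
print(unmasked_2b & ((1 << 128) - 1) == packed_reversed)  # True
```

The mask only needs to be applied to values that may be negative, but applying it to every value keeps the loop uniform.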
