Determining the validity of a multi-hot encoding using bit manipulation
Suppose I have N items and a binary number that indicates which of these items are included in a result:
N = 4
# items 1 and 3 will be included in the result
vector = 0b0101
# item 2 will be included in the result
vector = 0b0010
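The comments above suggest that items are numbered from 0, starting at the leftmost of the N bits. Under that assumption, a vector can be decoded back into item indices roughly as follows (`items_in` is a hypothetical helper name used only for illustration):

```python
def items_in(vector, n):
    """List the item indices encoded in vector.

    Assumes items are numbered 0..n-1 from the leftmost of the n bits,
    which is the convention the example comments appear to use.
    """
    return [i for i in range(n) if (vector >> (n - 1 - i)) & 1]


print(items_in(0b0101, 4))  # [1, 3]
print(items_in(0b0010, 4))  # [2]
```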
I am also given a list, conflicts, that indicates which items cannot be included in the result at the same time:
conflicts = [
    0b0110,  # any result that contains items 1 AND 2 is invalid
    0b0111,  # any result that contains AT LEAST 2 items from {1, 2, 3} is invalid
]
Given this list of conflicts, we can determine the validity of the earlier vectors:
# invalid as it triggers conflict 1: [0, 1, 1, 1]
vector = 0b0101
# valid as it triggers no conflicts
vector = 0b0010
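In this context, how can bit manipulation be used to check the validity of a single vector, or of a large list of vectors, against the list of conflicts?
The solution provided here already gets us most of the way there, but I'm unsure how to adapt it to the integer use case (to avoid numpy arrays and numba entirely).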
N = 4
# items 1 and 3 will be included in the result
vector = 0b0101
# item 2 will be included in the result
vector = 0b0010
conflicts = [
    0b0110,  # any result that contains items 1 AND 2 is invalid
    0b0111,  # any result that contains AT LEAST 2 items from {1, 2, 3} is invalid
]
def find_conflict(vector, conflicts):
    found_conflict = False
    for v in conflicts:
        result = vector & v  # do a logical AND operation
        if result != 0:  # there are common elements
            number_of_bits_set = bin(result).count("1")  # count number of common elements
            if number_of_bits_set >= 2:  # check common limit for detection of invalid vectors
                found_conflict = True
                print(f"..Conflict between {bin(vector)} and {bin(v)}: {bin(result)}")
    if found_conflict:
        print(f"Conflict found for {bin(vector)}.")
    else:
        print(f"No conflict found for {bin(vector)}.")
# invalid as it triggers conflict 1: [0, 1, 1, 1]
vector = 0b0101
find_conflict(vector, conflicts)
# valid as it triggers no conflicts
vector = 0b0010
find_conflict(vector, conflicts)
$ python3 pythontest.py
..Conflict between 0b101 and 0b111: 0b101
Conflict found for 0b101.
No conflict found for 0b10.
$
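For a large list of vectors, a variant that returns a boolean instead of printing may be easier to reuse. The sketch below is not part of the code above: the names `is_valid` and `valid_vectors` are hypothetical, and it swaps the bit count for the `overlap & (overlap - 1)` test, which is non-zero exactly when at least two bits are set in a non-negative integer.

```python
def is_valid(vector, conflicts):
    """Return True if vector shares fewer than two set bits with every conflict mask."""
    for mask in conflicts:
        overlap = vector & mask
        # overlap & (overlap - 1) clears the lowest set bit; a non-zero
        # remainder means the overlap had at least two bits set.
        if overlap & (overlap - 1):
            return False
    return True


def valid_vectors(vectors, conflicts):
    """Keep only the vectors that trigger no conflict."""
    return [v for v in vectors if is_valid(v, conflicts)]


conflicts = [0b0110, 0b0111]
print(is_valid(0b0101, conflicts))  # False
print(is_valid(0b0010, conflicts))  # True
print(valid_vectors(range(16), conflicts))  # [0, 1, 2, 4, 8, 9, 10, 12]
```

Returning early on the first conflict avoids scanning the remaining masks, which may matter when both the list of vectors and the list of conflicts are long.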