
Efficient logic to pad tensor

I'm trying to pad a tensor of some shape such that the total memory used by the tensor is always a multiple of 512. For example, take a tensor of shape 16x1x1x4 and type SI32 (multiply the element count by 4 to get the total size):

The total number of elements is 16x1x1x4 = 64
Total memory required: 64 x 4 = 256 (not a multiple of 512)
The padded shape would be 32x1x1x4, i.e. 128 elements x 4 = 512
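
The arithmetic above can be reproduced with a small sketch. The helper name `padded_bytes` and the `bytes_per_element` parameter are illustrative assumptions and do not appear in the question's code.

```python
import math

ALIGNMENT = 512  # required alignment of the total size, as stated in the question


def padded_bytes(shape, bytes_per_element):
    """Return (current total size, size rounded up to the next multiple of ALIGNMENT)."""
    total_bytes = math.prod(shape) * bytes_per_element
    rounded = -(-total_bytes // ALIGNMENT) * ALIGNMENT  # ceiling division, then scale back
    return total_bytes, rounded


print(padded_bytes((16, 1, 1, 4), 4))   # (256, 512)
print(math.prod((32, 1, 1, 4)) * 4)     # 512, the size of the padded shape
```

Rounding the byte count up only gives the target size; it does not say which dimension to grow, which is the actual problem discussed here.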

The logic below works for the basic shape but breaks with a shape such as 16x51x1x4 SI32, or something random, say 80x240x1x1 U8. The padding logic goes like this:

from functools import reduce

DATA_TYPE_MULTIPLYER = 2 # This would change at runtime with different type e.g. 8 with U8 16 with F16 32 with SI32

ALIGNMENT = 512 #Always Constant
CHAR_BIT = 8    # Always Const for given fixed Arch

def approachOne(tensor):
    totalElements = reduce((lambda x, y: x * y), tensor)
    totalMemory = totalElements * DATA_TYPE_MULTIPLYER
    
    divisor = tensor[1] * tensor[2] * tensor[3]
    tempDimToPad = totalElements/divisor
    orgDimToPad = totalElements/divisor
    while (True):
        if ((tempDimToPad * divisor * DATA_TYPE_MULTIPLYER) % ALIGNMENT == 0):
            return int(tempDimToPad - orgDimToPad)
        tempDimToPad = tempDimToPad + 1
    
def getPadding(tensor):
    totalElements = reduce((lambda x, y: x * y), tensor)
    totalMemory = totalElements * DATA_TYPE_MULTIPLYER
    newSize = totalMemory + (ALIGNMENT - (totalMemory % ALIGNMENT))
    newTotalElements = (newSize * CHAR_BIT) / (CHAR_BIT * DATA_TYPE_MULTIPLYER)
    
    # Any DIM can be padded, using first for now
    paddingValue = tensor[0] 
    padding =  int(((newTotalElements * paddingValue) / totalElements) - paddingValue)
    return padding
    
tensor = [11, 7, 3, 5]
print(getPadding(tensor))
print(approachOne(tensor))

The tensorflow package may help here, but I'm originally coding in C++, so I'm just posting a minimal working example in Python. Any help is appreciated, thanks.

Approach 1, the brute-force approach, is to keep incrementing any chosen dimension by 1 and check whether totalMemory is a multiple of 512. The brute-force approach works, but it doesn't give the minimal padding and bloats the tensor.
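
As an aside, when only one dimension may grow, the incrementing loop can be replaced by a closed form. The following is a minimal sketch under the assumption that the requirement is expressed as a `target` number of elements (for example 128 for 4-byte elements); `pad_single_dim` is a hypothetical helper name, not part of the code above.

```python
import math


def pad_single_dim(shape, axis, target):
    """Smallest value for shape[axis] that makes the element count a
    multiple of `target` while every other dimension stays fixed."""
    rest = math.prod(shape) // shape[axis]   # product of the other dimensions
    step = target // math.gcd(target, rest)  # shape[axis] must be a multiple of this
    return -(-shape[axis] // step) * step    # round shape[axis] up to a multiple of step


print(pad_single_dim((16, 1, 1, 4), 0, 128))    # 32
print(pad_single_dim((16, 51, 1, 4), 1, 128))   # 52
print(pad_single_dim((16, 51, 1, 4), 0, 128))   # 32
```

This gives the minimum for the chosen axis only. As the last two calls show, a different axis can lead to a much smaller padded tensor (16x52x1x4 versus 32x51x1x4).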

Updating the conditions: initially the approach was to pad along the first dimension. Since always padding the first dimension may not be the best solution, I'm getting rid of this constraint.

If you want the total memory to be a multiple of 512, then the number of elements in the tensor must be a multiple of 512 // DATA_TYPE_MULTIPLIER, e.g. 128 in your case. Whatever that number is, it will have a prime factorization of the form 2**n (for power-of-two element sizes). The number of elements in the tensor is given by s[0]*s[1]*...*s[d-1], where s is a sequence containing the shape of the tensor and d is an integer, the number of dimensions. The product s[0]*s[1]*...*s[d-1] also has some prime factorization, and it is a multiple of 2**n if and only if it contains these prime factors. I.e. the task is to pad the individual dimensions s[i] such that the resulting prime factorization of the product s[0]*s[1]*...*s[d-1] contains 2**n.
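
To make the factor-counting argument concrete, here is a minimal sketch. It assumes the element size is given in bytes and divides 512, and that the target is a power of two; `target_elements`, `count_twos` and `missing_twos` are illustrative names only.

```python
ALIGNMENT = 512


def target_elements(bytes_per_element, alignment=ALIGNMENT):
    """Element count the tensor size must be a multiple of
    (assumes bytes_per_element divides alignment)."""
    return alignment // bytes_per_element


def count_twos(n):
    """How many times 2 divides the positive integer n."""
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


def missing_twos(shape, target):
    """Number of factors of 2 the shape still lacks to reach a multiple
    of `target` (assumed to be a power of two)."""
    return max(0, count_twos(target) - sum(count_twos(s) for s in shape))


target = target_elements(4)                     # 128 for 4-byte elements
print(missing_twos((16, 51, 1, 4), target))     # 1
print(missing_twos((80, 240, 1, 1), target))    # 0, already a multiple of 128
print(missing_twos((3, 5, 7, 11), target))      # 7
```

A result of 0 means no padding is needed. A positive result says how many factors of 2 must be gained by increasing dimensions, but not which increases are cheapest.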

If the goal is to reach the minimum possible size of the padded tensor, then one can simply iterate through all multiples of the given target number of elements to find the first one that can be satisfied by padding (increasing) the individual dimensions of the tensor (1). A dimension must be increased as long as it contains at least one prime factor that is not contained in the target multiple size. After all dimensions have been increased such that their prime factors are contained in the target multiple size, one can check the resulting size of the candidate shape: if it matches the target multiple size, we are done; if its prime factors are a strict subset of the target multiple's prime factors, we can add the missing prime factors to any of the dimensions (e.g. the first); otherwise, we can use the excess prime factors to store the candidate shape for a future (larger) multiplier. The first such future multiplier then marks an upper bound for the iteration over all possible multipliers, i.e. the algorithm will terminate. However, if the candidate shape (after adjusting all the dimensions) has an excess of some prime factors with respect to the target multiple size and is also missing some other prime factors, the only way is to iterate over all possible padded shapes with size bounded by the target multiple size.
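
For cross-checking results on small shapes, the minimum padded size can also be found by plain exhaustive search. This is a sketch of that idea only, not the algorithm described above; `min_padded_size` is a hypothetical helper, and it assumes all dimensions are positive integers.

```python
import math


def min_padded_size(shape, target):
    """Smallest element count that is a multiple of `target` and reachable
    by increasing (never decreasing) the dimensions of `shape`."""
    # Rounding the first dimension up to a multiple of `target` always works,
    # so it serves as an initial upper bound.
    first = -(-shape[0] // target) * target
    best = first * math.prod(shape[1:])

    def search(i, partial):
        nonlocal best
        if i == len(shape):
            if partial % target == 0:
                best = min(best, partial)
            return
        rest = math.prod(shape[i + 1:])  # smallest possible product of the remaining dims
        x = shape[i]
        while partial * x * rest <= best:  # prune branches that cannot beat `best`
            search(i + 1, partial * x)
            x += 1

    search(0, 1)
    return best


print(min_padded_size((16, 51, 1, 4), 128))  # 3328, e.g. 16x52x1x4
print(min_padded_size((3, 5, 7, 11), 128))   # 1920, e.g. 3x5x8x16
```

The search is exponential in the number of dimensions, so it is only suitable as a reference for small inputs.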

The following is an example implementation:

from collections import Counter
import itertools as it
import math
from typing import Iterator, Sequence


def pad(shape: Sequence[int], target: int) -> tuple[int,...]:
    """Pad the given `shape` such that the total number of elements
       is a multiple of the given `target`.
    """
    size = math.prod(shape)
    if size % target == 0:
        return tuple(shape)

    target_prime_factors = get_prime_factors(target)

    solutions: dict[int, tuple[int,...]] = {}  # maps `target` multipliers to corresponding padded shapes

    for multiplier in it.count(math.ceil(size / target)):

        if multiplier in solutions:
            return solutions[multiplier]

        prime_factors = [*get_prime_factors(multiplier), *target_prime_factors]
        
        def good(x):
            return all(f in prime_factors for f in get_prime_factors(x))

        candidate = list(shape)
        for i, x in enumerate(candidate):
            while not good(x):
                x += 1
            candidate[i] = x

        if math.prod(candidate) == multiplier*target:
            return tuple(candidate)

        candidate_prime_factor_counts = Counter(f for x in candidate for f in get_prime_factors(x))
        target_prime_factor_counts = Counter(prime_factors)

        missing = target_prime_factor_counts - candidate_prime_factor_counts
        excess = candidate_prime_factor_counts - target_prime_factor_counts

        if not excess:
            return (
                candidate[0] * math.prod(k**v for k, v in missing.items()),
                *candidate[1:],
            )
        elif not missing:
            solutions[multiplier * math.prod(k**v for k, v in excess.items())] = tuple(candidate)
        else:
            for padded_shape in generate_all_padded_shapes(shape, bound=multiplier*target):
                padded_size = math.prod(padded_shape)
                if padded_size == multiplier*target:
                    return padded_shape
                elif padded_size % target == 0:
                    solutions[padded_size // target] = padded_shape


def generate_all_padded_shapes(shape: Sequence[int], *, bound: int) -> Iterator[tuple[int,...]]:
    head, *tail = shape
    if bound % head == 0:
        max_value = bound // math.prod(tail)
    else:
        max_value = math.floor(bound / math.prod(tail))
    for x in range(head, max_value+1):
        if tail:
            yield from ((x, *other) for other in generate_all_padded_shapes(tail, bound=math.floor(bound/x)))
        else:
            yield (x,)


def get_prime_factors(n: int) -> list[int]:
    """From: https://stackoverflow.com/a/16996439/3767239
       Replace with your favorite prime factorization method.
    """
    primfac = []
    d = 2
    while d*d <= n:
        while (n % d) == 0:
            primfac.append(d)  # supposing you want multiple factors repeated
            n //= d
        d += 1
    if n > 1:
        primfac.append(n)
    return primfac

Here are a few examples:

pad((16, 1, 1), 128) = (128, 1, 1)
pad((16, 51, 1, 4), 128) = (16, 52, 1, 4)
pad((80, 240, 1, 1), 128) = (80, 240, 1, 1)
pad((3, 5, 7, 11), 128) = (3, 5, 8, 16)
pad((3, 3, 3, 1), 128) = (8, 4, 4, 1)
pad((7, 7, 7, 7), 128) = (7, 8, 8, 8)
pad((9, 9, 9, 9), 128) = (10, 10, 10, 16)
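
The listed results can be sanity-checked independently of the implementation. The sketch below only verifies that each padded shape is valid (no dimension shrinks and the element count is a multiple of the target); it does not verify minimality. `is_valid_padding` is an illustrative helper name.

```python
import math


def is_valid_padding(original, padded, target):
    """True if `padded` has the same rank as `original`, no dimension is
    smaller, and the element count is a multiple of `target`."""
    return (
        len(original) == len(padded)
        and all(p >= o for o, p in zip(original, padded))
        and math.prod(padded) % target == 0
    )


examples = [
    ((16, 1, 1), (128, 1, 1)),
    ((16, 51, 1, 4), (16, 52, 1, 4)),
    ((80, 240, 1, 1), (80, 240, 1, 1)),
    ((3, 5, 7, 11), (3, 5, 8, 16)),
    ((3, 3, 3, 1), (8, 4, 4, 1)),
    ((7, 7, 7, 7), (7, 8, 8, 8)),
    ((9, 9, 9, 9), (10, 10, 10, 16)),
]
print(all(is_valid_padding(o, p, 128) for o, p in examples))  # True
```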

Footnotes: (1) In fact, we need to find the roots of the polynomial (s[0]+x[0])*(s[1]+x[1])*...*(s[d-1]+x[d-1]) - multiple*target for x[i] >= 0 over the domain of integers. However, I am not aware of any algorithm to solve this problem.
