Efficient logic to pad a tensor

Question

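I am trying to pad a tensor of some shape so that the total memory used by the tensor is always a multiple of 512 bytes. For example, take a tensor of shape 16x1x1x4 with type SI32 (multiply the element count by 4 to get the total size in bytes).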

The total number of elements is 16x1x1x4 = 64
Total memory required: 64 x 4 = 256 bytes (not a multiple of 512)
The padded shape would be 32x1x1x4 = 128 elements, i.e. 128 x 4 = 512 bytes
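The arithmetic above can be written as a small check. This is only a sketch: the helper names `total_bytes` and `is_aligned` are made up for illustration, and the element size is assumed to be given in bytes (4 for SI32).

```python
import math

ALIGNMENT = 512  # required alignment in bytes


def total_bytes(shape, bytes_per_element):
    """Memory used by a dense tensor of the given shape, in bytes."""
    return math.prod(shape) * bytes_per_element


def is_aligned(shape, bytes_per_element, alignment=ALIGNMENT):
    """True if the tensor's total size is a multiple of `alignment`."""
    return total_bytes(shape, bytes_per_element) % alignment == 0


print(total_bytes((16, 1, 1, 4), 4), is_aligned((16, 1, 1, 4), 4))
print(total_bytes((32, 1, 1, 4), 4), is_aligned((32, 1, 1, 4), 4))
```

The first call reports the unpadded SI32 tensor, the second the padded one.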

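The logic below works for basic shapes but breaks for a shape such as 16x51x1x4 SI32 or, to pick something arbitrary, 80x240x1x1 U8. The padding logic looks like this: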

from functools import reduce

DATA_TYPE_MULTIPLYER = 2 # Bytes per element; changes at runtime with the type, e.g. 1 for U8 (8 bits), 2 for F16 (16 bits), 4 for SI32 (32 bits)

ALIGNMENT = 512 #Always Constant
CHAR_BIT = 8    # Always Const for given fixed Arch

def approachOne(tensor):
    totalElements = reduce((lambda x, y: x * y), tensor)
    totalMemory = totalElements * DATA_TYPE_MULTIPLYER
    
    divisor = tensor[1] * tensor[2] * tensor[3]
    tempDimToPad = totalElements/divisor
    orgDimToPad = totalElements/divisor
    while (True):
        if ((tempDimToPad * divisor * DATA_TYPE_MULTIPLYER) % ALIGNMENT == 0):
            return int(tempDimToPad - orgDimToPad)
        tempDimToPad = tempDimToPad + 1
    
def getPadding(tensor):
    totalElements = reduce((lambda x, y: x * y), tensor)
    totalMemory = totalElements * DATA_TYPE_MULTIPLYER
    newSize = totalMemory + (ALIGNMENT - (totalMemory % ALIGNMENT))
    newTotalElements = (newSize * CHAR_BIT) / (CHAR_BIT * DATA_TYPE_MULTIPLYER)
    
    # Any DIM can be padded, using first for now
    paddingValue = tensor[0] 
    padding =  int(((newTotalElements * paddingValue) / totalElements) - paddingValue)
    return padding
    
tensor = [11, 7, 3, 5]
print(getPadding(tensor))
print(approachOne(tensor))

The tensorflow package might help here, but I am originally coding in C++, so I am just posting this in Python. Any working example is appreciated.

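Approach 1: The brute-force approach is to keep incrementing any chosen dimension by 1 and check whether totalMemory is a multiple of 512. The brute-force approach works, but it does not give the minimal padding and bloats the tensor.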

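Updated conditions: Initially the approach was to pad only the first dim. Since always padding the first dimension is not the best solution, I am dropping that constraint.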
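To illustrate why the choice of dimension matters, here is a minimal sketch of the single-dimension brute force applied to each axis in turn. The function names are hypothetical, and `target` is assumed to be the required multiple of the element count (for example 128 for SI32 with 512-byte alignment), not a byte count.

```python
import math


def pad_single_axis(shape, axis, target):
    """Grow only `axis` by 1 until the element count is a multiple of `target`."""
    padded = list(shape)
    while math.prod(padded) % target != 0:
        padded[axis] += 1
    return tuple(padded)


def best_single_axis(shape, target):
    """Try every axis on its own and keep the candidate with the fewest elements."""
    candidates = [pad_single_axis(shape, axis, target) for axis in range(len(shape))]
    return min(candidates, key=math.prod)


print(pad_single_axis((16, 51, 1, 4), 0, 128))  # padding the first dimension
print(best_single_axis((16, 51, 1, 4), 128))    # padding whichever single axis is cheapest
```

For this shape, padding the second dimension yields a much smaller tensor than padding the first. Growing a single axis is still not guaranteed to be minimal overall, because the smallest padded shape may require changing several dimensions at once.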

Answer 1

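If you want the total memory to be a multiple of 512, then the number of elements in the tensor must be a multiple of 512 // DATA_TYPE_MULTIPLIER, e.g. 128 in your case. Whatever that number is, it will have a prime factorization of the form 2**n (assuming the element size is itself a power of two). The number of elements in the tensor is given by s[0]*s[1]*...*s[d-1], where s is the sequence containing the shape of the tensor and d is an integer, the number of dimensions. The product s[0]*s[1]*...*s[d-1] also has some prime factorization, and it is a multiple of 2**n if and only if it contains those prime factors. That is, the task is to pad the individual dimensions s[i] such that the resulting prime factorization of the product s[0]*s[1]*...*s[d-1] contains 2**n.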
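The divisibility argument can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes that both the alignment and the element size are powers of two, so that the target is of the form 2**n.

```python
def required_element_multiple(alignment_bytes, bytes_per_element):
    """Element count multiple needed so the byte size is a multiple of the alignment."""
    # Assumption: the element size divides the alignment (true for power-of-two sizes).
    assert alignment_bytes % bytes_per_element == 0
    return alignment_bytes // bytes_per_element


def twos(n):
    """Exponent of 2 in the prime factorization of a positive integer n."""
    return (n & -n).bit_length() - 1


def missing_twos(shape, target):
    """How many factors of 2 the product of `shape` lacks to be a multiple of `target`.

    Assumes `target` is a power of two.
    """
    return max(0, twos(target) - sum(twos(s) for s in shape))


target = required_element_multiple(512, 4)   # SI32: 4 bytes per element
print(target, twos(target))
print(missing_twos((16, 51, 1, 4), target))   # one factor of 2 is missing
print(missing_twos((80, 240, 1, 1), target))  # already a multiple of the target
```

A result of zero from `missing_twos` means the shape needs no padding. A positive result only says how many factors of 2 are missing, not which dimensions are cheapest to grow.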

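If the goal is to reach the minimum possible size of the padded tensor, then one can simply iterate through all multiples of the given target number of elements to find the first one that can be satisfied by padding (increasing) the individual dimensions of the tensor⁽¹⁾. A dimension must be increased as long as it contains at least one prime factor that is not contained in the target multiple size. After all dimensions have been increased such that their prime factors are contained in the target multiple size, one can check the resulting size of the candidate shape: if it matches the target multiple size, we are done; if its prime factors are a strict subset of the prime factors of the target multiple, we can add the missing prime factors to any of the dimensions (e.g. the first); otherwise, we can use the excess prime factors to store the candidate shape for a future (larger) multiplier. The first such future multiplier then marks an upper bound for the iteration over all possible multipliers, i.e. the algorithm will terminate. However, if the candidate shape (after adjusting all the dimensions) has an excess of some prime factors w.r.t. the target multiple size and also misses some other prime factors, the only way is to iterate over all possible padded shapes whose size is bounded by the target multiple size.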

Here is an example implementation:

from collections import Counter
import itertools as it
import math
from typing import Iterator, Sequence


def pad(shape: Sequence[int], target: int) -> tuple[int,...]:
    """Pad the given `shape` such that the total number of elements
       is a multiple of the given `target`.
    """
    size = math.prod(shape)
    if size % target == 0:
        return tuple(shape)

    target_prime_factors = get_prime_factors(target)

    solutions: dict[int, tuple[int,...]] = {}  # maps `target` multipliers to corresponding padded shapes

    for multiplier in it.count(math.ceil(size / target)):

        if multiplier in solutions:
            return solutions[multiplier]

        prime_factors = [*get_prime_factors(multiplier), *target_prime_factors]
        
        def good(x):
            return all(f in prime_factors for f in get_prime_factors(x))

        candidate = list(shape)
        for i, x in enumerate(candidate):
            while not good(x):
                x += 1
            candidate[i] = x

        if math.prod(candidate) == multiplier*target:
            return tuple(candidate)

        candidate_prime_factor_counts = Counter(f for x in candidate for f in get_prime_factors(x))
        target_prime_factor_counts = Counter(prime_factors)

        missing = target_prime_factor_counts - candidate_prime_factor_counts
        excess = candidate_prime_factor_counts - target_prime_factor_counts

        if not excess:
            return (
                candidate[0] * math.prod(k**v for k, v in missing.items()),
                *candidate[1:],
            )
        elif not missing:
            solutions[multiplier * math.prod(k**v for k, v in excess.items())] = tuple(candidate)
        else:
            for padded_shape in generate_all_padded_shapes(shape, bound=multiplier*target):
                padded_size = math.prod(padded_shape)
                if padded_size == multiplier*target:
                    return padded_shape
                elif padded_size % target == 0:
                    solutions[padded_size // target] = padded_shape


def generate_all_padded_shapes(shape: Sequence[int], *, bound: int) -> Iterator[tuple[int,...]]:
    head, *tail = shape
    if bound % head == 0:
        max_value = bound // math.prod(tail)
    else:
        max_value = math.floor(bound / math.prod(tail))
    for x in range(head, max_value+1):
        if tail:
            yield from ((x, *other) for other in generate_all_padded_shapes(tail, bound=math.floor(bound/x)))
        else:
            yield (x,)


def get_prime_factors(n: int) -> list[int]:
    """From: https://stackoverflow.com/a/16996439/3767239
       Replace with your favorite prime factorization method.
    """
    primfac = []
    d = 2
    while d*d <= n:
        while (n % d) == 0:
            primfac.append(d)  # supposing you want multiple factors repeated
            n //= d
        d += 1
    if n > 1:
        primfac.append(n)
    return primfac

Here are some examples:

pad((16, 1, 1), 128) = (128, 1, 1)
pad((16, 51, 1, 4), 128) = (16, 52, 1, 4)
pad((80, 240, 1, 1), 128) = (80, 240, 1, 1)
pad((3, 5, 7, 11), 128) = (3, 5, 8, 16)
pad((3, 3, 3, 1), 128) = (8, 4, 4, 1)
pad((7, 7, 7, 7), 128) = (7, 8, 8, 8)
pad((9, 9, 9, 9), 128) = (10, 10, 10, 16)
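The total sizes of results like the ones above can be cross-checked with an independent brute force. The sketch below is not the algorithm from this answer: it tries each multiple of the target in increasing order and tests whether that number factors into dimensions that are each at least as large as the originals. The function names are hypothetical, all dimensions are assumed to be positive, and it is only practical for small shapes.

```python
import math
from typing import Sequence


def can_reach(size: int, shape: Sequence[int]) -> bool:
    """True if `size` can be written as a product of dims d[i] >= shape[i]."""
    head, *tail = shape
    if not tail:
        return size >= head  # the last dimension takes whatever is left
    rest_min = math.prod(tail)
    # The first dimension must divide `size` and leave room for the rest.
    for d in range(head, size // rest_min + 1):
        if size % d == 0 and can_reach(size // d, tail):
            return True
    return False


def min_padded_size(shape: Sequence[int], target: int) -> int:
    """Smallest multiple of `target` reachable by only increasing dimensions."""
    size = math.prod(shape)
    multiple = -(-size // target) * target  # round up to a multiple of target
    while not can_reach(multiple, shape):
        multiple += target
    return multiple


for shape in [(16, 1, 1), (16, 51, 1, 4), (80, 240, 1, 1), (3, 5, 7, 11), (3, 3, 3, 1)]:
    print(shape, min_padded_size(shape, 128))
```

Each printed size can be compared with the product of the corresponding padded shape returned by `pad`.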

Footnotes: (1) Actually, we need to find the roots of the polynomial (s[0]+x[0])*(s[1]+x[1])*...*(s[d-1]+x[d-1]) - multiple*target for x[i] >= 0 over the domain of integers. However, I am not aware of any algorithm that solves this problem.
