
What does the interleave_keys() function in the torchtext library do exactly?

You can find this function in the torchtext/data/utils.py file.

I have included the official code, with its documentation, below.

def interleave_keys(a, b):
    """Interleave bits from two sort keys to form a joint sort key.

    Examples that are similar in both of the provided keys will have similar
    values for the key defined by this function. Useful for tasks with two
    text fields like machine translation or natural language inference.
    """
    def interleave(args):
        return ''.join([x for t in zip(*args) for x in t])
    return int(''.join(interleave(format(x, '016b') for x in (a, b))), base=2)

A more detailed explanation would help me understand how it returns an integer that reflects how similar the two given keys are.

The format function used inside it is Python's commonly used built-in function.

After breaking the function down, I was able to figure out what it is doing.

format(x, '016b'): this piece of code converts each integer (a and b, which in my case are the numbers of words in the two sentences) to a 16-digit binary string, padded with leading zeros.
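To make this step concrete, here is a minimal sketch of what format does with the '016b' spec. The values 2 and 11 are only illustrative sentence lengths, and the variable names are my own.

```python
# Illustrative values only: sentence lengths of 2 and 11 words.
bits_2 = format(2, '016b')    # base 2, zero-padded to at least 16 characters
bits_11 = format(11, '016b')

print(bits_2)    # 0000000000000010
print(bits_11)   # 0000000000001011

# 16 is a minimum width, not a limit: larger numbers produce longer strings.
print(len(format(65536, '016b')))  # 17
```

Note that the result is a string of '0' and '1' characters, not a number, which is why the rest of the function can zip and join it.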

The interleave function then takes the bits at the same position in the two binary representations and joins them pair by pair, like this:

For easy understanding, let's assume 4-digit binary for 2 and 11.

2's binary representation is: 0 0 1 0

11's binary representation is: 1 0 1 1

So the output here would be 01001101 (the pairs 01, 00, 11, 01 combined), which gives 77 when converted to an integer.
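The worked example above can be reproduced with a short sketch. The helper interleave_bits and its width parameter are hypothetical names of my own, not part of torchtext; the helper follows the same idea as interleave_keys but lets the bit width be set to 4 so that it matches the example.

```python
def interleave_bits(a, b, width=16):
    """Hypothetical helper: interleave the bits of a and b, padded to `width`."""
    bits_a = format(a, '0{}b'.format(width))
    bits_b = format(b, '0{}b'.format(width))
    # zip pairs up the bits at the same position; a's bit comes first in each pair
    joined = ''.join(x + y for x, y in zip(bits_a, bits_b))
    return joined, int(joined, base=2)

joined, value = interleave_bits(2, 11, width=4)
print(joined, value)                        # 01001101 77

# With the default 16-bit width the extra leading zeros do not change the value.
print(interleave_bits(2, 11)[1])            # 77

# Pairs of keys that are close in both fields get close joint keys.
print(interleave_bits(10, 10, width=4)[1])  # 204
print(interleave_bits(10, 11, width=4)[1])  # 205
```

Because the high-order bits of both keys end up at the front of the joined string, sorting by this value tends to group examples whose two lengths are both similar, which is what the docstring describes.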
