Dict: sort by multiple keys + descending/ascending

I have a dict with str keys and np.array values. The arrays can also hold elements of type np.str_.

data = {"col1": np.array((1, 2, 3, 4, 5, 4, 3, 2, 1)),
        "col2": np.array(list("abcdeabcd")),
        "col3": np.array((10, 11, 9, 8, 7, 2, 12, 100, 1))}

How can I sort by multiple keys, each in ascending or descending order, similar to what I could do with pandas's sort_values method:

Pandas solution (not needed)

df = pd.DataFrame(data)
df.sort_values(by=["col1", "col2"], ascending=[True, True])

Numpy or base python solution needed:

I don't want to use pandas, but ideally something in numpy. I know that I can use np.lexsort to sort by multiple columns, but it does not give me the option to choose ascending or descending order per key.

This is the output I want when sorting by col1 and then col2 in ascending / ascending order:

{'col1': np.array([1, 1, 2, 2, 3, 3, 4, 4, 5]),
 'col2': np.array(['a', 'd', 'b', 'c', 'b', 'c', 'a', 'd', 'e']),
 'col3': np.array([10, 1, 11, 100, 12, 9, 2, 8, 7])}

This is the output I want when sorting by col1 and then col2 in ascending / descending order:

{'col1': np.array([1, 1, 2, 2, 3, 3, 4, 4, 5]),
 'col2': np.array(['d', 'a', 'c', 'b', 'c', 'b', 'd', 'a', 'e']),
 'col3': np.array([1, 10, 100, 11, 9, 12, 8, 2, 7])}

You can use np.lexsort to sort columns backwards by negating the sort key. Remember that you don't need to pass the actual array to lexsort: any array that orders the same way will do as a key. For signed numerical arrays, you can sort in reverse by negating the values. For unsigned integers, you can subtract the values from the maximum. Strings can either be treated as numbers, or you can make a lookup table of signed integers based on np.unique.

Here is a small example of arrays:
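As a minimal sketch of the negation idea, here are two of the numeric columns from the question, sorted by col1 ascending and then col3 descending. The names `sorted_col1` and `sorted_col3` are only for illustration:

```python
import numpy as np

col1 = np.array([1, 2, 3, 4, 5, 4, 3, 2, 1])
col3 = np.array([10, 11, 9, 8, 7, 2, 12, 100, 1])

# lexsort treats the LAST key as the primary one, so col1 goes last.
# Negating col3 makes the secondary key sort in descending order.
idx = np.lexsort((-col3, col1))

sorted_col1 = col1[idx]   # [1, 1, 2, 2, 3, 3, 4, 4, 5]
sorted_col3 = col3[idx]   # [10, 1, 100, 11, 12, 9, 8, 2, 7]
```

Note that only the key is negated. The data arrays themselves are indexed with `idx` unchanged.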

import numpy as np

np.random.seed(0xBEEF)
a = np.random.choice([1, 2, 3], 10)
b = np.random.choice([1.0, 2.0, 3.0], 10)
c = np.random.choice(np.array([1, 2, 3], dtype=np.uint8), 10)
d = np.random.choice(list('abc'), 10)

The sort keys in ascending order can be the arrays themselves in all cases. In descending order, we can obviously use -a and -b. As it happens, -c also works:

>>> c
array([3, 1, 1, 3, 2, 2, 1, 1, 2, 3], dtype=uint8)
>>> -c
array([253, 255, 255, 253, 254, 254, 255, 255, 254, 253], dtype=uint8)

This may be dependent on the platform representation, but on most popular systems, negative numbers are represented in two's complement form and this should work just fine. If you wanted to be really safe, you could add a check like

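if np.issubdtype(c.dtype, np.unsignedinteger):
    # Subtract from the maximum; adding 1 first would push the constant
    # out of range for the dtype (an OverflowError on recent NumPy).
    key = np.iinfo(c.dtype).max - c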
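Here is a small, self-contained sketch of the subtract-from-the-maximum variant, using the values of c shown above. The stable sort kind is my own choice here, made so that ties come out in a predictable order:

```python
import numpy as np

c = np.array([3, 1, 1, 3, 2, 2, 1, 1, 2, 3], dtype=np.uint8)

# Subtracting from the dtype's maximum reverses the order without
# leaving the unsigned range: 3 -> 252, 2 -> 253, 1 -> 254.
key = np.iinfo(c.dtype).max - c

# A stable sort keeps equal elements in their original relative order.
idx = np.argsort(key, kind="stable")
c_desc = c[idx]   # [3, 3, 3, 2, 2, 2, 1, 1, 1, 1]
```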

And of course we have d:

>>> d
array(['c', 'a', 'b', 'b', 'c', 'b', 'a', 'b', 'c', 'b'], dtype='<U1')
>>> -d
...
UFuncTypeError: ufunc 'negative' did not contain a loop with signature matching types dtype('<U1') -> dtype('<U1')

One way to construct a sort key here is:

lookup, key = np.unique(d, return_inverse=True)

The elements of key are indices into lookup, which is in sorted order, meaning that if you sorted key, the result of lookup[key] would be correctly sorted as well. This means that key.argsort() and d.argsort() order the elements the same way (up to how ties are broken), with the added advantage that you can negate key.

In fact, you can take a shortcut and write your key generator using this technique alone:
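A quick sketch to check those claims, using the d shown above. I'm using stable sorts so that ties are broken identically on both sides of the comparison:

```python
import numpy as np

d = np.array(['c', 'a', 'b', 'b', 'c', 'b', 'a', 'b', 'c', 'b'])
lookup, key = np.unique(d, return_inverse=True)

# lookup holds the sorted unique values; key indexes into it,
# so lookup[key] reconstructs the original array.
roundtrip = lookup[key]

# With a stable sort, the integer key and the strings give the same order.
same_order = np.array_equal(np.argsort(key, kind="stable"),
                            np.argsort(d, kind="stable"))

# Negating the integer key reverses the order, which -d cannot do.
d_desc = d[np.argsort(-key, kind="stable")]
```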

def make_key(arr, asc=True):
    _, key = np.unique(arr, return_inverse=True)
    if not asc:
        key = np.negative(key, out=key) # Don't bother making a second array
    return key
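Since np.unique works on numbers as well as strings, the same helper can build every key. A small usage sketch follows, with the helper repeated so the snippet runs on its own. The sample arrays `nums` and `words` are made up for illustration:

```python
import numpy as np

def make_key(arr, asc=True):
    # Rank each element by its position among the sorted unique values.
    _, key = np.unique(arr, return_inverse=True)
    if not asc:
        key = np.negative(key, out=key)
    return key

nums = np.array([2.5, 0.5, 1.5, 0.5])             # hypothetical sample values
words = np.array(["pear", "fig", "kiwi", "apple"])

# Primary key: nums descending; secondary key: words ascending.
# lexsort takes the primary key last.
idx = np.lexsort((make_key(words), make_key(nums, asc=False)))
```

The trade-off is that np.unique sorts every array once up front, even numeric ones that could simply have been negated.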

So your full example could look something like this:

def custom_lexsort(arrs, asc=True):
    """
    Lexsort a collection of arrays in ascending or descending order.

    Parameters
    ----------
    arrs : sequence[array-like]
        Sequence of arrays to sort.
    asc : array-like[bool]
        True for each element of `arrs` to sort in ascending order,
        False for descending. Must broadcast to `(len(arrs),)`.
    """
    def make_key(a, asc):
        if np.issubdtype(a.dtype, np.number):
            key = a
        else:
            _, key = np.unique(a, return_inverse=True)
        if asc:
            return key
        elif np.issubdtype(key.dtype, np.unsignedinteger):
            # Stay within the unsigned range instead of negating
            return np.iinfo(key.dtype).max - key
        else:
            return -key

    n = len(arrs)
    asc = np.broadcast_to(asc, n)
    keys = [make_key(*x) for x in zip(arrs, asc)]
    return np.lexsort(keys[::-1])

data = {"col1": np.array((1, 2, 3, 4, 5, 4, 3, 2, 1)),
        "col2": np.array(list("abcdeabcd")),
        "col3": np.array((10, 11, 9, 8, 7, 2, 12, 100, 1))}

idx = custom_lexsort(list(data.values()), [True, False, True])
result = {k: v[idx] for k, v in data.items()}

I've taken the liberty of reversing the order of the keys inside the function, since lexsort treats the last key as the primary one. And sure enough:

>>> result
{'col1': array([1, 1, 2, 2, 3, 3, 4, 4, 5]),
 'col2': array(['d', 'a', 'c', 'b', 'c', 'b', 'd', 'a', 'e'], dtype='<U1'),
 'col3': array([  1,  10, 100,  11,   9,  12,   8,   2,   7])}

I've included the third column for sorting because it does no harm. Here is an example of the arrays sorted in place, with only the first two used for sorting:

idx = custom_lexsort([data['col1'], data['col2']], [True, False])
for v in data.values():
    v[:] = v[idx]
