繁体   English   中英

python中的高效列表映射

[英]efficient list mapping in python

我有以下输入:

input = [(dog, dog, cat, mouse), (cat, ruby, python, mouse)]

并尝试输出以下内容:

outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]]

outputmapping = {0:dog, 1:cat, 2:mouse, 3:ruby, 4:python, 5:mouse}

关于如何处理可读性的任何提示(var输入可能变得非常大)。

你可能想要这样的东西:

import collections
import itertools

def build_catalog(L):
    counter = itertools.count().next
    names = collections.defaultdict(counter)
    result = []
    for t in L:
        new_t = [ names[item] for item in t ]
        result.append(new_t)
    catalog = dict((name, idx) for idx, name in names.iteritems())
    return result, catalog

使用它:

>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> outputlist, outputmapping = build_catalog(input)
>>> outputlist
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

此类将自动将对象映射到递增的整数值:

class AutoMapping(object):
    def __init__(self):
        self.map = {}
        self.objects = []

    def __getitem__(self, val):
        if val not in self.map:
            self.map[val] = len(self.objects)
            self.objects.append(val)
        return self.map[val]

示例用法,供您输入:

>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> map = AutoMapping()
>>> [[map[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> map.objects
['dog', 'cat', 'mouse', 'ruby', 'python']
>>> dict(enumerate(map.objects))
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

这是一种可能的解决方案,尽管它并不是最好的解决方案。 如果通过预先分配它们,您知道列表中的每个条目将具有多少元素,则可以稍微提高效率。

labels=[];
label2index={};
outputlist=[];
for group in input:
    current=[];
    for label in group:
       if label not in label2index:
           label2index[label]=len(labels);
           labels.append(label);
       current.append(label2index[label]);
    outputlist.append(current);

outputmapping={};
for idx, val in enumerate(labels):
    outputmapping[idx]=val;

我在我的项目中经常遇到同样的问题,所以我前一段时间就完成了这个问题:

class UniqueIdGenerator(object):
    """A dictionary-like class that can be used to assign unique integer IDs to
    names.

    Usage:

    >>> gen = UniqueIdGenerator()
    >>> gen["A"]
    0
    >>> gen["B"]
    1
    >>> gen["C"]
    2
    >>> gen["A"]      # Retrieving already existing ID
    0
    >>> len(gen)      # Number of already used IDs
    3
    """

    def __init__(self, id_generator=None):
        """Creates a new unique ID generator. `id_generator` specifies how do we
        assign new IDs to elements that do not have an ID yet. If it is `None`,
        elements will be assigned integer identifiers starting from 0. If it is
        an integer, elements will be assigned identifiers starting from the given
        integer. If it is an iterator or generator, its `next` method will be
        called every time a new ID is needed."""
        if id_generator is None:
            id_generator = 0
        if isinstance(id_generator, int):
            import itertools
            self._generator = itertools.count(id_generator)
        else:
            self._generator = id_generator
        self._ids = {}

    def __getitem__(self, item):
        """Retrieves the ID corresponding to `item`. Generates a new ID for `item`
        if it is the first time we request an ID for it."""
        try:
            return self._ids[item]
        except KeyError:
            self._ids[item] = self._generator.next()
            return self._ids[item]

    def __len__(self):
        """Retrieves the number of added elements in this UniqueIDGenerator"""
        return len(self._ids)

    def reverse_dict(self):
        """Returns the reversed mapping, i.e., the one that maps generated IDs to their
        corresponding items"""
        return dict((v, k) for k, v in self._ids.iteritems())

    def values(self):
        """Returns the list of items added so far. Items are ordered according to
        the standard sorting order of their keys, so the values will be exactly
        in the same order they were added if the ID generator generates IDs in
        ascending order. This hold, for instance, to numeric ID generators that
        assign integers starting from a given number."""
        return sorted(self._ids.keys(), key = self._ids.__getitem__)

用法示例:

>>> input = [(dog, dog, cat, mouse), (cat, ruby, python, mouse)]
>>> gen = UniqueIdGenerator()
>>> outputlist = [[gen[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> print outputlist
>>> outputmapping = gen.reverse_dict()
>>> print outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM