
Python: How to recursively remove None values from a nested data structure (lists and dictionaries)?

Here is some nested data that includes lists, tuples, and dictionaries:

data1 = ( 501, (None, 999), None, (None), 504 )
data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]

Goal: Remove any keys or values (from "data") that are None. If a list or dictionary contains a value that is itself a list, tuple, or dictionary, then recurse to remove the nested Nones.

Desired output:

[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})]))]

Or, more readably, here is the formatted output:

StripNones(data)= list:
. [22, (), ()]
. tuple:
. . (202,)
. . {32: 302, 33: (501, (999,), 504)}
. . OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})])

I will propose a possible answer, as I have not found an existing solution to this. I appreciate any alternatives, or pointers to pre-existing solutions.

EDIT: I forgot to mention that this has to work in Python 2.7. I can't use Python 3 at this time.

It is still worth posting Python 3 solutions for others, so please indicate which Python version you are answering for.

If you can assume that the __init__ methods of the various subclasses have the same signature as the typical base class:

def remove_none(obj):
  if isinstance(obj, (list, tuple, set)):
    return type(obj)(remove_none(x) for x in obj if x is not None)
  elif isinstance(obj, dict):
    return type(obj)((remove_none(k), remove_none(v))
      for k, v in obj.items() if k is not None and v is not None)
  else:
    return obj

from collections import OrderedDict
data1 = ( 501, (None, 999), None, (None), 504 )
data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]
print remove_none(data)

Note that this won't work with a defaultdict, for example, since defaultdict takes an additional argument to __init__. Making it work with defaultdict would require another special-case elif (placed before the one for regular dicts).
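One way that extra branch might look is sketched below. This is an illustration rather than part of the original answer: it is written for Python 3, and it assumes that passing default_factory through as the first constructor argument is all a defaultdict subclass needs.

```python
from collections import defaultdict

def remove_none(obj):
    """Return a copy of obj with None keys/values removed at every level."""
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(remove_none(x) for x in obj if x is not None)
    elif isinstance(obj, defaultdict):
        # Special case, checked before plain dict: defaultdict's first
        # positional argument is the default factory, so pass it through.
        return type(obj)(
            obj.default_factory,
            ((remove_none(k), remove_none(v))
             for k, v in obj.items() if k is not None and v is not None))
    elif isinstance(obj, dict):
        return type(obj)(
            (remove_none(k), remove_none(v))
            for k, v in obj.items() if k is not None and v is not None)
    else:
        return obj

dd = defaultdict(list, {1: [None, 2], None: [3], 4: None})
cleaned = remove_none(dd)
print(cleaned)
```

The defaultdict branch has to come first because a defaultdict is also a dict, so the generic branch would otherwise catch it and call the constructor with the wrong arguments.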


Also note that I've actually constructed new objects; I haven't modified the old ones. It would be possible to modify the old objects in place if you didn't need to support immutable objects like tuple.
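For illustration, an in-place variant could look roughly like the following. This is a hypothetical sketch (the name remove_none_inplace is not from the original answer), written for Python 3. Lists and dicts are mutated in place, while tuples still have to be rebuilt because they are immutable.

```python
def remove_none_inplace(obj):
    """Strip None from lists and dicts in place; rebuild tuples."""
    if isinstance(obj, list):
        # Slice assignment replaces the contents but keeps the same list object.
        obj[:] = [remove_none_inplace(x) for x in obj if x is not None]
        return obj
    elif isinstance(obj, dict):
        # Collect the keys first: a dict must not change size while iterating.
        for k in [k for k, v in obj.items() if k is None or v is None]:
            del obj[k]
        for k in list(obj):
            obj[k] = remove_none_inplace(obj[k])
        return obj
    elif isinstance(obj, tuple):
        # Tuples are immutable, so a new tuple is unavoidable here.
        return tuple(remove_none_inplace(x) for x in obj if x is not None)
    else:
        return obj

inner = {1: None, 2: [None, 3]}
outer = [None, inner, (None, 4)]
result = remove_none_inplace(outer)
print(result is outer, outer, inner)
```

Because nested lists and dicts are changed too, any other variable that refers to them (inner in this example) sees the stripped version as well.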

If you want a full-featured yet succinct approach to handling real-world nested data structures like these, one that even handles cycles, I recommend looking at the remap utility from the boltons utility package.

After pip install boltons or copying iterutils.py into your project, just do:

from collections import OrderedDict
from boltons.iterutils import remap

data1 = ( 501, (None, 999), None, (None), 504 )
data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]

drop_none = lambda path, key, value: key is not None and value is not None

cleaned = remap(data, visit=drop_none)

print(cleaned)

# got:
[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})]))]

This page has many more examples, including ones working with much larger objects (from GitHub's API).

It's pure Python, so it works everywhere, and it is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for cases exactly like this, so if you find a case it doesn't handle, you can bug me to fix it right here.

def stripNone(data):
    if isinstance(data, dict):
        return {k:stripNone(v) for k, v in data.items() if k is not None and v is not None}
    elif isinstance(data, list):
        return [stripNone(item) for item in data if item is not None]
    elif isinstance(data, tuple):
        return tuple(stripNone(item) for item in data if item is not None)
    elif isinstance(data, set):
        return {stripNone(item) for item in data if item is not None}
    else:
        return data

Sample runs:

print stripNone(data1)
print stripNone(data2)
print stripNone(data3)
print stripNone(data)

(501, (999,), 504)
{'four': 'sixty', 1: 601}
{12: 402, 14: {'four': 'sixty', 1: 601}}
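[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, {12: 402, 14: {'four': 'sixty', 1: 601}})]

# Python 2 only: in Python 3, str has __iter__, so strings would recurse forever.
def purify(o):
    if hasattr(o, 'items'):
        # Mapping: build an empty instance of the same type and fill it.
        oo = type(o)()
        for k in o:
            if k is not None and o[k] is not None:
                oo[k] = purify(o[k])
    elif hasattr(o, '__iter__'):
        # Other iterable: collect into a list, convert back at the end.
        oo = []
        for it in o:
            if it is not None:
                oo.append(purify(it))
    else:
        return o
    return type(o)(oo)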

print purify(data)

Gives:

[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})]))]
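The duck-typing approach above relies on Python 2 strings having no __iter__ attribute. A possible Python 3 adaptation, offered only as a sketch and not as the answer author's code, adds an explicit guard for str and bytes before the iterable check:

```python
from collections import OrderedDict

def purify(o):
    """Python 3 adaptation: strings and bytes are returned untouched."""
    if isinstance(o, (str, bytes)):
        return o
    if hasattr(o, 'items'):
        # Mapping: rebuild with the same type, skipping None keys/values.
        oo = type(o)()
        for k in o:
            if k is not None and o[k] is not None:
                oo[k] = purify(o[k])
        return oo
    if hasattr(o, '__iter__'):
        # Other iterable: filter into a list, then restore the original type.
        return type(o)([purify(it) for it in o if it is not None])
    return o

data1 = (501, (None, 999), None, (None), 504)
data2 = {1: 601, 2: None, None: 603, 'four': 'sixty'}
data3 = OrderedDict([(None, 401), (12, 402), (13, None), (14, data2)])
data = [[None, 22, tuple([None]), (None, None), None],
        ((None, 202), {None: 301, 32: 302, 33: data1}, data3)]
purified = purify(data)
print(purified)
```

As with the original, this assumes every container type can be constructed from a plain list (or, for mappings, with no arguments).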

This is my original attempt, from before I posted the question. I'm keeping it here, as it may help explain the goal.

It also has some code that would be useful if you want to MODIFY an existing LARGE collection, rather than duplicating the data into a NEW collection. (The other answers create new collections.)

# ---------- StripNones.py Python 2.7 ----------

import collections, copy

# Recursively remove None, from list/tuple elements, and dict key/values.
# NOTE: Changes type of iterable to list, except for strings and tuples.
# NOTE: We don't RECURSE KEYS.
# When "beImmutable=False", may modify "data".
# Result may have different collection types; similar to "filter()".
def StripNones(data, beImmutable=True):
    t = type(data)
    if issubclass(t, dict):
        return _StripNones_FromDict(data, beImmutable)

    elif issubclass(t, collections.Iterable):
        if issubclass(t, basestring):
            # Don't need to search a string for None.
            return data

        # NOTE: Changes type of iterable to list.
        data = [StripNones(x, beImmutable) for x in data if x is not None]
        if issubclass(t, tuple):
            return tuple(data)

    return data

# Modifies dict, removing items whose keys are in keysToRemove.
def RemoveKeys(dict, keysToRemove):
    for key in keysToRemove:
        dict.pop(key, None) 

# Recursively remove None, from dict key/values.
# NOTE: We DON'T RECURSE KEYS.
# When "beImmutable=False", may modify "data".
def _StripNones_FromDict(data, beImmutable):
    keysToRemove = []
    newItems = []
    for item in data.iteritems():
        key = item[0]
        if None in item:
            # Either key or value is None.
            keysToRemove.append( key )
        else:
            # The value might change when stripped.
            oldValue = item[1]
            newValue = StripNones(oldValue, beImmutable)
            if newValue is not oldValue:
                newItems.append( (key, newValue) )

    somethingChanged = (len(keysToRemove) > 0) or (len(newItems) > 0)
    if beImmutable and somethingChanged:
        # Avoid modifying the original.
        data = copy.copy(data)

    if len(keysToRemove) > 0:
        # if not beImmutable, MODIFYING ORIGINAL "data".
        RemoveKeys(data, keysToRemove)

    if len(newItems) > 0:
        # if not beImmutable, MODIFYING ORIGINAL "data".
        data.update( newItems )

    return data



# ---------- TESTING ----------
# When run this file as a script (instead of importing it):
if (__name__ == "__main__"):
    from collections import OrderedDict

    maxWidth = 100
    indentStr = '. '

    def NewLineAndIndent(indent):
        return '\n' + indentStr*indent
    #print NewLineAndIndent(3)

    # Returns list of strings.
    def HeaderAndItems(value, indent=0):
        if isinstance(value, basestring):
            L = repr(value)
        else:
            if isinstance(value, dict):
                L = [ repr(key) + ': ' + Repr(value[key], indent+1) for key in value ]
            else:
                L = [ Repr(x, indent+1) for x in value ]
            header = type(value).__name__ + ':'
            L.insert(0, header)
        #print L
        return L

    def Repr(value, indent=0):
        result = repr(value)
        if (len(result) > maxWidth) and \
          isinstance(value, collections.Iterable) and \
          not isinstance(value, basestring):
            L = HeaderAndItems(value, indent)
            return NewLineAndIndent(indent + 1).join(L)

        return result

    #print Repr( [11, [221, 222], {'331':331, '332': {'3331':3331} }, 44] )

    def printV(name, value):
        print( str(name) + "= " + Repr(value) )

    print '\n\n\n'
    data1 = ( 501, (None, 999), None, (None), 504 )
    data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
    data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
    data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]
    printV( 'ORIGINAL data', data )
    printV( 'StripNones(data)', StripNones(data) )
    print '----- beImmutable = True -----'
    #printV( 'data', data )
    printV( 'data2', data2 )
    #printV( 'data3', data3 )
    print '----- beImmutable = False -----'
    StripNones(data, False)
    #printV( 'data', data )
    printV( 'data2', data2 )
    #printV( 'data3', data3 )
    print

Output:

ORIGINAL data= list:
. [None, 22, (None,), (None, None), None]
. tuple:
. . (None, 202)
. . {32: 302, 33: (501, (None, 999), None, None, 504), None: 301}
. . OrderedDict:
. . . None: 401
. . . 12: 402
. . . 13: None
. . . 14: {'four': 'sixty', 1: 601, 2: None, None: 603}
StripNones(data)= list:
. [22, (), ()]
. tuple:
. . (202,)
. . {32: 302, 33: (501, (999,), 504)}
. . OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})])
----- beImmutable = True -----
data2= {'four': 'sixty', 1: 601, 2: None, None: 603}
----- beImmutable = False -----
data2= {'four': 'sixty', 1: 601}

Key points:

  • if issubclass(t, basestring): avoids searching inside strings, as that makes no sense, AFAIK.

  • if issubclass(t, tuple): converts the result back to a tuple.

  • For dictionaries, copy.copy(data) is used to return an object of the same type as the original dictionary.

  • LIMITATION: Does not attempt to preserve the collection/iterator type for types other than list, tuple, and dict (and its subclasses).

  • Default usage copies data structures if a change is needed. Passing False for beImmutable can give higher performance when there is a LOT of data, but it will alter the original data, including nested pieces of the data, which might be referenced by variables elsewhere in your code.
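To make the first four points concrete for Python 3 readers, here is a reduced sketch of the copying path only (no beImmutable switch). It is an adaptation under stated assumptions, not the code above: strip_nones is a hypothetical name, str/bytes stand in for basestring, and collections.abc.Iterable replaces collections.Iterable.

```python
import copy
from collections import defaultdict
from collections.abc import Iterable

def strip_nones(data):
    """Copying variant: the input is never modified."""
    if isinstance(data, dict):
        # copy.copy keeps the dict subclass (and e.g. a default_factory).
        result = copy.copy(data)
        for k, v in data.items():
            if k is None or v is None:
                del result[k]
            else:
                result[k] = strip_nones(v)  # keys are not recursed
        return result
    if isinstance(data, (str, bytes)):
        # No point searching inside a string for None.
        return data
    if isinstance(data, Iterable):
        items = [strip_nones(x) for x in data if x is not None]
        # Only tuples are converted back; other iterables become lists.
        return tuple(items) if isinstance(data, tuple) else items
    return data

original = defaultdict(int, {'a': None, 'b': (None, 1), None: 2})
stripped = strip_nones(original)
print(stripped)
print(type(strip_nones({1, None, 2})).__name__)
```

The last line illustrates the stated limitation: a set comes back as a list, because only list, tuple, and dict types are preserved.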
