
Multiple conditions in list comprehension

I have a list of nested dictionaries that looks as follows:

messages_all = [{'type': 'message',
      'subtype': 'bot_message',
      'text': "This content can't be displayed.",
      'ts': '1573358255.000100',
      'username': 'Userform',
      'icons': {'image_30': 'www.example.com'},
      'bot_id': 'JOD4K22SJW',
      'blocks': [{'type': 'section',
        'block_id': 'yCKUB',
        'text': {'type': 'mrkdwn',
         'text': 'Your *survey* has a new response.',
         'verbatim': False}},
       {'type': 'section',
        'block_id': '37Mt4',
        'text': {'type': 'mrkdwn',
         'text': '*Thanks for your response. Where did you first hear about us?*\nFriend',
         'verbatim': False}},
       {'type': 'section',
        'block_id': 'hqps2',
        'text': {'type': 'mrkdwn',
         'text': '*How would you rate your experience?*\n9',
         'verbatim': False}},
       {'type': 'section',
        'block_id': 'rvi',
        'text': {'type': 'mrkdwn', 'text': '*city*\nNew York', 'verbatim': False}},
       {'type': 'section',
        'block_id': 'q=L+',
        'text': {'type': 'mrkdwn',
         'text': '*order_id*\n123456',
         'verbatim': False}}]},

{'type': 'message',
  'subtype': 'channel_join',
  'ts': '1650897290.290259',
  'user': 'T01CTZE4MB6',
  'text': '<@U03CTDZ4MA6> has joined the channel',
  'inviter': 'A033AHJCK'},

{'type': 'message',
  'subtype': 'channel_leave',
  'ts': '1650899175.290259',
  'user': 'T01CTZE4MB6',
  'text': '<@U03CTDZ4MA6> has left the channel',
  'inviter': 'A033AHJCK'},

{'client_msg_id': '123456jk-a19c-97fe-35c9-3c9f643cae19',
  'type': 'message',
  'text': '<@ABC973RJD>',
  'user': 'UM1922AJG',
  'ts': '1573323860.000300',
  'team': 'B09AJR39A',
  'reactions': [{'name': '+1', 'users': ['UM1927AJG'], 'count': 1}]},

{'client_msg_id': '1234CAC1-FEC8-4F25-8CE5-C135B7FJB2E',
  'type': 'message',
  'text': '<@UM1922AJG> ',
  'user': 'UM1922AJG',
  'ts': '1573791416.000200',
  'team': 'AJCR23H',
  'thread_ts': '1573791416.000200',
  'reply_count': 3,
  'reply_users_count': 2,
  'latest_reply': '1573829538.002000',
  'reply_users': ['UM3HRC74J', 'UM1922AJG'],
  'is_locked': False,
  'subscribed': False}

]

I'd like to filter out the dictionaries that match any of the following:

client_msg_id
channel_join
channel_leave
reply_users_count

My code to do so is:

filtered_messages = [elem for elem in messages_all if not elem.get('client_msg_id')
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_join') 
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_leave')
                     or (elem.get('type') == 'message' and elem.get('reply_users_count') == 2)
                ]

From testing, it seems that only the dictionaries with a client_msg_id are being filtered out. The others are not.

Would someone please assist me with the syntax of this list comprehension?

IIUC, you're simply missing parentheses. Because not binds more tightly than or, it currently negates only the first condition instead of the union of all the conditions:

filtered_messages = [elem for elem in messages_all if not (elem.get('client_msg_id')
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_join') 
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_leave')
                     or (elem.get('type') == 'message' and elem.get('reply_users_count') == 2))
                ]

This keeps only the first element of your example input.

output:

[{'type': 'message', 'subtype': 'bot_message', 'text': "This content can't be displayed.", 'ts': '1573358255.000100', 'username': 'Userform', 'icons': {'image_30': 'www.example.com'}, 'bot_id': 'JOD4K22SJW', 'blocks': [{'type': 'section', 'block_id': 'yCKUB', 'text': {'type': 'mrkdwn', 'text': 'Your *survey* has a new response.', 'verbatim': False}}, {'type': 'section', 'block_id': '37Mt4', 'text': {'type': 'mrkdwn', 'text': '*Thanks for your response. Where did you first hear about us?*\nFriend', 'verbatim': False}}, {'type': 'section', 'block_id': 'hqps2', 'text': {'type': 'mrkdwn', 'text': '*How would you rate your experience?*\n9', 'verbatim': False}}, {'type': 'section', 'block_id': 'rvi', 'text': {'type': 'mrkdwn', 'text': '*city*\nNew York', 'verbatim': False}}, {'type': 'section', 'block_id': 'q=L+', 'text': {'type': 'mrkdwn', 'text': '*order_id*\n123456', 'verbatim': False}}]}
]
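
To make the precedence point concrete, here is a minimal sketch. The two dictionaries and the is_join helper are hypothetical, trimmed-down stand-ins for the question's data, not part of the original code.

```python
# Hypothetical, trimmed-down messages (not the full data from the question).
join = {"type": "message", "subtype": "channel_join"}
bot = {"type": "message", "subtype": "bot_message"}


def is_join(elem):
    # Hypothetical helper standing in for one of the "or" branches.
    return elem.get("type") == "message" and elem.get("subtype") == "channel_join"


# Without parentheses, `not` applies only to the first operand:
#   not A or B   is parsed as   (not A) or B
unparenthesized = [e for e in (join, bot)
                   if not e.get("client_msg_id") or is_join(e)]

# With parentheses, the whole union is negated:
#   not (A or B)   is equivalent to   (not A) and (not B)
parenthesized = [e for e in (join, bot)
                 if not (e.get("client_msg_id") or is_join(e))]

print(len(unparenthesized))  # 2: nothing was dropped
print(len(parenthesized))    # 1: only the bot message remains
```

Neither sample dictionary has a client_msg_id, so in the first version the leading not ... is already True and the remaining conditions are never consulted.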

Like @mozway said, it is simply some parentheses missing.

For such a large if condition, I would personally go further and create a function:

def my_filter(elem):
    # True for messages to keep, False for messages to drop.
    return not (elem.get('client_msg_id')
                or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_join')
                or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_leave')
                or (elem.get('type') == 'message' and elem.get('reply_users_count') == 2))

filtered_messages = [elem for elem in messages_all if my_filter(elem)]

Edit: removed an extra boolean variable.
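
Since my_filter returns True for the messages to keep, it should also work directly with the built-in filter(). A rough sketch follows; the function is restated in a condensed form (the two subtype checks merged into one membership test), and the sample list is hypothetical.

```python
def my_filter(elem):
    # Condensed restatement of the predicate: True means "keep this message".
    return not (
        elem.get("client_msg_id")
        or (elem.get("type") == "message"
            and elem.get("subtype") in ("channel_join", "channel_leave"))
        or (elem.get("type") == "message"
            and elem.get("reply_users_count") == 2)
    )


# Hypothetical sample, one dictionary per rule plus one that should survive.
sample = [
    {"type": "message", "subtype": "bot_message"},
    {"type": "message", "subtype": "channel_join"},
    {"type": "message", "client_msg_id": "abc"},
]

# filter() keeps the elements for which the predicate is truthy.
kept = list(filter(my_filter, sample))
print(kept)  # [{'type': 'message', 'subtype': 'bot_message'}]
```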

Given the length of the resulting listcomp, I would write something like this instead:

def filterdict(d):
    subtypes = {"channel_join", "channel_leave"}
    return any(
        test(d)
        for test in (
            lambda d: d["type"] == "message" and d.get("subtype") in subtypes,
            lambda d: d["type"] == "message" and d.get("reply_users_count") == 2,
            lambda d: d.get("client_msg_id"),
        )
    )


msgs = [x for x in messages_all if not filterdict(x)]

In this form:

  • we have a filter fn which returns False for an interesting msg, so we can use it directly with itertools.filterfalse
  • the conditions are clearly set out
  • the use of lambdas and any ensures encapsulation of the tests, so a mislaid parenthesis is not going to cause the kind of problem which motivated the question
  • we've wrapped two functionally identical tests in a test for membership, which is clearer and easier to read.
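
As a sketch of the itertools.filterfalse usage mentioned in the first point: filterdict is repeated here so the snippet runs on its own (using d.get("type") and the reply_users_count key from the question's data), and the sample list is a hypothetical, trimmed-down stand-in for messages_all.

```python
from itertools import filterfalse


def filterdict(d):
    # Truthy when the message matches any rule, i.e. should be dropped.
    subtypes = {"channel_join", "channel_leave"}
    return any(
        test(d)
        for test in (
            lambda d: d.get("type") == "message" and d.get("subtype") in subtypes,
            lambda d: d.get("type") == "message" and d.get("reply_users_count") == 2,
            lambda d: d.get("client_msg_id"),
        )
    )


# Hypothetical sample: one message to keep, then one per rule.
sample = [
    {"type": "message", "subtype": "bot_message"},
    {"type": "message", "subtype": "channel_leave"},
    {"type": "message", "client_msg_id": "abc"},
    {"type": "message", "reply_users_count": 2},
]

# filterfalse yields the items for which the predicate is falsy.
msgs = list(filterfalse(filterdict, sample))
print(msgs)  # [{'type': 'message', 'subtype': 'bot_message'}]
```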

Whether one likes this kind of thing is going to be a matter of taste in the end.

I found that calling the get method is slower than checking whether a key is in the dictionary, so if you have a lot of data it would be faster to test for the key's existence instead:

filtered_messages = [elem for elem in messages_all
                     if "client_msg_id" not in elem
                     and not ("type" in elem and elem["type"] == "message"
                              and (("subtype" in elem
                                    and elem["subtype"] in ("channel_join", "channel_leave"))
                                   or ("reply_users_count" in elem
                                       and elem["reply_users_count"] == 2)))]
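
If you want to check the speed difference on your own machine, a rough timeit sketch could look like this. The sample dictionary and the iteration count are arbitrary assumptions, and the numbers will vary with the Python version and hardware, so no timings are quoted here.

```python
import timeit

# Hypothetical single message used only for the measurement.
elem = {"type": "message", "subtype": "channel_join"}

# Each callable is run `number` times; the results are total seconds.
t_get = timeit.timeit(lambda: elem.get("client_msg_id"), number=200_000)
t_in = timeit.timeit(lambda: "client_msg_id" in elem, number=200_000)

print(f".get(): {t_get:.4f}s")
print(f"in:     {t_in:.4f}s")

# The two checks are not interchangeable:
# `in` tests for presence of the key, `.get()` tests truthiness of the value.
empty_id = {"client_msg_id": ""}
print("client_msg_id" in empty_id)          # True
print(bool(empty_id.get("client_msg_id")))  # False
```

Keep the semantic difference in mind: a dictionary whose client_msg_id is an empty string would be dropped by the in version but kept by the get version.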
