Multiple conditions in a list comprehension

Question

I have a list of nested dictionaries that looks as follows:

messages_all = [{'type': 'message',
      'subtype': 'bot_message',
      'text': "This content can't be displayed.",
      'ts': '1573358255.000100',
      'username': 'Userform',
      'icons': {'image_30': 'www.example.com'},
      'bot_id': 'JOD4K22SJW',
      'blocks': [{'type': 'section',
        'block_id': 'yCKUB',
        'text': {'type': 'mrkdwn',
         'text': 'Your *survey* has a new response.',
         'verbatim': False}},
       {'type': 'section',
        'block_id': '37Mt4',
        'text': {'type': 'mrkdwn',
         'text': '*Thanks for your response. Where did you first hear about us?*\nFriend',
         'verbatim': False}},
       {'type': 'section',
        'block_id': 'hqps2',
        'text': {'type': 'mrkdwn',
         'text': '*How would you rate your experience?*\n9',
         'verbatim': False}},
       {'type': 'section',
        'block_id': 'rvi',
        'text': {'type': 'mrkdwn', 'text': '*city*\nNew York', 'verbatim': False}},
       {'type': 'section',
        'block_id': 'q=L+',
        'text': {'type': 'mrkdwn',
         'text': '*order_id*\n123456',
         'verbatim': False}}]},

{'type': 'message',
  'subtype': 'channel_join',
  'ts': '1650897290.290259',
  'user': 'T01CTZE4MB6',
  'text': '<@U03CTDZ4MA6> has joined the channel',
  'inviter': 'A033AHJCK'},

{'type': 'message',
  'subtype': 'channel_leave',
  'ts': '1650899175.290259',
  'user': 'T01CTZE4MB6',
  'text': '<@U03CTDZ4MA6> has left the channel',
  'inviter': 'A033AHJCK'},

{'client_msg_id': '123456jk-a19c-97fe-35c9-3c9f643cae19',
  'type': 'message',
  'text': '<@ABC973RJD>',
  'user': 'UM1922AJG',
  'ts': '1573323860.000300',
  'team': 'B09AJR39A',
  'reactions': [{'name': '+1', 'users': ['UM1927AJG'], 'count': 1}]},

{'client_msg_id': '1234CAC1-FEC8-4F25-8CE5-C135B7FJB2E',
  'type': 'message',
  'text': '<@UM1922AJG> ',
  'user': 'UM1922AJG',
  'ts': '1573791416.000200',
  'team': 'AJCR23H',
  'thread_ts': '1573791416.000200',
  'reply_count': 3,
  'reply_users_count': 2,
  'latest_reply': '1573829538.002000',
  'reply_users': ['UM3HRC74J', 'UM1922AJG'],
  'is_locked': False,
  'subscribed': False}

]

I'd like to filter out the dictionaries that contain any of the following keys or subtype values:

client_msg_id
channel_join
channel_leave
reply_users_count

My code to do so is:

filtered_messages = [elem for elem in messages_all if not elem.get('client_msg_id')
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_join') 
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_leave')
                     or (elem.get('type') == 'message' and elem.get('reply_users_count') == 2)
                ]

From testing, it seems as though only the dictionaries with client_msg_id are being filtered out. The others are not.

Would someone please assist me with the syntax of this list comprehension?

Answer 1

IIUC, you're simply missing parentheses to negate the union of all the conditions:

filtered_messages = [elem for elem in messages_all if not (elem.get('client_msg_id')
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_join') 
                     or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_leave')
                     or (elem.get('type') == 'message' and elem.get('reply_users_count') == 2))
                ]

This would keep only the first element of your example input.

Output:

[{'type': 'message', 'subtype': 'bot_message', 'text': "This content can't be displayed.", 'ts': '1573358255.000100', 'username': 'Userform', 'icons': {'image_30': 'www.example.com'}, 'bot_id': 'JOD4K22SJW', 'blocks': [{'type': 'section', 'block_id': 'yCKUB', 'text': {'type': 'mrkdwn', 'text': 'Your *survey* has a new response.', 'verbatim': False}}, {'type': 'section', 'block_id': '37Mt4', 'text': {'type': 'mrkdwn', 'text': '*Thanks for your response. Where did you first hear about us?*\nFriend', 'verbatim': False}}, {'type': 'section', 'block_id': 'hqps2', 'text': {'type': 'mrkdwn', 'text': '*How would you rate your experience?*\n9', 'verbatim': False}}, {'type': 'section', 'block_id': 'rvi', 'text': {'type': 'mrkdwn', 'text': '*city*\nNew York', 'verbatim': False}}, {'type': 'section', 'block_id': 'q=L+', 'text': {'type': 'mrkdwn', 'text': '*order_id*\n123456', 'verbatim': False}}]}
]
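
To see why the parentheses matter, here is a minimal sketch on a made-up three-item list (the data and the names `sample`, `without_parens`, and `with_parens` are illustrative, not from the question). In Python, `not` binds more tightly than `or`, so `not a or b` is parsed as `(not a) or b`:

```python
# Hypothetical miniature of the question's data.
sample = [
    {"type": "message", "subtype": "bot_message"},
    {"type": "message", "subtype": "channel_join"},
    {"type": "message", "client_msg_id": "abc"},
]

# Without parentheses this reads as
# (not client_msg_id) or (subtype == "channel_join"),
# so the channel_join message is kept: it has no client_msg_id.
without_parens = [
    m for m in sample
    if not m.get("client_msg_id") or m.get("subtype") == "channel_join"
]

# With parentheses the whole disjunction is negated.
with_parens = [
    m for m in sample
    if not (m.get("client_msg_id") or m.get("subtype") == "channel_join")
]

print([m.get("subtype") for m in without_parens])  # ['bot_message', 'channel_join']
print([m.get("subtype") for m in with_parens])     # ['bot_message']
```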

Answer 2

As @mozway said, some parentheses are simply missing.

For such a large if condition, I would personally go further and create a function:

def my_filter(elem):
    # Return True for elements to keep, i.e. those matching none of the exclusion rules.
    return not (
        elem.get('client_msg_id')
        or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_join')
        or (elem.get('type') == 'message' and elem.get('subtype') == 'channel_leave')
        or (elem.get('type') == 'message' and elem.get('reply_users_count') == 2)
    )

filtered_messages = [elem for elem in messages_all if my_filter(elem)]

Edit: deleted an extra boolean variable.
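
One possible variation on this idea, offered as a sketch rather than as the answer's own code: give each exclusion rule its own early return, so no nested parentheses are needed at all, and pass the predicate to the built-in `filter`. The names `should_keep`, `EXCLUDED_SUBTYPES`, and `sample` are illustrative assumptions.

```python
# Subtypes to drop; a set gives a cheap membership test.
EXCLUDED_SUBTYPES = {"channel_join", "channel_leave"}


def should_keep(elem):
    """Return True if the message matches none of the exclusion rules."""
    if elem.get("client_msg_id"):
        return False
    if elem.get("type") == "message":
        if elem.get("subtype") in EXCLUDED_SUBTYPES:
            return False
        if elem.get("reply_users_count") == 2:
            return False
    return True


# Hypothetical cut-down messages, one per rule plus one that should survive.
sample = [
    {"type": "message", "subtype": "bot_message"},
    {"type": "message", "subtype": "channel_leave"},
    {"type": "message", "client_msg_id": "abc"},
    {"type": "message", "reply_users_count": 2},
]

kept = list(filter(should_keep, sample))
print(kept)
```

Each rule now sits on its own line, so adding or removing one does not touch the others.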

Answer 3

Given the length of the resulting listcomp, I would write something like this instead:

def filterdict(d):
    subtypes = {"channel_join", "channel_leave"}
    return any(
        test(d)
        for test in (
            lambda d: d["type"] == "message" and d.get("subtype") in subtypes,
            lambda d: d["type"] == "message" and d.get("reply_users_count") == 2,
            lambda d: d.get("client_msg_id"),
        )
    )


msgs = [x for x in messages_all if not filterdict(x)]

In this form:

we have a filter fn which returns False for an interesting msg, so we can use it natively with itertools.filterfalse
the conditions are clearly set out
the use of lambdas and any ensures encapsulation of the tests; a mislaid parenthesis is not going to cause the kind of problem which motivated the question
we've wrapped two functionally identical tests in a test for membership, which is clearer and easier to read.

Whether one likes this kind of thing is going to be a matter of taste in the end.
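
As a sketch of the itertools.filterfalse point, the snippet below uses a condensed stand-in for the predicate (same rules, written without lambdas to keep the example short) on a made-up `sample` list. `filterfalse(pred, iterable)` yields the items for which `pred` returns a falsy value, which is the "interesting" case here.

```python
from itertools import filterfalse


def filterdict(d):
    """Return True for messages that should be dropped."""
    subtypes = {"channel_join", "channel_leave"}
    return bool(
        d.get("client_msg_id")
        or (d.get("type") == "message" and d.get("subtype") in subtypes)
        or (d.get("type") == "message" and d.get("reply_users_count") == 2)
    )


# Hypothetical cut-down messages.
sample = [
    {"type": "message", "subtype": "bot_message"},
    {"type": "message", "subtype": "channel_join"},
    {"type": "message", "client_msg_id": "abc"},
]

# Keep only the items the predicate rejects as "to be dropped".
msgs = list(filterfalse(filterdict, sample))
print(msgs)
```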

Answer 4

I found that the get method is much slower than checking whether a key is in the dictionary, so if you have a lot of data it would be faster to check for the key's existence in the dictionary:

filtered_messages = [elem for elem in messages_all
                     if "client_msg_id" not in elem
                     and not ("type" in elem and elem["type"] == "message"
                              and (("subtype" in elem
                                    and elem["subtype"] in ("channel_join", "channel_leave"))
                                   or ("reply_users_count" in elem
                                       and elem["reply_users_count"] == 2)))]
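
The speed claim is easy to check on your own machine. The following is a rough micro-benchmark sketch using the standard-library `timeit` module; the dictionary `elem` and the helper names are made up for illustration, and the timings will vary with the Python version and hardware, so treat the result as indicative only.

```python
import timeit

# Hypothetical message without the key being looked up.
elem = {"type": "message", "subtype": "channel_join"}


def with_get():
    # Method call; returns None when the key is absent.
    return elem.get("client_msg_id") is None


def with_in():
    # Membership test on the dict's keys.
    return "client_msg_id" not in elem


get_time = timeit.timeit(with_get, number=200_000)
in_time = timeit.timeit(with_in, number=200_000)
print(f"dict.get: {get_time:.4f}s   in: {in_time:.4f}s")
```

Note that the two checks are not fully interchangeable: `not elem.get(key)` also treats a key that is present with a falsy value (such as an empty string or 0) as missing, whereas `key not in elem` does not.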
