Itertools groupby to organize a list of dictionaries by two values

Question

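I am trying to organize values by state of birth and by whether or not they have 0 money. The itertools groupby function looks like the easiest way to do this, but I am struggling to implement it. I am open to other options as well.

If I have a list of dictionaries that looks like this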

users = [
            {"name": "John", "state_of_birth": "CA", "money": 0},
            {"name": "Andrew", "state_of_birth": "CA", "money": 300},
            {"name": "Scott", "state_of_birth": "OR", "money": 20},
            {"name": "Travis", "state_of_birth": "NY", "money": 0},
            {"name": "Bill", "state_of_birth": "CA", "money": 0},
            {"name": "Mike", "state_of_birth": "NY", "money": 0}
        ]

I am trying to get this output

desired_output = [
            [{"name": "John", "state_of_birth": "CA", "money": 0}, {"name": "Bill", "state_of_birth": "CA", "money": 0}],
            [{"name": "Andrew", "state_of_birth": "CA", "money": 300}],
            [{"name": "Scott", "state_of_birth": "OR", "money": 20}],
            [{"name": "Travis", "state_of_birth": "NY", "money": 0},{"name": "Mike", "state_of_birth": "NY", "money": 0}]
            ]

Answer 1

You can use itertools like this:

import itertools

def func(x):
    return tuple([x['state_of_birth'], x['money'] != 0])

desired_output = list(list(v) for _,v in itertools.groupby(sorted(users, key=func), func))

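itertools.groupby() returns an iterator that yields (key, group) pairs. Each key is computed by the key function we pass to itertools.groupby(). In your case the keys don't matter, which is why they are discarded in for _, v.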

Output:

[{'name': 'John', 'state_of_birth': 'CA', 'money': 0}, {'name': 'Bill', 'state_of_birth': 'CA', 'money': 0}]
[{'name': 'Andrew', 'state_of_birth': 'CA', 'money': 300}]
[{'name': 'Travis', 'state_of_birth': 'NY', 'money': 0}, {'name': 'Mike', 'state_of_birth': 'NY', 'money': 0}]
[{'name': 'Scott', 'state_of_birth': 'OR', 'money': 20}]
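
To make the (key, group) pairs described above more concrete, here is a minimal sketch that keeps the keys instead of discarding them. The names key_func and by_key are made up for illustration, and users is a shortened copy of the list from the question.

```python
import itertools

users = [
    {"name": "John", "state_of_birth": "CA", "money": 0},
    {"name": "Andrew", "state_of_birth": "CA", "money": 300},
    {"name": "Bill", "state_of_birth": "CA", "money": 0},
]

def key_func(user):
    # Group key: (state, whether the user has any money)
    return (user["state_of_birth"], user["money"] != 0)

# groupby only merges adjacent items, so sort by the same key first
by_key = {
    key: [u["name"] for u in group]
    for key, group in itertools.groupby(sorted(users, key=key_func), key_func)
}
print(by_key)
```

Each key is the tuple returned by key_func, so the dictionary maps ("CA", False) to the users with no money and ("CA", True) to the rest.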

Answer 2

Code:

users = [
            {"name": "John", "state_of_birth": "CA", "money": 0},
            {"name": "Andrew", "state_of_birth": "CA", "money": 300},
            {"name": "Scott", "state_of_birth": "OR", "money": 20},
            {"name": "Travis", "state_of_birth": "NY", "money": 0},
            {"name": "Bill", "state_of_birth": "CA", "money": 0},
            {"name": "Mike", "state_of_birth": "NY", "money": 0}
        ]

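# Group users by (state, has money); a plain dict keeps first-seen key order
result = {}
for user in users:
    # Use "money != 0" so different non-zero amounts in one state share a group
    key = (user["state_of_birth"], user["money"] != 0)
    if key in result:
        result[key].append(user)
    else:
        result[key] = [user]
for v in result.values():
    print(v)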

Result:

[{'name': 'John', 'state_of_birth': 'CA', 'money': 0}, {'name': 'Bill', 'state_of_birth': 'CA', 'money': 0}]
[{'name': 'Andrew', 'state_of_birth': 'CA', 'money': 300}]
[{'name': 'Scott', 'state_of_birth': 'OR', 'money': 20}]
[{'name': 'Travis', 'state_of_birth': 'NY', 'money': 0}, {'name': 'Mike', 'state_of_birth': 'NY', 'money': 0}]

Answer 3

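If I understand the question correctly, you have a structure that is a List[Dict] and you want a List[List[Dict]], where each inner list contains the dictionaries that share the same state_of_birth and the same money > 0 boolean.

I would say the simplest solution is actually to use pandas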

import pandas as pd

users = [
            {"name": "John", "state_of_birth": "CA", "money": 0},
            {"name": "Andrew", "state_of_birth": "CA", "money": 300},
            {"name": "Scott", "state_of_birth": "OR", "money": 20},
            {"name": "Travis", "state_of_birth": "NY", "money": 0},
            {"name": "Bill", "state_of_birth": "CA", "money": 0},
            {"name": "Mike", "state_of_birth": "NY", "money": 0}
        ]

df = pd.DataFrame.from_records(users)

# we need a column to indicate if money > 0
df["money_bool"] = df["money"] > 0

# groupby gives you an iterator of Tuple[key, sub-dataframe]
# dfs now holds a list of your grouped dataframes
dfs = [tup[1] for tup in df.groupby(["state_of_birth", "money_bool"])]

# you can now drop the money_bool column if you want
dfs = [df.drop("money_bool", axis=1) for df in dfs]

desired_output = [df.to_dict("records") for df in dfs]

Depending on the context of the problem, you may be better off keeping the data in dataframe/table format.

Answer 4

You need to make sure the input to the groupby function is sorted. You can sort with the same key function that you use for grouping:

from itertools import groupby

users = [
            {"name": "John", "state_of_birth": "CA", "money": 0},
            {"name": "Andrew", "state_of_birth": "CA", "money": 300},
            {"name": "Scott", "state_of_birth": "OR", "money": 20},
            {"name": "Travis", "state_of_birth": "NY", "money": 0},
            {"name": "Bill", "state_of_birth": "CA", "money": 0},
            {"name": "Mike", "state_of_birth": "NY", "money": 0}
        ]

def selector(item): return (item.get('state_of_birth'), item.get('money') != 0)
sorted_users = sorted(users, key=selector)
result = [list(group) for _, group in groupby(sorted_users, selector) ]

Output:

[
    [{'name': 'John', 'state_of_birth': 'CA', 'money': 0}, {'name': 'Bill', 'state_of_birth': 'CA', 'money': 0}],
    [{'name': 'Andrew', 'state_of_birth': 'CA', 'money': 300}], 
    [{'name': 'Travis', 'state_of_birth': 'NY', 'money': 0}, {'name': 'Mike', 'state_of_birth': 'NY', 'money': 0}],
    [{'name': 'Scott', 'state_of_birth': 'OR', 'money': 20}]
]
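
As a rough illustration of why the sort matters, the sketch below runs groupby over just the state_of_birth values, in the same order as in the question, with and without sorting. The variable names are chosen only for this example.

```python
from itertools import groupby

# state_of_birth values in the order they appear in the question
states = ["CA", "CA", "OR", "NY", "CA", "NY"]

# Without sorting, each run of equal adjacent values becomes its own group
unsorted_groups = [(key, len(list(group))) for key, group in groupby(states)]

# After sorting, equal values are adjacent, so each key appears exactly once
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(states))]

print(unsorted_groups)
print(sorted_groups)
```

On the unsorted input, "CA" and "NY" each show up in two separate groups, which is the usual symptom of forgetting to sort.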

Answer 5

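Although its name makes it look like the way to go, itertools.groupby is not the right function to use here, because it requires the data to be pre-sorted. For an algorithm that should be O(n), sorting raises your time complexity to O(n log(n)).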

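To put this in perspective, if you have a million records, then instead of about a million iterations with a loop and a dictionary, you now have roughly 20 million with sort plus groupby (log2 of a million is about 20). That is a pretty big performance penalty.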
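
The figure is a back-of-the-envelope estimate, and it can be reproduced with the sketch below. It only counts idealized steps and assumes a comparison sort costs about n * log2(n); real timings will differ, since Python's sort runs in C and takes advantage of input that is already partly ordered.

```python
import math

n = 1_000_000

# A comparison sort needs on the order of n * log2(n) comparisons
sort_steps = n * math.log2(n)

# A single pass with a dictionary touches each record once
loop_steps = n

# Ratio between the two idealized step counts
print(round(sort_steps / loop_steps, 1))
```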

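If groupby were cleaner to write or needed no import, it might be justifiable, but it is less readable than the simpler approach of a plain loop and a dictionary.

Pandas is fine, but unless you are already using it, there is really no reason to bring it in. It's like bringing a space shuttle to grill zucchini.

You can use defaultdict and a loop: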

from collections import defaultdict
from pprint import pprint

users = [
    {"name": "John", "state_of_birth": "CA", "money": 0},
    {"name": "Andrew", "state_of_birth": "CA", "money": 300},
    {"name": "Scott", "state_of_birth": "OR", "money": 20},
    {"name": "Travis", "state_of_birth": "NY", "money": 0},
    {"name": "Bill", "state_of_birth": "CA", "money": 0},
    {"name": "Mike", "state_of_birth": "NY", "money": 0},
]

grouped = defaultdict(list)
groupby = "state_of_birth", "money"

for user in users:
    grouped[tuple([user[k] for k in groupby])].append(user)

pprint([*grouped.values()])

If you want "money is not zero" rather than just the "money" value itself, you can use a custom grouping function:

grouped = defaultdict(list)

def group_by(x):
    return x["state_of_birth"], x["money"] != 0

for user in users:
    grouped[group_by(user)].append(user)

result = [*grouped.values()]

Or inline the logic:

grouped = defaultdict(list)

for user in users:
    grouped[user["state_of_birth"], user["money"] != 0].append(user)

result = [*grouped.values()]
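
One side effect worth noting: because dictionaries preserve insertion order (guaranteed since Python 3.7), the groups come out in first-seen order, which matches the order of desired_output in the question, whereas the sort-based approaches put NY before OR. A small self-contained check, where names is a helper list added only for this example:

```python
from collections import defaultdict

users = [
    {"name": "John", "state_of_birth": "CA", "money": 0},
    {"name": "Andrew", "state_of_birth": "CA", "money": 300},
    {"name": "Scott", "state_of_birth": "OR", "money": 20},
    {"name": "Travis", "state_of_birth": "NY", "money": 0},
    {"name": "Bill", "state_of_birth": "CA", "money": 0},
    {"name": "Mike", "state_of_birth": "NY", "money": 0},
]

grouped = defaultdict(list)
for user in users:
    # Key: (state, whether the user has any money)
    grouped[user["state_of_birth"], user["money"] != 0].append(user)

result = list(grouped.values())

# Reduce each group to its names to make the order easy to read
names = [[u["name"] for u in group] for group in result]
print(names)
```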
