
How to split a list into equal chunks in Python?

I am trying to add users to my group using pyrogram. I have 200 user IDs in a list:

list_of_users = [user_id1, user_id2, user_id3, user_id4, ...]

I also have a list of 7 clients. What I want to do is distribute the list of user IDs among the 7 clients (approximately equally) and add them. The number of users is sometimes not evenly divisible, so how do I distribute the list and add users accordingly using Python?

By the way, it's okay if 2-3 users are not perfectly distributed; an approximately even distribution is fine, but none of the users should be missed.

I tried this function:

def divide_chunks(l, n):
    for i in range(0, len(l), n): 
        yield l[i:i + n]

But it doesn't distribute evenly: it produces chunks of a fixed size and puts whatever remains in a final chunk, which is not what I want.

In short: I want the chunk sizes to be decided automatically so that the user IDs are distributed evenly.

Most answers on Stack Overflow require me to decide the chunk size myself, which I don't want. All I want to do is distribute x items into y equal parts.

You can use:

np.array_split(list_of_users, NUMBER_OF_CLIENTS)

See the NumPy documentation for numpy.array_split for details.
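A minimal sketch of how this could look in practice, assuming NumPy is installed. The integer user IDs and the client count of 7 are placeholders, not values taken from the question. `np.array_split` returns a list of arrays, so each part is converted back to a plain list here:

```python
import numpy as np

# Placeholder data (assumption): 26 integer user IDs and 7 clients.
list_of_users = list(range(1, 27))
NUMBER_OF_CLIENTS = 7

# array_split accepts inputs whose length is not a multiple of the part count.
chunks = [part.tolist() for part in np.array_split(list_of_users, NUMBER_OF_CLIENTS)]

for index, chunk in enumerate(chunks, start=1):
    print(f"client_{index}: {len(chunk)} users -> {chunk}")
```

With 26 IDs and 7 clients, the first five parts hold 4 IDs each and the last two hold 3, so no ID is dropped.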

DIY: Without external libraries

Here is one approach without external libraries. This implementation assigns an equal number of users to each client if possible. If not, it makes sure the number of users assigned to any two clients differs by at most 1 (my definition of fair). Additionally, if you run it multiple times, the extra users are not assigned to the same clients every time. It does this by randomly choosing the set of clients that each take on one of the remaining users (those that could not be distributed in equal parts). This ensures a fair allocation of users to clients.

It's a bit more code than usual, so here is a high-level explanation:

The relevant function is called assign_users_to_clients(). It does the job you intend to do. The two other functions, verify_all_users_assigned() and print_mapping(), are just utility functions for the sake of this demo. The first makes sure the assignment is correct, i.e. every user is assigned to exactly one client (no duplicate assignments, no unassigned users), and the second prints the result a bit more nicely so you can verify that the distribution of users to clients is actually fair.

import random


def verify_all_users_assigned(users, client_user_dict):
    """
    Verify that all users have indeed been assigned to a client.
    Not necessary for the algorithm but used to check whether the implementation is correct.
    :param users: list of all users that have to be assigned
    :param client_user_dict: assignment of users to clients
    :return:
    """
    users_assigned_to_clients = set()
    duplicate_users = list()

    for users_of_client in client_user_dict.values():
        user_set = set(users_of_client)
        # if there is an intersection those users have been assigned twice (at least)
        inter = users_assigned_to_clients.intersection(user_set)
        if len(inter) != 0:
            duplicate_users.extend(list(inter))
        # now make union of users to know which users have already been processed
        users_assigned_to_clients = users_assigned_to_clients.union(user_set)
    all_users = set(users)
    # users in the input list that do not show up in any client's share
    remaining_users = all_users.difference(users_assigned_to_clients)
    if len(remaining_users) != 0:
        print(f"Not all users have been assigned to clients. Missing are {remaining_users}")
        return
    if len(duplicate_users) != 0:
        print(f"Some users have been assigned at least twice. Those are {duplicate_users}")
        return
    print("All users have successfully been assigned to clients.")


def assign_users_to_clients(users, clients):
    """
    Assign users to clients.
    :param users: list of users
    :param clients: list of clients
    :return: dictionary with mapping from clients to users
    """
    users_per_client, remaining_users = divmod(len(users), len(clients))
    if remaining_users != 0:
        print(
            f"An equal split is not possible! {remaining_users} users would remain when each client takes on {users_per_client} users. Assigning remaining users to random clients.")

    # assign each client its fair share of users: client number i gets the
    # slice users[i * users_per_client:(i + 1) * users_per_client]
    # (this also works when there are fewer users than clients, i.e. a share of 0)
    client_user_registry = {
        client: list(users[i * users_per_client:(i + 1) * users_per_client])
        for i, client in enumerate(clients)
    }
    # now we need to take care of the remaining users
    # we could just go from back to front and assign one more user to each client,
    # but to make it fair, choose the clients randomly without repetition
    start = users_per_client * len(clients)
    for i, client in enumerate(random.sample(clients, k=remaining_users)):
        client_user_registry[client].append(users[start + i])
    return client_user_registry


def print_mapping(mapping):
    print("""
+-------------------------
| Mapping: User -> Client
+-------------------------""")
    for client, users in mapping.items():
        # map(str, ...) keeps this working when the user IDs are integers
        print(f" - Client: {client}\t =>\t Users ({len(users)}): {', '.join(map(str, users))}")


# users that need to be assigned
list_of_users = ["user_id1", "user_id2", "user_id3", "user_id4", "user_id5", "user_id6", "user_id7", "user_id8",
                 "user_id9", "user_id10", "user_id11",
                 "user_id12", "user_id13", "user_id14", "user_id15", "user_id16", "user_id17", "user_id18",
                 "user_id19",
                 "user_id20", "user_id21", "user_id22", "user_id23", "user_id24", "user_id25", "user_id26"]
# clients to assign users to
list_of_clients = ["client_1", "client_2", "client_3", "client_4", "client_5", "client_6", "client_7"]

# do assignment of users to clients
client_user_assignment = assign_users_to_clients(list_of_users, list_of_clients)

# verify that the algorithm works (just for demo purposes)
verify_all_users_assigned(list_of_users, client_user_assignment)

# print assignment
print_mapping(client_user_assignment)

Expected output

An equal split is not possible! 5 users would remain when each client takes on 3 users. Assigning remaining users to random clients.
All users have successfully been assigned to clients.

+-------------------------
| Mapping: User -> Client
+-------------------------
 - Client: client_1  =>  Users (4): user_id1, user_id2, user_id3, user_id23
 - Client: client_2  =>  Users (4): user_id4, user_id5, user_id6, user_id26
 - Client: client_3  =>  Users (3): user_id7, user_id8, user_id9
 - Client: client_4  =>  Users (3): user_id10, user_id11, user_id12
 - Client: client_5  =>  Users (4): user_id13, user_id14, user_id15, user_id24
 - Client: client_6  =>  Users (4): user_id16, user_id17, user_id18, user_id25
 - Client: client_7  =>  Users (4): user_id19, user_id20, user_id21, user_id22

Please note: since random.sample() randomly chooses the clients that take on one more user, your result might differ, but it will always be fair (see the definition of fair above).
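If the random choice is not needed, the same fairness rule (sizes differ by at most 1) can be sketched more compactly. The helper name `split_evenly` is hypothetical and not part of the code above. As an assumption of this sketch, the leftover users go to the first clients instead of randomly chosen ones:

```python
def split_evenly(items, parts):
    """Split items into `parts` consecutive chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(items), parts)
    chunks = []
    start = 0
    for index in range(parts):
        # the first `extra` chunks take one additional item
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


users = [f"user_id{n}" for n in range(1, 27)]  # 26 placeholder IDs
chunks = split_evenly(users, 7)
sizes = [len(chunk) for chunk in chunks]
print(sizes)  # [4, 4, 4, 4, 4, 3, 3]
```

Because the chunks are consecutive slices, joining them back together reproduces the original list, so no user is skipped or duplicated.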

With external libraries

When using external libraries, there are many options. See e.g. the functions pandas.cut() or numpy.split(). They behave differently when a fair distribution of users to clients is not possible (numpy.split(), for example, raises an error when asked for a number of sections that does not divide the input equally), so you should read up on that in the documentation.
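To illustrate how two of the NumPy functions differ on an uneven input, here is a small sketch, assuming NumPy is installed and using placeholder IDs rather than real user IDs:

```python
import numpy as np

user_ids = np.arange(1, 27)  # 26 placeholder IDs (assumption)

try:
    np.split(user_ids, 7)  # 26 is not divisible by 7
    split_failed = False
except ValueError as error:
    split_failed = True
    print(f"numpy.split refused: {error}")

# array_split accepts the same arguments but allows parts of unequal length
parts = np.array_split(user_ids, 7)
print([len(part) for part in parts])
```

In this sketch numpy.split() ends up in the except branch, while numpy.array_split() returns seven parts that together contain all 26 IDs.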

