简体   繁体   English

在pandas df中重新分配列值

[英]Re-assign column values in a pandas df

This question is related to rostering or staffing. 这个问题与排班或人员配备有关。 I'm trying to assign various jobs to individuals (employees). 我正在尝试为个人(员工)分配各种工作。 Using the df below, 使用下面的df

`[Person]` = Individuals (employees)
`[Area]` and `[Place]` = unique jobs
`[On]` = How many unique jobs are occurring at each point in time

So [Area] and [Place] together will make up unique values that are different jobs. 因此, [Area][Place]将组成不同作业的unique值。 These values will be assigned to individuals with the overall aim to use the least amount of individuals possible. 这些值将分配给个人,总体目标是尽可能少地使用个人。 The most unique values assigned to any one individual is 3. [On] displays how many current unique values for [Place] and [Area] are occurring. assigned给任何一个人的最独特的值是3. [On]显示[Place][Area]正在发生的当前unique值的数量。 So this provides a concrete guide on how many individuals I need. 因此,这为我需要的人数提供了具体的指导。 For example, 例如,

1-3 unique values occurring = 1 individual
4-6 unique values occurring = 2 individuals
7-9 unique values occurring = 3 individuals etc

Question: Where the amount of unique values in [Area] and [Place] is greater than 3 is causing me trouble. 问题:如果[Area][Place]unique值大于3,则会给我带来麻烦。 I can't do a groupby where I assign the first 3 unique values to individual 1 and the next 3 unique values to individual 2 etc. I want to group unique values in [Area] and [Place] by [Area] . 我不能做一个groupby ,我assign3 unique values assignindividual 1 ,接下来3个unique值分配给individual 2等。我想通过[Area] [Place] [Area][Place]的唯一值分组。 So look to assign same values in [Area] to an individual (up to 3). 因此, assign [Area]中的相同值assign个人(最多3个)。 Then, if there are leftover values (<3), they should be combined to make a group of 3, where possible. 然后,如果有剩余的值(<3),则应尽可能将它们组合成3个组。

The way I envisage this working is: see into the future by an hour . 我设想这种工作的方式是:一hour 看未来 For each new row of values the script should see how many values will be [On] (this provides an indication of how many total individuals are required). 对于每个新的值rowscript应该看到[On]多少个值(这表示需要多少个人)。 Where unique values are >3, they should be assigned by grouping the same value in [Area] . 如果unique值> 3,则应通过在[Area]对相同值进行groupingassigned它们。 If there are leftover values they should be combined anyhow to make up to a group of 3. 如果有剩余的价值,无论如何它们应该组合成一组3。

Putting that into a step by step process: 将其分为一步一步的过程:

1) Use the [On] Column to determine how many individuals are required by looking into the future for an hour 1)使用[On] Column通过展望未来hour来确定需要多少人

2) Where there are more than 3 unique values occurring assign the identical values in [Area] first. 2)如果出现超过3个unique值,则首先在[Area]指定相同的值。

3) If there are any leftover values then look to combine anyway possible. 3)如果有任何剩余的值,那么无论如何都要结合起来。

For the df below, there are 9 unique values occurring for [Place] and [Area] with an hour . 对于下面的df[Place][Area]有9个unique值,一hour So we should have 3 individuals assigned . 所以我们应该assigned 3个人。 When unique values >3 it should be assigned by [Area] and seeing if the same value occurs. unique值> 3时,它应由[Area]分配,并查看是否出现相同的值。 The leftover values should be combined with other individuals that have less than 3 unique values. 剩余的值应与其他具有少于3个unique值的个体组合。

import pandas as pd
import numpy as np

d = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 5','House 1','House 2','House 3','House 2'],                 
    'Area' : ['A','B','C','D','E','D','E','F','G'],     
    'On' : ['1','2','3','4','5','6','7','8','9'], 
    'Person' : ['Person 1','Person 2','Person 3','Person 4','Person 5','Person 4','Person 5','Person 6','Person 7'],   
     })

df = pd.DataFrame(data=d)

This is my attempt: 这是我的尝试:

def reduce_df(df):
    values = df['Area'] + df['Place']
    df1 = df.loc[~values.duplicated(),:] # ignore duplicate values for this part..
    person_count = df1.groupby('Person')['Person'].agg('count')
    leftover_count = person_count[person_count < 3] # the 'leftovers'

    # try merging pairs together
    nleft = leftover_count.shape[0]
    to_try = np.arange(nleft - 1)
    to_merge = (leftover_count.values[to_try] + 
                leftover_count.values[to_try + 1]) <= 3
    to_merge[1:] = to_merge[1:] & ~to_merge[:-1]
    to_merge = to_try[to_merge]
    merge_dict = dict(zip(leftover_count.index.values[to_merge+1], 
                leftover_count.index.values[to_merge]))
    def change_person(p):
        if p in merge_dict.keys():
            return merge_dict[p]
        return p
    reduced_df = df.copy()
    # update df with the merges you found
    reduced_df['Person'] = reduced_df['Person'].apply(change_person)
    return reduced_df

df1 = (reduce_df(reduce_df(df)))

This is the Output: 这是输出:

       Time    Place Area On    Person
0   8:03:00  House 1    A  1  Person 1
1   8:17:00  House 2    B  2  Person 1
2   8:20:00  House 3    C  3  Person 1
3   8:28:00  House 4    D  4  Person 4
4   8:35:00  House 5    E  5  Person 5
5   8:40:00  House 1    D  6  Person 4
6   8:42:00  House 2    E  7  Person 5
7   8:45:00  House 3    F  8  Person 5
8   8:50:00  House 2    G  9  Person 7

This is my Intended Output: 这是我的预期输出:

       Time    Place Area On    Person
0   8:03:00  House 1    A  1  Person 1
1   8:17:00  House 2    B  2  Person 1
2   8:20:00  House 3    C  3  Person 1
3   8:28:00  House 4    D  4  Person 2
4   8:35:00  House 5    E  5  Person 3
5   8:40:00  House 6    D  6  Person 2
6   8:42:00  House 2    E  7  Person 3
7   8:45:00  House 3    F  8  Person 2
8   8:50:00  House 2    G  9  Person 3

Description on how I want to get this output: 我想如何获得此输出的说明:

Index 0: One `unique` value occurring. So `assign` to individual 1
Index 1: Two `unique` values occurring. So `assign` to individual 1
Index 2: Three `unique` values occurring. So `assign` to individual 1
Index 3: Four `unique` values on. So `assign` to individual 2
Index 4: Five `unique` values on. This one is a bit tricky and hard to conceptualise. But there is another `E` within an `hour`. So `assign` to a new individual so it can be combined with the other `E`
Index 5: Six `unique` values on. Should be `assigned` with the other `D`. So individual 2
Index 6: Seven `unique` values on. Should be `assigned` with other `E`. So individual 3
Index 7: Eight `unique` values on. New value in `[Area]`, which is a _leftover_. `Assign` to either individual 2 or 3
Index 8: Nine `unique` values on. New value in `[Area]`, which is a _leftover_. `Assign` to either individual 2 or 3

Example No2: 示例No2:

d = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','8:40:00','8:42:00','8:45:00','8:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 1','House 2','House 3'],                 
    'Area' : ['X','X','X','X','X','X','X','X','X'],     
    'On' : ['1','2','3','3','3','3','3','3','3'], 
    'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1'],   
    })

    df = pd.DataFrame(data=d)

I am getting an error: 我收到一个错误:

 IndexError: index 1 is out of bounds for axis 1 with size 1

On this line: 在这一行:

df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]

However, if I change the Person to 1,2,3 repeating, it returns the following: 但是,如果我将Person更改为1,2,3重复,则返回以下内容:

'Person' : ['Person 1','Person 2','Person 3','Person 1','Person 2','Person 3','Person 1','Person 2','Person 3'], 

      Time    Place Area On    Person
0  8:03:00  House 1    X  1  Person 1
1  8:17:00  House 2    X  2  Person 1
2  8:20:00  House 3    X  3  Person 1
3  8:28:00  House 1    X  3  Person 2
4  8:35:00  House 2    X  3  Person 2
5  8:40:00  House 3    X  3  Person 2
6  8:42:00  House 1    X  3  Person 3
7  8:45:00  House 2    X  3  Person 3
8  8:50:00  House 3    X  3  Person 3

Intended Output: 预期产出:

      Time    Place Area On    Person
0  8:03:00  House 1    X  1  Person 1
1  8:17:00  House 2    X  2  Person 1
2  8:20:00  House 3    X  3  Person 1
3  8:28:00  House 1    X  3  Person 1
4  8:35:00  House 2    X  3  Person 1
5  8:40:00  House 3    X  3  Person 1
6  8:42:00  House 1    X  3  Person 1
7  8:45:00  House 2    X  3  Person 1
8  8:50:00  House 3    X  3  Person 1

The main takeaway from Example 2 is: 示例2的主要内容是:

1) There are <3 unique values on so assign to individual 1

Update 更新

There's a live version of this answer online that you can try for yourself. 这是在线答案的现场版本,您可以自己尝试。

Here's an answer in the form of the allocatePeople function. 这是allocatePeople函数形式的答案。 It's based around precomputing all of the indices where the areas repeat within an hour: 它基于预先计算区域在一小时内重复的所有指数:

from collections import Counter
import numpy as np
import pandas as pd

def getAssignedPeople(df, areasPerPerson):
    areas = df['Area'].values
    places = df['Place'].values
    times = pd.to_datetime(df['Time']).values
    maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
    assignmentCount = Counter()
    assignedPeople = []
    assignedPlaces = {}
    heldPeople = {}
    heldAreas = {}
    holdAvailable = True
    person = 0

    # search for repeated areas. Mark them if the next repeat occurs within an hour
    ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
    holds = np.zeros(areas.size, dtype=bool)
    holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')

    for area,place,hold in zip(areas, places, holds):
        if (area, place) in assignedPlaces:
            # this unique (area, place) has already been assigned to someone
            assignedPeople.append(assignedPlaces[(area, place)])
            continue

        if assignmentCount[person] >= areasPerPerson:
            # the current person is already assigned to enough areas, move on to the next
            a = heldPeople.pop(person, None)
            heldAreas.pop(a, None)
            person += 1

        if area in heldAreas:
            # assign to the person held in this area
            p = heldAreas.pop(area)
            heldPeople.pop(p)
        else:
            # get the first non-held person. If we need to hold in this area, 
            # also make sure the person has at least 2 free assignment slots,
            # though if it's the last person assign to them anyway 
            p = person
            while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
                p += 1

        assignmentCount.update([p])
        assignedPlaces[(area, place)] = p
        assignedPeople.append(p)

        if hold:
            if p==maxPerson:
                # mark that there are no more people available to perform holds
                holdAvailable = False

            # this area recurrs in an hour, mark that the person should be held here
            heldPeople[p] = area
            heldAreas[area] = p

    return assignedPeople

def allocatePeople(df, areasPerPerson=3):
    assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
    df = df.copy()
    df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
    return df

Note the use of df['Person'].unique() in allocatePeople . 注意在allocatePeople使用df['Person'].unique() That handles the case where people are repeated in the input. 这处理了人们在输入中重复的情况。 It is assumed that the order of people in the input is the desired order in which those people should be assigned. 假设输入中的人的顺序是应该分配那些人的期望顺序。

I tested allocatePeople against the OP's example input ( example1 and example2 ) and also against a couple of edge cases I came up with that I think(?) match the OP's desired algorithm: 我针对OP的示例输入( example1example2 )测试了allocatePeople ,并针对我想出的几个边缘情况,我认为(?)匹配OP的所需算法:

ds = dict(
example1 = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 5','House 1','House 2','House 3','House 2'],                 
    'Area' : ['A','B','C','D','E','D','E','F','G'],     
    'On' : ['1','2','3','4','5','6','7','8','9'], 
    'Person' : ['Person 1','Person 2','Person 3','Person 4','Person 5','Person 4','Person 5','Person 6','Person 7'],   
    }),
example2 = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','8:40:00','8:42:00','8:45:00','8:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 1','House 2','House 3'],                 
    'Area' : ['X','X','X','X','X','X','X','X','X'],     
    'On' : ['1','2','3','3','3','3','3','3','3'], 
    'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1'],   
    }),

long_repeats = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:25:00','8:30:00','8:31:00','8:35:00','8:45:00','8:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 3','House 2'],                 
    'Area' : ['A','A','A','A','B','C','C','C','B'],  
    'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 4','Person 4','Person 3'],   
    'On' : ['1','2','3','4','5','6','7','8','9'],                      
    }),
many_repeats = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 2'],                 
    'Area' : ['A', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'F'],     
    'On' : ['1','2','3','4','5','6','7','8','9'], 
    'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],   
    }),
large_gap = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 3'],                 
    'Area' : ['A', 'B', 'C', 'D', 'E', 'F', 'D', 'D', 'D'],     
    'On' : ['1','2','3','4','5','6','7','8','9'], 
    'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],   
    }),
different_times = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','09:42:00','09:45:00','09:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 1'],                 
    'Area' : ['A', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'G'],     
    'On' : ['1','2','3','4','5','6','7','8','9'], 
    'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],   
    })
)

expectedPeoples = dict(
    example1 = [1,1,1,2,3,2,3,2,3],
    example2 = [1,1,1,1,1,1,1,1,1],
    long_repeats = [1,1,1,2,2,3,3,3,2],
    many_repeats = [1,1,1,2,2,3,3,2,3],
    large_gap = [1,1,1,2,3,3,2,2,3],
    different_times = [1,1,1,2,2,2,3,3,3],
)

for name,d in ds.items():
    df = pd.DataFrame(d)
    expected = ['Person %d' % i for i in expectedPeoples[name]]
    ap = allocatePeople(df)

    print(name, ap, sep='\n', end='\n\n')
    np.testing.assert_array_equal(ap['Person'], expected)

The assert_array_equal statements pass, and the output matches OP's expected output: assert_array_equal语句通过,输出与OP的预期输出匹配:

example1
       Time    Place Area On    Person
0   8:03:00  House 1    A  1  Person 1
1   8:17:00  House 2    B  2  Person 1
2   8:20:00  House 3    C  3  Person 1
3   8:28:00  House 4    D  4  Person 2
4   8:35:00  House 5    E  5  Person 3
5  08:40:00  House 1    D  6  Person 2
6  08:42:00  House 2    E  7  Person 3
7  08:45:00  House 3    F  8  Person 2
8  08:50:00  House 2    G  9  Person 3

example2
      Time    Place Area On    Person
0  8:03:00  House 1    X  1  Person 1
1  8:17:00  House 2    X  2  Person 1
2  8:20:00  House 3    X  3  Person 1
3  8:28:00  House 1    X  3  Person 1
4  8:35:00  House 2    X  3  Person 1
5  8:40:00  House 3    X  3  Person 1
6  8:42:00  House 1    X  3  Person 1
7  8:45:00  House 2    X  3  Person 1
8  8:50:00  House 3    X  3  Person 1

The output for my test cases matches my expectations as well: 我的测试用例的输出也符合我的期望:

long_repeats
      Time    Place Area    Person On
0  8:03:00  House 1    A  Person 1  1
1  8:17:00  House 2    A  Person 1  2
2  8:20:00  House 3    A  Person 1  3
3  8:25:00  House 4    A  Person 2  4
4  8:30:00  House 1    B  Person 2  5
5  8:31:00  House 1    C  Person 3  6
6  8:35:00  House 2    C  Person 3  7
7  8:45:00  House 3    C  Person 3  8
8  8:50:00  House 2    B  Person 2  9

many_repeats
       Time    Place Area On    Person
0   8:03:00  House 1    A  1  Person 1
1   8:17:00  House 2    B  2  Person 1
2   8:20:00  House 3    C  3  Person 1
3   8:28:00  House 4    D  4  Person 2
4   8:35:00  House 1    D  5  Person 2
5  08:40:00  House 1    E  6  Person 3
6  08:42:00  House 2    E  7  Person 3
7  08:45:00  House 1    F  8  Person 2
8  08:50:00  House 2    F  9  Person 3

large_gap
       Time    Place Area On    Person
0   8:03:00  House 1    A  1  Person 1
1   8:17:00  House 2    B  2  Person 1
2   8:20:00  House 3    C  3  Person 1
3   8:28:00  House 4    D  4  Person 2
4   8:35:00  House 1    E  5  Person 3
5  08:40:00  House 1    F  6  Person 3
6  08:42:00  House 2    D  7  Person 2
7  08:45:00  House 1    D  8  Person 2
8  08:50:00  House 3    D  9  Person 3

different_times
       Time    Place Area On    Person
0   8:03:00  House 1    A  1  Person 1
1   8:17:00  House 2    B  2  Person 1
2   8:20:00  House 3    C  3  Person 1
3   8:28:00  House 4    D  4  Person 2
4   8:35:00  House 1    D  5  Person 2
5  08:40:00  House 1    E  6  Person 2
6  09:42:00  House 2    E  7  Person 3
7  09:45:00  House 1    F  8  Person 3
8  09:50:00  House 1    G  9  Person 3

Let me know if it does everything you wanted, or if it still needs some tweaks. 如果它能完成你想要的一切,或者它是否还需要一些调整,请告诉我。 I think everyone is eager to see you fulfill your vision. 我想每个人都渴望看到你实现你的愿景。

Ok, before we delve into the logic of the problem it is worthwhile to do some housekeeping to tidy-up the data and bring it into a more useful format: 好的,在我们深入研究问题的逻辑之前,有必要做一些内务处理来整理数据并将其带入更有用的格式:

#Create table of unique people
unique_people = df[['Person']].drop_duplicates().sort_values(['Person']).reset_index(drop=True)

#Reformat time column
df['Time'] = pd.to_datetime(df['Time'])

Now, getting to the logic of the problem, it is useful to break the problem down in to stages. 现在,了解问题的逻辑,将问题分解为阶段是有用的。 Firstly, we will want to create individual jobs (with job numbers) based on the 'Area' and the time between them. 首先,我们希望根据“区域”和它们之间的时间创建单独的工作(带有工作号码)。 ie jobs in the same area, within an hour can share the same job number. 即同一区域内的工作,一小时内可以共享相同的工作号。

#Assign jobs
df= df.sort_values(['Area','Time']).reset_index(drop=True)
df['Job no'] = 0
current_job = 1   
df.loc[0,'Job no'] = current_job
for i in range(rows-1):
    prev_row = df.loc[i]
    row = df.loc[i+1]
    time_diff = (row['Time'] - prev_row['Time']).seconds //3600
    if (row['Area'] == prev_row['Area'])  & (time_diff == 0):
        pass
    else:
        current_job +=1
    df.loc[i+1,'Job no'] = current_job

With this step now out of the way, it is a simple matter of assigning 'Persons' to individual jobs: 现在已经完成了这一步骤,将“人员”分配给各个工作是一件简单的事情:

df= df.sort_values(['Job no']).reset_index(drop=True)
df['Person'] = ""
df_groups = df.groupby('Job no')
for group in df_groups:
    group_size = group[1].count()['Time']
    for person_idx in range(len(unique_people)):
        person = unique_people.loc[person_idx]['Person']
        person_count = df[df['Person']==person]['Person'].count()
        if group_size <= (3-person_count):
            idx = group[1].index.values
            df.loc[idx,'Person'] = person
            break

And finally, 最后,

df= df.sort_values(['Time']).reset_index(drop=True)
print(df)

I've attempted to code this in a way that is easier to unpick, so there may well be efficiencies to be made here. 我试图以一种更容易取消的方式对其进行编码,因此这里可能会有很高的效率。 The aim however was to set out the logic used. 然而,目的是列出所使用的逻辑。

This code gives the expected results on both data sets, so I hope it answers your question. 这段代码给出了两个数据集的预期结果,所以我希望它能回答你的问题。

In writing my other answer , I slowly came around to the idea that the OP's algorithm might be easier to implement with an approach that focuses on the jobs (which can be different), instead of the people (which are all the same). 在写我的另一个答案时 ,我慢慢地想到OP的算法可能更容易实现,一种方法专注于工作(可能是不同的),而不是人(都是相同的)。 Here's a solution that uses the job-centric approach: 这是一个使用以工作为中心的方法的解决方案:

from collections import Counter
import numpy as np
import pandas as pd

def assignJob(job, assignedix, areasPerPerson):
    for i in range(len(assignedix)):
        if (areasPerPerson - len(assignedix[i])) >= len(job):
            assignedix[i].extend(job)
            return True
    else:
        return False

def allocatePeople(df, areasPerPerson=3):
    areas = df['Area'].values
    times = pd.to_datetime(df['Time']).values
    peopleUniq = df['Person'].unique()
    npeople = int(np.ceil(areas.size / float(areasPerPerson)))

    # search for repeated areas. Mark them if the next repeat occurs within an hour
    ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
    holds = np.zeros(areas.size, dtype=bool)
    holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')

    jobs =[]
    _jobdict = {}
    for i,(area,hold) in enumerate(zip(areas, holds)):
        if hold:
            _jobdict[area] = job = _jobdict.get(area, []) + [i]
            if len(job)==areasPerPerson:
                jobs.append(_jobdict.pop(area))
        elif area in _jobdict:
            jobs.append(_jobdict.pop(area) + [i])
        else:
            jobs.append([i])
    jobs.sort()

    assignedix = [[] for i in range(npeople)]
    for job in jobs:
        if not assignJob(job, assignedix, areasPerPerson):
            # break the job up and try again
            for subjob in ([sj] for sj in job):
                assignJob(subjob, assignedix, areasPerPerson)

    df = df.copy()
    for i,aix in enumerate(assignedix):
        df.loc[aix, 'Person'] = peopleUniq[i]
    return df

This version of allocatePeople has also been extensively tested and passes all of the same checks described in my other answer. 此版本的allocatePeople也经过了广泛测试,并通过了我在其他答案中描述的所有相同检查。

It does have more looping than my other solution, so it is likely to be slightly less efficient (though it'll only matter if your dataframe is very large, say 1e6 rows and up). 它确实比我的其他解决方案有更多的循环,因此它的效率可能略低(尽管只有你的数据帧非常大,比如说1e6行以上)。 On the other hand, it is somewhat shorter and, I think, more straightforward and easy to understand. 另一方面,它更短,而且我认为更简单易懂。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM