Matching participant with PuLP optimization: how to set constraints for a unique solution?

Question

For my research, I have to match participants into groups of three according to their age, gender, and sports experience. I am using PuLP to find an optimized match that minimizes the variation in age, gender, and experience within each matched group. However, when I run the code below, the output shows that some participants are matched with themselves. How can I set the constraints so that all participants in a group are unique and every participant in the original data is assigned to a group?

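import numpy as np
import pandas as pd
from pulp import LpMinimize, LpProblem, LpVariable, lpSum

data = pd.DataFrame({'name': ['John', 'Leo', 'Karin', 'Daniel', 'Claire', 'Alex'], 'gender': [0, 0, 1, 0, 1, 1], 'age': [24, 60, 38, 42, 28, 55], 'experience': [5, 4, 1, 4, 2, 2]})
data.head()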

#define the optimization problem (minimize variation in participant characteristics)
prob = LpProblem("Matching_Participants", LpMinimize)

participants = range(len(data))

#Use LpVariables to create variable y, a decision variable to determine whether or not to match participant i with participant j and participant k.
y = LpVariable.dicts("trio", [(i,j,k) for i in participants for j in participants for k in participants] ,cat='Binary')


prob += lpSum(  [  np.mean([np.std([data['gender'][i],data['gender'][j],data['gender'][k]]), np.std([data['age'][i],data['age'][j],data['age'][k]]),  np.std([data['experience'][i],data['experience'][j],data['experience'][k]]) ] )   * y[(i,j,k)] for i in participants for j in participants for k in participants])

#define constraints
for i in participants:
    prob += lpSum(y[(i,j,k)] for j in participants for k in participants) <= 2 #i is not paired with more than two participants j and k
    prob += lpSum(y[(j,i,k)] for j in participants for k in participants) <= 2 #j is not paired with more than two participants i and k
    prob += lpSum(y[(i,j,k)] + y[(j,i,k)] for j in participants for k in participants) <= 1 #pairing must go both ways
prob += lpSum(y[(i,j,k)] for i in participants for j in participants for k in participants) == 2 #there is a total of 2 groups


#solve the problem
prob.solve()

#print matches
print("Finished matching!\n")
for i in participants:
    for j in participants:
        for k in participants:

            if y[(i,j,k)].varValue == 1:
                print('{} and {} and {} with a mean std of {}'.format(data['name'][i],data['name'][j],data['name'][k],(np.mean([np.std([data['gender'][i],data['gender'][j],data['gender'][k]]), np.std([data['age'][i],data['age'][j],data['age'][k]]),  np.std([data['experience'][i],data['experience'][j],data['experience'][k]]) ] ))))

Output:

Finished matching!

Leo and Alex and Leo with a mean std of 1.2570787221094177
Karin and Daniel and Karin with a mean std of 1.2570787221094177

Answer 1

Add this constraint, which lets each participant occupy at most one position across all selected trios:

for i in participants:
    prob += lpSum(y[(i,j,k)] + y[(j,i,k)] + y[(j,k,i)] for j in participants for k in participants) <= 1
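
To see why this one constraint also rules out the self-matches shown in the question's output, it can help to count how often each variable occurs in participant i's sum. The sketch below uses only the standard library and does not call PuLP; the helper name `coefficients` is hypothetical and simply mirrors the three-term sum above.

```python
from collections import Counter

participants = range(6)

def coefficients(i):
    """Count how often each trio variable occurs in participant i's constraint.

    Mirrors: lpSum(y[(i,j,k)] + y[(j,i,k)] + y[(j,k,i)] for j ... for k ...) <= 1
    (hypothetical helper for illustration only).
    """
    counts = Counter()
    for j in participants:
        for k in participants:
            counts[(i, j, k)] += 1  # i in the first position
            counts[(j, i, k)] += 1  # i in the second position
            counts[(j, k, i)] += 1  # i in the third position
    return counts

leo = 1  # index of 'Leo' in the question's data
counts = coefficients(leo)
print(counts[(1, 5, 1)])  # 2: the 'Leo and Alex and Leo' trio
print(counts[(1, 5, 2)])  # 1: a trio that contains Leo once
```

A trio that contains participant i twice gets coefficient 2, so the constraint reads 2*y <= 1 and a binary y must be 0. A trio that contains i once gets coefficient 1, so at most one such trio can be selected.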

Answer 2

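import numpy as np
import pandas as pd
from pulp import LpMinimize, LpProblem, LpVariable, lpSum

data = pd.DataFrame({'name': ['John', 'Leo', 'Karin', 'Daniel', 'Claire', 'Alex'], 'gender': [0, 0, 1, 0, 1, 1], 'age': [24, 60, 38, 42, 28, 55], 'experience': [5, 4, 1, 4, 2, 2]})
data.head()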

#define the optimization problem (minimize variation in participant characteristics)
prob = LpProblem("Matching_Participants", LpMinimize)

participants = range(len(data))

#Use LpVariables to create variable y, a decision variable to determine whether or not to match participant i with participant j and participant k.
#(1)
y = LpVariable.dicts("trio", [(i,j,k) for i in participants for j in participants for k in participants if i != j and i != k and j != k] ,cat='Binary')


prob += lpSum(  [  np.mean([np.std([data['gender'][i],data['gender'][j],data['gender'][k]]), np.std([data['age'][i],data['age'][j],data['age'][k]]),  np.std([data['experience'][i],data['experience'][j],data['experience'][k]]) ] )   * y[(i,j,k)] for i in participants for j in participants for k in participants if i != j and i != k and j != k])

#define constraints
#(2)
for i in participants:
    prob += lpSum(y[(i,j,k)] for j in participants if i != j for k in participants if i != k and j != k ) <= 1 #i is in the first position of at most one group
    prob += lpSum(y[(j,i,k)] for j in participants if i != j for k in participants if i != k and j != k) <= 1 #i is in the second position of at most one group
    prob += lpSum(y[(j,k,i)] for j in participants if i != j for k in participants if i != k and j != k) <= 1 #i is in the third position of at most one group
prob += lpSum(y[(i,j,k)] for i in participants for j in participants for k in participants if i != j and i != k and j != k ) == 2 #there is a total of 2 groups

#each participant appears in exactly one position of exactly one group, so nobody is a member of multiple groups
#(3)
for i in participants:
    prob += lpSum(y[(i,j,k)] for j in participants for k in participants if i != k and j != k and i != j)  +  lpSum(y[(j,i,k)] for j in participants for k in participants if i != k and j != k and i != j) + lpSum(y[(j,k,i)] for j in participants for k in participants if i != k and j != k and i != j) == 1

#solve the problem
prob.solve()

#print matches
print("Finished matching!\n")
for i in participants:
    for j in participants:
        for k in participants:
            if i != j and i != k and j != k:
                if y[(i,j,k)].varValue == 1:
                    print('{} and {} and {} with a mean std of {}'.format(data['name'][i],data['name'][j],data['name'][k],(np.mean([np.std([data['gender'][i],data['gender'][j],data['gender'][k]]), np.std([data['age'][i],data['age'][j],data['age'][k]]),  np.std([data['experience'][i],data['experience'][j],data['experience'][k]]) ] ))))

prob.writeLP('stacko')

There are three things that should be addressed, marked (1) to (3) in the code:

(1) The decision variables should be defined so that participants cannot be in a group with themselves. If you call prob.writeLP on your original model, you can see decision variables such as trio(0,0,0), which means John can be in a group with John and John. That doesn't make sense.
(2) The constraints should be constructed so that their index sets match the decision variables.
(3) With only points 1 and 2 added to the model, a feasible solution could be group 1: Karin, Claire and Daniel, and group 2: Claire, Daniel and Karin. That is the same participants in two different groups; try commenting out (3) and check for yourself. One way to work around this is to add a constraint that allows each participant to appear only once in any solution.
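
Because the example has only six participants, the solver's result can be cross-checked by brute force. The sketch below is an illustration under stated assumptions, not part of the model above: it enumerates unordered groups with `itertools.combinations`, uses `statistics.pstdev` (the population standard deviation, which is what `np.std` computes by default), and the helper names `trio_cost`, `splits` and `best_split` are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

names = ['John', 'Leo', 'Karin', 'Daniel', 'Claire', 'Alex']
gender = [0, 0, 1, 0, 1, 1]
age = [24, 60, 38, 42, 28, 55]
experience = [5, 4, 1, 4, 2, 2]

def trio_cost(trio):
    """Mean of the per-attribute population std, as in the LP objective."""
    return mean([pstdev([col[p] for p in trio]) for col in (gender, age, experience)])

def splits():
    """Yield each way to split the six participants into two groups of three."""
    everyone = set(range(len(names)))
    for first in combinations(range(len(names)), 3):
        if 0 in first:  # keep participant 0 in the first group to skip mirrored splits
            yield first, tuple(sorted(everyone - set(first)))

def best_split():
    """Return the split with the lowest total cost over both groups."""
    return min(splits(), key=lambda s: trio_cost(s[0]) + trio_cost(s[1]))

for group in best_split():
    print(' and '.join(names[p] for p in group), 'with a mean std of', trio_cost(group))
```

Each split contains every participant exactly once by construction, which is the property constraint (3) enforces in the LP. The number of splits grows quickly with the number of participants, so this is only a sanity check for small instances.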
