Python / Pandas / Pulp Optimization Duplicates

Question

I am trying to optimize the selection of trial members under limited space, and am running into some trouble. My pandas data frames are ready for optimization, and the linear optimization runs without problems, except for one constraint I still need to add. I am selecting from a large list using binary variables (I am not tied to that approach, so if a different method would resolve this, I could switch). I need to minimize the combined trial time of the subjects selected for the next round of trials, but some subjects have already run multiple trials due to the nature of the experiment. I would like to select the best combination of subjects by minimizing time, while letting some subjects appear in the list multiple times during the optimization (so I do not have to remove them manually beforehand). For instance:

Name         Trial    ID       Time (ms)    Selected?
Mary Smith   A        11001    33           1
John Doe     A        11002    24           0
James Smith  B        11003    52           0
Stacey Doe   A        11004    21           1
John Doe     B        11002    19           1

Is there some way to allow 2 John Doe entries for the optimization but constrain the output to only one selection of him? Thanks for your time!

Answer 1

If you need to keep a record of the rows you remove, you could use the duplicated method, like this:

# First sort your dataframe
df.sort_values(['Name', 'Time (ms)'], inplace=True)

# Make a new column of duplicated values based only on name
df['duplicated'] = df.duplicated(subset=['Name'])

# You can then access the rows that are kept (the fastest trial per name),
# while still having a log of the rejects
df.query('not duplicated')
#           Name Trial     ID  Time (ms)  Selected?  duplicated
# 2  James Smith     B  11003         52          0       False
# 4     John Doe     B  11002         19          1       False
# 0   Mary Smith     A  11001         33          1       False
# 3   Stacey Doe     A  11004         21          1       False

df.query('duplicated')
#        Name Trial     ID  Time (ms)  Selected?  duplicated
# 1  John Doe     A  11002         24          0        True
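
If you would rather leave the repeated rows in the data and let the solver choose between them, as the question asks, one option is to add a constraint per subject that caps the sum of that subject's binary variables at 1. The following is only a minimal sketch under some assumptions that are not in the question: the ID column is taken to identify a subject (grouping on Name would work the same way), n_slots is a made-up capacity of 3, and the bundled CBC solver is used.

```python
import pandas as pd
import pulp

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Mary Smith", "John Doe", "James Smith", "Stacey Doe", "John Doe"],
    "Trial": ["A", "A", "B", "A", "B"],
    "ID": [11001, 11002, 11003, 11004, 11002],
    "Time (ms)": [33, 24, 52, 21, 19],
})

n_slots = 3  # hypothetical capacity, not given in the question

prob = pulp.LpProblem("trial_selection", pulp.LpMinimize)

# One binary decision variable per row (1 = row is selected)
rows = df.index.tolist()
x = pulp.LpVariable.dicts("select", rows, cat="Binary")

# Objective: minimize the combined time of the selected rows
prob += pulp.lpSum(int(df.loc[i, "Time (ms)"]) * x[i] for i in rows)

# Fill exactly n_slots places
prob += pulp.lpSum(x[i] for i in rows) == n_slots

# At most one selected row per subject (assumes ID identifies a subject)
for _, group in df.groupby("ID"):
    prob += pulp.lpSum(x[i] for i in group.index.tolist()) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))

# Write the solution back as a 0/1 column
df["Selected?"] = [int(round(x[i].value())) for i in rows]
print(df)
```

With these assumptions the solver picks Mary Smith (A), Stacey Doe (A) and John Doe (B), which is the Selected? column shown in the question. Both John Doe rows stay in the model, but the per-subject constraint keeps them from being chosen together.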
