This is my data frame with 5 rows and 3 columns:
df = pd.DataFrame({'A': [1,4,4,3,7], 'B': [1,2,2,6,4], 'C': [1,2,2,6,4]})
I have to find a way to drop a row if the datapoint in column A is finding a value higher than itself in column A, and the B value of that row is lower than the B value of the querying row.
For example, in the above data frame row 4 has to be dropped because it has higher values in column A (4, 7) with less B value (2,4).
I will modify the question in a application perspective for better clarity. Sorry for my bad presentation skills.
Lets say this is our dataframe
df = pd.DataFrame({'resources': [100,200,300,300,400,400,400,500,1000],
'score': [1,2,1,2,3,5,6,8,9]})
I want to find a trade-off with resources i use and my score. My priority is to get the best score with less resources. I iterate all combinations and see if a row is eligible to be considered. So basically in this 9 rows, rows 3, 4, 5,6 should be eliminated 3 because 1 gives the same score with less resource, 4 because 2 gives the same score with less resource, 5 and 6 because 7 gives a better score with same resource. I hope this will make my problem more clear.
Starting from:
A B C
0 1 1 1
1 4 2 2
2 4 2 2
3 3 6 6
4 7 4 4
You can do:
mask_a = df.A.apply(lambda x: (df.A > x).any())
mask_b = df.B.apply(lambda x: (df.B <= x).all())
df[~(mask_a & mask_b)]
# Output
A B C
0 1 1 1
1 4 2 2
2 4 2 2
4 7 4 4
I want to find a trade-off with resources i use and my score. My priority is to get the best score with less resources
Ok, then I'd do this:
df["ratio"] = df.score / df.resources
df.sort_values("ratio", ascending=False)
Which gives you:
resources score ratio
7 500 8 0.016000
6 400 6 0.015000
5 400 5 0.012500
0 100 1 0.010000
1 200 2 0.010000
8 1000 9 0.009000
4 400 3 0.007500
3 300 2 0.006667
2 300 1 0.003333
Now you can see the best trade-offs at the top and worst ones at the bottom.
IIUC, you want to get the max score per resource and the min resource per score simultaneously.
You can compute both and get the intersection:
import numpy as np
idx1 = df.groupby('resources')['score'].idxmax().values
# array([0, 1, 3, 6, 7, 8])
idx2 = df.groupby('score')['resources'].idxmin().values
# array([0, 1, 4, 5, 6, 7, 8])
out = df.loc[np.intersect1d(idx1, idx2)]
output:
resources score
0 100 1
1 200 2
6 400 6
7 500 8
8 1000 9
used input:
df = pd.DataFrame({'resources': [100,200,300,300,400,400,400,500,1000],
'score': [1,2,1,2,3,5,6,8,9]})
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.