Eliminate rows in a dataframe after looping through all rows of column and applying condition

Question

This is my data frame with 5 rows and 3 columns:

import pandas as pd

df = pd.DataFrame({'A': [1,4,4,3,7], 'B': [1,2,2,6,4], 'C': [1,2,2,6,4]})

I need to drop a row if column A contains a value higher than that row's A value, and the row holding that higher A value has a lower B value than the row being checked.

For example, in the data frame above, row 4 (index 3) has to be dropped because other rows have higher values in column A (4, 7) with lower B values (2, 4).

I will restate the question from an application perspective for better clarity. Sorry for my bad presentation skills.

Let's say this is our dataframe:

df = pd.DataFrame({'resources': [100,200,300,300,400,400,400,500,1000],
                   'score': [1,2,1,2,3,5,6,8,9]})

I want to find a trade-off between the resources I use and my score. My priority is to get the best score with the fewest resources. I iterate over all combinations and check whether a row is eligible to be considered. So, of these 9 rows, rows 3, 4, 5 and 6 (counting from 1) should be eliminated: row 3 because row 1 gives the same score with fewer resources, row 4 because row 2 gives the same score with fewer resources, and rows 5 and 6 because row 7 gives a better score with the same resources. I hope this makes my problem clearer.

Answer 1

Starting from:

    A   B   C
0   1   1   1
1   4   2   2
2   4   2   2
3   3   6   6
4   7   4   4

You can do:

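# True where some other row has a larger A value
mask_a = df.A.apply(lambda x: (df.A > x).any())
# True where the row's B value is at least as large as every B value
mask_b = df.B.apply(lambda x: (df.B <= x).all())

# Keep the rows that do not satisfy both conditions
df[~(mask_a & mask_b)]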

# Output
    A   B   C
0   1   1   1
1   4   2   2
2   4   2   2
4   7   4   4
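
Note that the two masks above are evaluated independently of each other, so they do not check that the row with the higher A value is the same row that has the lower B value. If the condition is meant to hold pairwise, as I read the question, a minimal sketch using NumPy broadcasting could look like the following. The names `dominated`, `drop` and `result` are only illustrative, and the approach builds an n x n comparison matrix, so it assumes the frame is small enough for that.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 4, 3, 7], 'B': [1, 2, 2, 6, 4], 'C': [1, 2, 2, 6, 4]})

a = df['A'].to_numpy()
b = df['B'].to_numpy()

# dominated[i, j] is True when row j has a higher A value
# and a lower B value than row i (assumed reading of the question)
dominated = (a[None, :] > a[:, None]) & (b[None, :] < b[:, None])

# A row is dropped if at least one other row dominates it
drop = dominated.any(axis=1)
result = df[~drop]
print(result)
```

On the sample data this keeps the same rows as the output above (index 0, 1, 2 and 4).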

Answer 2

I want to find a trade-off with resources I use and my score. My priority is to get the best score with less resources.

Ok, then I'd do this:

df["ratio"] = df.score / df.resources
df.sort_values("ratio", ascending=False)

Which gives you:

    resources   score   ratio
7   500         8       0.016000
6   400         6       0.015000
5   400         5       0.012500
0   100         1       0.010000
1   200         2       0.010000
8   1000        9       0.009000
4   400         3       0.007500
3   300         2       0.006667
2   300         1       0.003333

Now you can see the best trade-offs at the top and worst ones at the bottom.

Answer 3

IIUC, you want to get the max score per resource and the min resource per score simultaneously.

You can compute both and get the intersection:

import numpy as np

idx1 = df.groupby('resources')['score'].idxmax().values
# array([0, 1, 3, 6, 7, 8])

idx2 = df.groupby('score')['resources'].idxmin().values
# array([0, 1, 4, 5, 6, 7, 8])

out = df.loc[np.intersect1d(idx1, idx2)]

output:

   resources  score
0        100      1
1        200      2
6        400      6
7        500      8
8       1000      9

used input:

df = pd.DataFrame({'resources': [100,200,300,300,400,400,400,500,1000],
                   'score': [1,2,1,2,3,5,6,8,9]})
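
For comparison, the elimination rule described in the question can also be written as a direct pairwise check: a row is dropped when some other row uses no more resources, scores no lower, and is strictly better on at least one of the two. This is only a sketch under that assumed reading of the question. The names `no_worse`, `strictly_better`, `dominated` and `pareto` are illustrative, and the n x n comparison is only practical for small frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'resources': [100, 200, 300, 300, 400, 400, 400, 500, 1000],
                   'score': [1, 2, 1, 2, 3, 5, 6, 8, 9]})

r = df['resources'].to_numpy()
s = df['score'].to_numpy()

# no_worse[i, j]: row j uses no more resources and scores no lower than row i
no_worse = (r[None, :] <= r[:, None]) & (s[None, :] >= s[:, None])
# strictly_better[i, j]: row j beats row i on at least one of the two columns
strictly_better = (r[None, :] < r[:, None]) | (s[None, :] > s[:, None])

# A row is eliminated if any other row is no worse and strictly better somewhere.
# Exact duplicate rows do not eliminate each other under this definition.
dominated = (no_worse & strictly_better).any(axis=1)
pareto = df[~dominated]
print(pareto)
```

On the sample input this keeps index 0, 1, 6, 7 and 8, the same rows as the groupby intersection above.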
