Compare cells of dataframe where another cell is same for multiple columns?

Question

Say I have a pandas dataframe with data like this:

       item  diff  otherstuff
    0     1     2           1
    1     1     1           2
    2     1     3           7
    3     2    -1           0
    4     2     1           3
    5     2     4           9
    6     2    -6           2
    7     3     0           0
    8     3     2           9
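To experiment with this, the table can be rebuilt as a DataFrame. This is a minimal sketch: the variable name df is an assumption, and the values are copied from the table above.

```python
import pandas as pd

# Rebuild the example table; the default RangeIndex gives row labels 0..8
df = pd.DataFrame({
    "item":       [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff":       [2, 1, 3, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})
print(df)
```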

Is it possible to compare all the rows that have the same item and keep only the row with the lowest diff?

So this table would end up as:

       item  diff  otherstuff
    0     1     1           2
    1     2    -6           2
    2     3     0           0

I won't always know the order of the items or what they will be called.

I've tried some really convoluted for loops that counted how many rows shared the same item, then walked through those indices to compare them and dropped all but the lowest row from the dataframe, but that didn't seem to work. How else could I go about doing this?

Answer 1

For this you can use groupby:

>>> df.groupby("item", as_index=False)["diff"].min()
   item  diff
0     1     1
1     2    -6
2     3     0

[3 rows x 2 columns]

This groups by item. as_index=False makes the grouped output look more like the original frame (item stays a regular column instead of becoming the index), ["diff"] selects the diff column, and min() takes the minimum value within each group.
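The result above contains only item and diff, while the desired table in the question also keeps otherstuff. One way to keep whole rows (an addition, not part of the original answer) is to let idxmin find the index label of each group's minimum diff and then select those rows with loc. A minimal sketch, assuming the DataFrame's index labels are unique:

```python
import pandas as pd

df = pd.DataFrame({
    "item":       [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff":       [2, 1, 3, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# Index label of the row with the smallest "diff" in each item group
idx = df.groupby("item")["diff"].idxmin()

# Pull those complete rows and renumber them 0..n-1 like the desired output
result = df.loc[idx].reset_index(drop=True)
print(result)
```

Note that idxmin returns only the first matching label per group, so if two rows tie for the minimum, only one of them is kept.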

Reading through the groupby section of the docs would probably be helpful, as there's a lot of neat stuff you can do once you get the hang of it.

[Note that things become a little more complicated if you want to keep multiple rows when several rows tie for the minimum value, but it can still be done.]
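One common way to keep every tied row is to broadcast each group's minimum back onto its rows with transform("min") and filter with a boolean mask. This is a sketch on small hypothetical data (not from the question) in which item 1 has two rows tied at diff == 1:

```python
import pandas as pd

# Hypothetical data: item 1 has two rows tied for the minimum diff
df = pd.DataFrame({
    "item":       [1, 1, 1, 2, 2],
    "diff":       [2, 1, 1, -6, 4],
    "otherstuff": [1, 2, 5, 2, 9],
})

# Same length as df: each row gets the minimum diff of its own item group
group_min = df.groupby("item")["diff"].transform("min")

# Keep every row whose diff equals its group's minimum, so ties all survive
result = df[df["diff"] == group_min].reset_index(drop=True)
print(result)
```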

1 answers

solution1
3 ACCPTED 2014-04-30 16:00:52