Say I have a pandas dataframe with data like this:
   item  diff  otherstuff
0     1     2           1
1     1     1           2
2     1     3           7
3     2    -1           0
4     2     1           3
5     2     4           9
6     2    -6           2
7     3     0           0
8     3     2           9
Is it possible to compare all the rows that have the same item and keep only the row with the lowest diff for each item?
So this table would end up as:
   item  diff  otherstuff
0     1     1           2
1     2    -6           2
2     3     0           0
Assume I won't always know what order the items will be in or what they will be called.
I've tried some really convoluted for loops trying to get the number of items that were the same, then going through that index to compare and dropping all but the lowest row from the dataframe, but that didn't seem to work. How else would I go about doing this?
For this you can use groupby:
>>> df.groupby("item", as_index=False)["diff"].min()
   item  diff
0     1     1
1     2    -6
2     3     0
[3 rows x 2 columns]
This groups by item; as_index=False means that you want the grouped output to look more like the original (item stays a regular column instead of becoming the index); ["diff"] selects the diff column; and min() says that we want the minimum value.
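The question's expected output also keeps the otherstuff column, which a plain min() on diff drops. One possible way to keep the whole row is to ask for the index of the minimum in each group with idxmin() and then select those rows. This is only a sketch: the DataFrame is rebuilt from the question's sample data, and the names idx and result are just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 3, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# For each item, find the index label of the row with the smallest diff
# (if several rows tie, idxmin() returns the first one)
idx = df.groupby("item")["diff"].idxmin()

# Select those full rows and renumber the index from 0
result = df.loc[idx].reset_index(drop=True)
print(result)
```

With the sample data this should give one row per item, with otherstuff taken from the same row as the minimum diff.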
Reading through the groupby section of the docs would probably be helpful, as there's a lot of neat stuff you can do once you get the hang of it.
[Note that things can become a little more complicated if you want to keep multiple rows in case of multiple equal minimum values, but you can still pull it off.]
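As a rough illustration of the tie case, one option is to broadcast each group's minimum back onto the rows with transform("min") and keep every row that matches it. The data below is made up for the example (it is not from the question), and df_ties, group_min and tied are hypothetical names.

```python
import pandas as pd

# Made-up data where item 1 has two rows sharing the minimum diff
df_ties = pd.DataFrame({
    "item": [1, 1, 1, 2, 2],
    "diff": [1, 1, 3, 0, 5],
    "otherstuff": [10, 20, 30, 40, 50],
})

# transform("min") returns a Series aligned with df_ties,
# holding each row's group minimum
group_min = df_ties.groupby("item")["diff"].transform("min")

# Keep every row whose diff equals its group's minimum, ties included
tied = df_ties[df_ties["diff"] == group_min]
print(tied)
```

Here both of item 1's rows with diff 1 are kept, along with the single minimum row for item 2.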