I'm trying to filter and sort a Pandas dataframe to clean my data. I've looked on StackOverflow and can't seem to find a method that will give me the sort and filter I need. The data I'm working with looks something like this:
| Name 1 | Name 2 | Score |
| ------ | ------ | ----- |
| Amy | Jack | 2.456 |
| Amy | Jack | 3.234 |
| Amy | Jack | 5.124 |
| ... | ... | ... |
| Max | Jane | 8.569 |
| Max | Jane | 4.654 |
| Max | Jane | 6.349 |
What I want to do is make a new dataframe containing only the lowest score for each pair of names. The resulting dataframe would look something like this:
| Name 1 | Name 2 | Score |
| ------ | ------ | ----- |
| Amy | Jack | 2.456 |
| ... | ... | ... |
| Max | Jane | 4.654 |
Use:
df = df.groupby(['Name 1', 'Name 2'], as_index = False).agg(Score = ('Score', 'min'))
Output:
>>> df
  Name 1 Name 2  Score
0    Amy   Jack  2.456
1    Max   Jane  4.654
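For anyone who wants to try this directly, here is a minimal self-contained sketch of the same approach. It assumes only the six rows visible in the question (the elided "..." rows are left out), and the variable name `lowest` is just an illustrative choice.

```python
import pandas as pd

# Sample rows copied from the question; the elided "..." rows are omitted.
df = pd.DataFrame({
    "Name 1": ["Amy", "Amy", "Amy", "Max", "Max", "Max"],
    "Name 2": ["Jack", "Jack", "Jack", "Jane", "Jane", "Jane"],
    "Score": [2.456, 3.234, 5.124, 8.569, 4.654, 6.349],
})

# Named aggregation: one row per (Name 1, Name 2) pair, keeping the minimum Score.
lowest = df.groupby(["Name 1", "Name 2"], as_index=False).agg(Score=("Score", "min"))
print(lowest)
```

Assigning to a new name rather than back to `df` keeps the original rows around in case you need them later.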
You can also use the sort_values() and groupby() methods:
df.sort_values(by='Score').groupby(['Name 1', 'Name 2'], as_index = False).first()
Or use the sort_values() and drop_duplicates() methods:
df.sort_values(by='Score').drop_duplicates(subset=['Name 1', 'Name 2'])
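The two sort-based variants can be compared side by side with a short sketch. It assumes the six sample rows from the question, and the names `by_first` and `by_dedup` are hypothetical. The `reset_index(drop=True)` call is an addition of mine, not part of the answer above: drop_duplicates() keeps the original row labels, so resetting makes the result easier to compare.

```python
import pandas as pd

# Sample rows copied from the question; the elided "..." rows are omitted.
df = pd.DataFrame({
    "Name 1": ["Amy", "Amy", "Amy", "Max", "Max", "Max"],
    "Name 2": ["Jack", "Jack", "Jack", "Jane", "Jane", "Jane"],
    "Score": [2.456, 3.234, 5.124, 8.569, 4.654, 6.349],
})

# Sort ascending by Score, then take the first row of each name pair.
by_first = (
    df.sort_values(by="Score")
      .groupby(["Name 1", "Name 2"], as_index=False)
      .first()
)

# Sort ascending by Score, then keep only the first occurrence of each name pair.
# reset_index(drop=True) is an optional extra that discards the original row labels.
by_dedup = (
    df.sort_values(by="Score")
      .drop_duplicates(subset=["Name 1", "Name 2"])
      .reset_index(drop=True)
)

print(by_first)
print(by_dedup)
```

One difference worth knowing if your real data has more columns: groupby().first() takes the first non-null value in each column separately, while drop_duplicates() keeps whole rows intact. The drop_duplicates() result is also ordered by score rather than by name, which happens to coincide for this sample.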