With the following DataFrame...
                  line_date  line_track  line_race  c1pos
horse_name
Grand Cicero     2013-03-10          GP          9      9
Clever Story     2013-09-13         BEL          7      7
Distorted Dream  2013-10-04         BEL          4      2
Distorted Dream  2013-09-13         BEL          7      5
Distorted Dream  2013-04-27         BEL          6      2
Distorted Dream  2012-10-24         BEL          4      2
Distorted Dream  2012-09-12         BEL          2      3
Distorted Dream  2012-06-30         BEL          8      4
Distorted Dream  2012-06-09         BEL          2      4
Mr. O'Leary      2013-10-13         BEL          5      5
Mr. O'Leary      2013-08-29         SAR          7      6
Mr. O'Leary      2013-05-27         BEL          6      5
In the Dark      2013-10-13         BEL          5      7
In the Dark      2013-09-22         BEL          5      7
In the Dark      2013-08-03         SAR          2      7
In the Dark      2012-11-24         AQU          3      7
In the Dark      2012-10-18         BEL          6      6
Bred to Boss     2013-10-26         PRX          3      5
Bred to Boss     2013-10-06         PRX          6      3
Bred to Boss     2012-08-18         SAR          4      1
...the index is set to horse_name. I need to "trim" each horse to a certain number of records. For example, "Distorted Dream" has seven records. I need to reduce every horse with more than, say, three records down to three, so it produces a DataFrame like the one below. Is there a quick and easy way to do this?
                  line_date  line_track  line_race  c1pos
horse_name
Grand Cicero     2013-03-10          GP          9      9
Clever Story     2013-09-13         BEL          7      7
Distorted Dream  2013-10-04         BEL          4      2
Distorted Dream  2013-09-13         BEL          7      5
Distorted Dream  2013-04-27         BEL          6      2
Mr. O'Leary      2013-10-13         BEL          5      5
Mr. O'Leary      2013-08-29         SAR          7      6
Mr. O'Leary      2013-05-27         BEL          6      5
In the Dark      2013-10-13         BEL          5      7
In the Dark      2013-09-22         BEL          5      7
In the Dark      2013-08-03         SAR          2      7
Bred to Boss     2013-10-26         PRX          3      5
Bred to Boss     2013-10-06         PRX          6      3
Bred to Boss     2012-08-18         SAR          4      1
As is so often the case, groupby comes to the rescue! It's worth reading through the groupby docs, as there are lots of useful tricks one can pull.
>>> df.groupby(level=0, sort=False, as_index=False).head(3)
                  line_date  line_track  line_race  c1pos
horse_name
Grand Cicero     2013-03-10          GP          9      9
Clever Story     2013-09-13         BEL          7      7
Distorted Dream  2013-10-04         BEL          4      2
Distorted Dream  2013-09-13         BEL          7      5
Distorted Dream  2013-04-27         BEL          6      2
Mr. O'Leary      2013-10-13         BEL          5      5
Mr. O'Leary      2013-08-29         SAR          7      6
Mr. O'Leary      2013-05-27         BEL          6      5
In the Dark      2013-10-13         BEL          5      7
In the Dark      2013-09-22         BEL          5      7
In the Dark      2013-08-03         SAR          2      7
Bred to Boss     2013-10-26         PRX          3      5
Bred to Boss     2013-10-06         PRX          6      3
Bred to Boss     2012-08-18         SAR          4      1
Or, if you want the last 3:
>>> df.groupby(level=0, sort=False, as_index=False).tail(3)
(The sort=False is there just to preserve the original horse order; if you don't care about that, you can drop it.)
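For anyone who wants to try this without the full data, here is a minimal self-contained sketch. It rebuilds a small subset of the rows from the question (only two of the columns, as an assumption to keep it short) and applies the same head / tail calls. It leaves out as_index=False on the assumption that it makes no difference for head and tail, which return the original rows rather than an aggregate.

```python
import pandas as pd

# A subset of the question's rows, rebuilt by hand for illustration only.
df = pd.DataFrame(
    {
        "horse_name": ["Grand Cicero"]
        + ["Distorted Dream"] * 5
        + ["Mr. O'Leary"] * 3,
        "line_date": [
            "2013-03-10",
            "2013-10-04", "2013-09-13", "2013-04-27", "2012-10-24", "2012-09-12",
            "2013-10-13", "2013-08-29", "2013-05-27",
        ],
        "c1pos": [9, 2, 5, 2, 2, 3, 5, 6, 5],
    }
).set_index("horse_name")

# Group on the index (level=0) and keep at most three rows per horse.
first3 = df.groupby(level=0, sort=False).head(3)  # first three in row order
last3 = df.groupby(level=0, sort=False).tail(3)   # last three in row order

print(first3)
print(last3)
```

Horses with three or fewer records pass through unchanged; only "Distorted Dream" loses rows here.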
You could also sort on the line_date column (it is safer to convert it to datetime first, but YYYY-MM-DD strings will sort correctly as they are) and select either the first or last three chronologically using the same head / tail method.
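A rough sketch of that sort-then-trim idea, assuming the goal is the three most recent records per horse. The horse names "A" and "B" and all values below are made up for illustration, and the rows are deliberately out of date order so the sort has something to do.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
df = pd.DataFrame(
    {
        "horse_name": ["A", "A", "A", "A", "B", "B"],
        "line_date": [
            "2013-01-05", "2013-03-01", "2012-12-25", "2013-02-10",
            "2013-04-01", "2012-06-30",
        ],
        "c1pos": [1, 2, 3, 4, 5, 6],
    }
).set_index("horse_name")

# Convert to real datetimes so the sort does not depend on string format.
df["line_date"] = pd.to_datetime(df["line_date"])

# Newest first, then keep at most three rows per horse.
latest = (
    df.sort_values("line_date", ascending=False, kind="mergesort")
      .groupby(level=0, sort=False)
      .head(3)
)

print(latest)
```

Note that the result comes back in date order rather than grouped by horse, because the whole frame was sorted before grouping. Sorting with ascending=True and calling tail(3) would select the same three records per horse, in oldest-to-newest order.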