
Select rows in DataFrame containing dates with maximum number of rows

I have a DataFrame, HistDf5, with a datetime index and 5 columns. I would like to count the number of rows per date (i.e. the number of "times" within each date) and select those dates with the maximum number of "times".

                     OPEN  CLOSE  HIGH    LOW        VOL
DTYYYYMMDD                                              
2011-01-02 18:00:00  0.00   1.25  1.50  -0.75  24.907415
2011-01-02 18:05:00  1.25   0.50  1.75   0.25  25.743008
2011-01-02 18:10:00  0.25   0.00  0.50   0.00  22.310852
2011-01-02 18:15:00  0.25   0.75  0.75   0.25  21.303043
2011-01-02 18:20:00  0.75   0.25  0.75   0.00  15.431916
                  ...    ...   ...    ...        ...
2014-06-24 23:35:00 -9.75  -9.50 -9.50 -10.00  16.471735
2014-06-24 23:40:00 -9.50  -9.50 -9.50 -10.00  18.634443
2014-06-24 23:45:00 -9.75  -9.50 -9.50 -10.00  13.974959
2014-06-24 23:50:00 -9.50  -9.75 -9.50  -9.75  12.305773
2014-06-24 23:55:00 -9.50  -9.75 -9.50  -9.75  15.471089

[365544 rows x 5 columns]

Calculating the number of entries per date is straightforward. However, once I have the dates that I want, I don't know how to slice the DataFrame to select only the rows on those dates that have the maximum number of "times".

CountDF = HistDf5.groupby(HistDf5.index.date)['VOL'].count()
IndxLst = CountDF[CountDF == CountDF.max()].index
HistDf5 = HistDf5.loc[IndxLst]


            OPEN  CLOSE  HIGH   LOW        VOL
2011-01-03   0.0   0.25  0.25  0.00   5.598422
2011-01-04   0.0   0.00  0.25  0.00   5.375278
2011-01-05   0.0   0.00  0.25  0.00   9.965758
2011-01-06   0.0  -0.25  0.00 -0.50  12.894489
2011-01-07   0.0   0.00  0.00 -0.25   3.871201
         ...    ...   ...   ...        ...
2014-06-20   0.0   0.00  0.00 -0.25  11.530156
2014-06-21   NaN    NaN   NaN   NaN   0.000000
2014-06-22   NaN    NaN   NaN   NaN   0.000000
2014-06-23   0.0   0.25  0.25  0.00   4.499810
2014-06-24   0.0   0.25  0.25  0.00  14.659017

[1269 rows x 5 columns]

If I understand correctly, you can use transform to attach the per-date row count to every row, and then select the rows whose count equals the maximum.

df['COUNT'] = df.groupby(df.index.date)['VOL'].transform('count')
df.loc[df['COUNT'] == df['COUNT'].max()]


| DTYYYYMMDD          | OPEN  | CLOSE | HIGH | LOW   | VOL       | COUNT |
|---------------------|-------|-------|------|-------|-----------|-------|
| 2011-01-02 18:00:00 | 0     | 1.25  | 1.5  | -0.75 | 24.907415 | 5     |
| 2011-01-02 18:05:00 | 1.25  | 0.5   | 1.75 | 0.25  | 25.743008 | 5     |
| 2011-01-02 18:10:00 | 0.25  | 0     | 0.5  | 0     | 22.310852 | 5     |
| 2011-01-02 18:15:00 | 0.25  | 0.75  | 0.75 | 0.25  | 21.303043 | 5     |
| 2011-01-02 18:20:00 | 0.75  | 0.25  | 0.75 | 0     | 15.431916 | 5     |
| 2014-06-24 23:35:00 | -9.75 | -9.5  | -9.5 | -10   | 16.471735 | 5     |
| 2014-06-24 23:40:00 | -9.5  | -9.5  | -9.5 | -10   | 18.634443 | 5     |
| 2014-06-24 23:45:00 | -9.75 | -9.5  | -9.5 | -10   | 13.974959 | 5     |
| 2014-06-24 23:50:00 | -9.5  | -9.75 | -9.5 | -9.75 | 12.305773 | 5     |
| 2014-06-24 23:55:00 | -9.5  | -9.75 | -9.5 | -9.75 | 15.471089 | 5     |
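
For anyone who wants to try this without the original data, here is a minimal, self-contained sketch of the same idea. The frame below is synthetic (three made-up dates with 3, 5 and 5 rows, and only a VOL column), and the names `counts`, `days`, `per_day` and `full_days` are just illustrative. The first variant keeps the transform approach but filters with a boolean mask instead of adding a COUNT column. The second stays closer to the question's code: it finds the fullest dates first and then matches each row's date against them with `isin`, rather than passing the dates to `.loc`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for HistDf5: three dates with 3, 5 and 5 rows of 5-minute bars.
idx = pd.DatetimeIndex(
    list(pd.date_range("2011-01-02 18:00", periods=3, freq="5min"))
    + list(pd.date_range("2011-01-03 09:00", periods=5, freq="5min"))
    + list(pd.date_range("2011-01-04 09:00", periods=5, freq="5min"))
)
df = pd.DataFrame({"VOL": np.arange(len(idx), dtype=float)}, index=idx)

# Variant 1: per-date row count broadcast back onto every row, used as a mask.
counts = df.groupby(df.index.date)["VOL"].transform("count")
result = df[counts == counts.max()]

# Variant 2: pick the fullest dates first, then keep rows whose date is among them.
days = df.index.normalize()                      # midnight timestamp of each row's date
per_day = df.groupby(days)["VOL"].count()        # rows per date
full_days = per_day[per_day == per_day.max()].index
result_isin = df[days.isin(full_days)]

print(result)
```

Both variants keep all rows of every date that ties for the maximum count, so with the synthetic data above the two 5-row dates survive and the 3-row date is dropped. Note that `count` ignores NaN values in VOL; if that column can contain missing values, `transform('size')` counts rows instead.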
