
Find time occurrences in Python

I have a DataFrame loaded from a CSV file.

I would like to know at which time, between 23:00 and 23:50, the "First" column is most likely to be 0.

                      Date First Second
0      2019-01-09 22:59:00     0     20
1      2019-01-09 23:04:00    14     32
2      2019-01-09 23:10:00     9     27
3      2019-01-09 23:11:00     7     27
4      2019-01-09 23:12:00     7     26
5      2019-01-09 23:13:00     7     26
6      2019-01-09 23:14:00     7     25
7      2019-01-09 23:15:00     6     25
8      2019-01-09 23:16:00     5     23
9      2019-01-09 23:17:00     4     22
10     2019-01-09 23:18:00     3     22
...                    ...   ...    ...
134761 2019-05-05 21:20:00    18     36
134762 2019-05-05 21:21:00    16     35
134763 2019-05-05 21:22:00    15     34
134764 2019-05-05 21:23:00    14     33

I use this code to select the expected time range:

heure = df.set_index('Date').between_time('23:00:00','23:50:00')

But I wasn't able to extract just the time.

If you have any suggestions :)

Thanks,

Robin

How about using the dt accessors? Updated with an end-to-end example for your use case.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        'date': [
            pd.to_datetime('2019-01-09 22:59:00'),
            pd.to_datetime('2019-01-09 23:00:00'),
            pd.to_datetime('2019-01-09 23:49:59'),
            pd.to_datetime('2019-01-09 23:50:00'),
            pd.to_datetime('2019-01-09 23:51:00'),
        ],
        'value': [0, 0, 5, 6, 1]
    }        
)

# A mask to split the dataset into two groups, based on the time.

df['in_range'] = np.where((df['date'].dt.hour == 23) & (df['date'].dt.minute < 50), 'In Range', 'Out of Range')

# A column that tests the condition you mentioned

df['condition'] = df['value'] == 0

# Group and get the average, which is the likelihood that value == 0, per group.

print(df.groupby('in_range')['condition'].mean())

Gives:

in_range
In Range        0.500000
Out of Range    0.333333
Name: condition, dtype: float64
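To find which time inside the window is most likely to have a 0, the same idea can be applied per time of day rather than per group. The following is a minimal sketch, not a definitive implementation: the sample rows are made up for illustration, and the column names 'Date' and 'First' are taken from the question.

```python
import pandas as pd

# Hypothetical sample data covering three nights; column names follow the question.
df = pd.DataFrame(
    {
        'Date': pd.to_datetime([
            '2019-01-09 23:10:00',
            '2019-01-09 23:20:00',
            '2019-01-10 23:10:00',
            '2019-01-10 23:20:00',
            '2019-01-11 23:10:00',
            '2019-01-11 23:20:00',
        ]),
        'First': [0, 3, 0, 0, 5, 2],
    }
)

# Keep rows from 23:00 up to (but not including) 23:50.
in_window = df[(df['Date'].dt.hour == 23) & (df['Date'].dt.minute < 50)]

# The mean of a boolean series is the fraction of True values,
# so this gives the share of zeros for each time of day.
zero_rate = (in_window['First'] == 0).groupby(in_window['Date'].dt.time).mean()

# The time of day with the highest share of zeros.
best_time = zero_rate.idxmax()
print(zero_rate)
print(best_time)
```

With real data, each time of day needs enough observations for the fraction to be meaningful, so it may be worth checking the group sizes as well.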

Filter based on time. Then find the most common time where First is 0.

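try:
    # Most frequent time of day at which First is 0 within the window.
    most_common = (df.set_index('Date').between_time('23:00:00','23:50:00').reset_index()
       .loc[lambda x: x.First == 0].Date.dt.time.value_counts().index[0])
    print(most_common)
except IndexError:
    print('No matches')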

This yields a datetime.time or, in the case of your sample data, prints that nothing matches, as there are no 0s between the specified times.

You should convert your "Date" column to the datetime type first; then you can apply the indexing method using dt, as mentioned by @smj:

import pandas as pd

df = pd.read_csv('./sample.csv')
df['Date'] = pd.to_datetime(df['Date'])
print(df[(df['Date'].dt.hour == 23) & (df['Date'].dt.minute < 50)])
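Once the column is a datetime, the between_time approach from the question also works, and the time part can be read off the resulting index. This is a small sketch under that assumption; the three rows below are hypothetical stand-ins for the CSV contents.

```python
import pandas as pd

# Hypothetical rows standing in for the CSV contents.
df = pd.DataFrame(
    {
        'Date': ['2019-01-09 22:59:00', '2019-01-09 23:04:00', '2019-01-09 23:50:00'],
        'First': [0, 14, 9],
    }
)
df['Date'] = pd.to_datetime(df['Date'])

# between_time needs a DatetimeIndex and, by default, includes both endpoints.
heure = df.set_index('Date').between_time('23:00:00', '23:50:00')

# The index still holds full timestamps; .time drops the date part.
times_only = heure.index.time
print(list(times_only))
```

Note that between_time keeps the 23:50:00 row by default, whereas the `dt.minute < 50` filter excludes it.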
