I have a DataFrame read from a CSV file.
I would like to know at which time between 23:00 and 23:50 the column "First" is most likely to be 0.
Date First Second
0 2019-01-09 22:59:00 0 20
1 2019-01-09 23:04:00 14 32
2 2019-01-09 23:10:00 9 27
3 2019-01-09 23:11:00 7 27
4 2019-01-09 23:12:00 7 26
5 2019-01-09 23:13:00 7 26
6 2019-01-09 23:14:00 7 25
7 2019-01-09 23:15:00 6 25
8 2019-01-09 23:16:00 5 23
9 2019-01-09 23:17:00 4 22
10 2019-01-09 23:18:00 3 22
... ... ... ...
134761 2019-05-05 21:20:00 18 36
134762 2019-05-05 21:21:00 16 35
134763 2019-05-05 21:22:00 15 34
134764 2019-05-05 21:23:00 14 33
I use this code to select the time range I want:
heure = df.set_index('Date').between_time('23:00:00','23:50:00')
But I wasn't able to extract just the time.
Any suggestions are welcome :)
Thanks,
Robin
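On the "extract just the time" part: after between_time, the timestamps live in the DatetimeIndex, and its .time attribute returns the time of day as datetime.time objects. A minimal sketch, assuming a small hypothetical frame that uses the question's column names (Date, First, Second):

```python
import pandas as pd

# Hypothetical rows shaped like the sample in the question
df = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-09 22:59:00', '2019-01-09 23:04:00',
                            '2019-01-09 23:10:00', '2019-01-10 23:04:00']),
    'First': [0, 14, 0, 0],
    'Second': [20, 32, 27, 30],
})

# Same selection as in the question: Date becomes a DatetimeIndex
heure = df.set_index('Date').between_time('23:00:00', '23:50:00')

# .time drops the date part and keeps only datetime.time objects
times = heure.index.time
print(times)
```

Because the date is dropped, rows from different days at the same minute (here 23:04 on two days) share the same time value, which is what makes grouping by time of day possible.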
How about using the dt accessors? Here is an end-to-end example for your use case.
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
'date': [
pd.to_datetime('2019-01-09 22:59:00'),
pd.to_datetime('2019-01-09 23:00:00'),
pd.to_datetime('2019-01-09 23:49:59'),
pd.to_datetime('2019-01-09 23:50:00'),
pd.to_datetime('2019-01-09 23:51:00'),
],
'value': [0, 0, 5, 6, 1]
}
)
# A label that splits the dataset into two groups based on the time (23:50:00 itself counts as out of range).
df['in_range'] = np.where((df['date'].dt.hour == 23) & (df['date'].dt.minute < 50), 'In Range', 'Out of Range')
# A column that tests the condition you mentioned
df['condition'] = df['value'] == 0
# Group and get the average, which is the likelihood that value == 0, per group.
print(df.groupby('in_range')['condition'].mean())
Gives:
in_range
In Range 0.500000
Out of Range 0.333333
Name: condition, dtype: float64
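The same dt-accessor idea can be taken one step further to answer the original question, namely which minute inside 23:00-23:50 has the highest share of rows with First == 0. A minimal sketch, assuming the question's column names (Date, First) and a small hypothetical two-day frame; "most likely" is read here as the fraction of rows at each time of day where First is 0:

```python
from datetime import time

import pandas as pd

# Hypothetical data: two evenings with the same minutes
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2019-01-09 23:04:00', '2019-01-09 23:10:00', '2019-01-09 23:55:00',
        '2019-01-10 23:04:00', '2019-01-10 23:10:00', '2019-01-10 23:55:00',
    ]),
    'First': [0, 0, 0, 3, 0, 0],
})

# Keep rows whose time of day is between 23:00 and 23:50 (both inclusive)
t = df['Date'].dt.time
sub = df[t.between(time(23, 0), time(23, 50))]

# Per time of day: fraction of rows where First == 0
prob_zero = sub['First'].eq(0).groupby(sub['Date'].dt.time).mean()
print(prob_zero)
print('Most likely time:', prob_zero.idxmax())
```

Taking the mean per minute, rather than counting zeros, normalises by how many rows exist at each minute, which matters when some minutes are missing on some days (the sample jumps from 23:04 to 23:10).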
Filter based on time. Then find the most common time where First is 0.
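try:
    most_common = (df.set_index('Date').between_time('23:00:00', '23:50:00').reset_index()
                   .loc[lambda x: x.First == 0].Date.dt.time.value_counts().index[0])
    print(most_common)
except IndexError:
    # value_counts() is empty when no row in the window has First == 0
    print('No matches')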
The result is a datetime.time or, in the case of your sample data, the message "No matches", because there are no 0s between the specified times.
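The chain above can be wrapped in a small reusable function so that both outcomes are easy to see. A minimal sketch; the name most_common_zero_time and the None return for "no match" are hypothetical choices, not part of the answer, and the frames are made-up data using the question's column names:

```python
import pandas as pd

def most_common_zero_time(df, start='23:00:00', end='23:50:00'):
    """Hypothetical helper: most frequent time of day in [start, end] where First == 0, or None."""
    zeros = (df.set_index('Date').between_time(start, end).reset_index()
             .loc[lambda x: x['First'] == 0])
    counts = zeros['Date'].dt.time.value_counts()
    # value_counts() sorts by frequency, so the first label is the most common time
    return counts.index[0] if not counts.empty else None

# Hypothetical frame with zeros inside the window
with_zeros = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-09 22:59:00', '2019-01-09 23:12:00',
                            '2019-01-09 23:30:00', '2019-01-10 23:12:00',
                            '2019-01-10 23:30:00']),
    'First': [0, 0, 5, 0, 0],
})
# Rows copied from the question's sample: no zeros between 23:00 and 23:50
sample_like = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-09 22:59:00', '2019-01-09 23:04:00',
                            '2019-01-09 23:10:00']),
    'First': [0, 14, 9],
})

result = most_common_zero_time(with_zeros)
no_match = most_common_zero_time(sample_like)
print(result, no_match)
```

Note that this counts zeros per minute rather than dividing by the number of rows at each minute, so it ranks minutes by frequency; the two rankings agree only when every minute appears equally often.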
You should convert your "Date" column to datetime first; then you can filter with the dt accessor, as mentioned by @smj:
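import pandas as pd
df = pd.read_csv('./sample.csv')
# Convert the text column to datetime so that the .dt accessor works
df['Date'] = pd.to_datetime(df['Date'])
# Keep rows from 23:00 up to, but not including, 23:50
print(df[(df['Date'].dt.hour == 23) & (df['Date'].dt.minute < 50)])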
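To try this without a file on disk, the CSV can be read from memory. A minimal sketch, assuming a stand-in for ./sample.csv built from a few rows of the question's printout; it uses read_csv's parse_dates parameter, which does the datetime conversion while reading, as an alternative to calling pd.to_datetime afterwards:

```python
import io

import pandas as pd

# Stand-in for ./sample.csv: a few rows copied from the question's printout
csv_text = """Date,First,Second
2019-01-09 22:59:00,0,20
2019-01-09 23:04:00,14,32
2019-01-09 23:10:00,9,27
2019-01-09 23:18:00,3,22
2019-05-05 21:20:00,18,36
"""

# parse_dates converts the Date column to datetime64 while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# Same mask as above: hour 23 and minute below 50
selected = df[(df['Date'].dt.hour == 23) & (df['Date'].dt.minute < 50)]
print(selected)

# Only the time of day, without the date part
print(selected['Date'].dt.time.tolist())
```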