I have a large dataframe (sample printed below) with Date, Time, High and Low columns. There is one row for every 5 minutes.
What I'm trying to do is find the max of the High column for each day and return the Date, Time and High. The sample below shows only a single day. The first problem was finding the 'High' for each day, since many rows share the same 'Date' but have different 'Time' and 'High' values. The solution I came to was to create another dataframe (more below).
Date Time Ticker Open High Low Close
0 6/3/19 7:05 USD/JPY 108.370 108.370 108.345 108.345
1 6/3/19 7:10 USD/JPY 108.345 108.345 108.325 108.325
2 6/3/19 7:15 USD/JPY 108.330 108.360 108.330 108.340
3 6/3/19 7:20 USD/JPY 108.335 108.335 108.295 108.305
4 6/3/19 7:25 USD/JPY 108.305 108.305 108.270 108.305
5 6/3/19 7:30 USD/JPY 108.300 108.300 108.250 108.260
6 6/3/19 7:35 USD/JPY 108.265 108.295 108.265 108.290
7 6/3/19 7:40 USD/JPY 108.275 108.290 108.250 108.290
8 6/3/19 7:45 USD/JPY 108.285 108.290 108.275 108.290
9 6/3/19 7:50 USD/JPY 108.295 108.350 108.295 108.350
10 6/3/19 7:55 USD/JPY 108.355 108.355 108.325 108.330
11 6/3/19 8:00 USD/JPY 108.335 108.360 108.325 108.350
I used the groupby function to write the result to a new dataframe. First I grouped by Date and took the max. This gave me the max High for each date:
Date High
0 6/10/19 108.670
1 6/11/19 108.800
2 6/12/19 108.545
3 6/13/19 108.535
4 6/14/19 108.500
5 6/17/19 108.690
6 6/18/19 108.675
7 6/19/19 108.495
8 6/20/19 107.760
9 6/21/19 107.735
10 6/24/19 107.530
11 6/3/19 108.445
12 6/4/19 108.355
13 6/5/19 108.340
14 6/6/19 108.330
15 6/7/19 108.500
But I also want to see the 'Time' value at which that max occurred on each date. How can I get this?
Example of desired output
Date Time High
6/10/19 9:05 108.670
6/11/19 11:35 108.800
import pandas as pd
df = pd.read_csv("~/Downloads/file.csv", encoding="ISO-8859-1")
df2 = df.groupby('Date', as_index=False)['High'].max()
I then tried to group by 'Time' as well:
df2 = df.groupby('Date','Time' as_index= False)['High'].max()
But I receive this error:
df2 = df.groupby('Date','Time' as_index= False)['High'].max()
^
SyntaxError: invalid syntax
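A note on the error above: the SyntaxError comes from the missing comma between 'Time' and as_index, and several grouping keys are passed to groupby as a single list. The sketch below uses a small made-up frame (not the asker's file) to show the corrected call, and also why it would not give the desired result: with 5-minute data every (Date, Time) pair is already unique, so nothing gets collapsed.

```python
import pandas as pd

# Hypothetical sample modelled on the rows in the question.
df = pd.DataFrame({
    "Date": ["6/3/19", "6/3/19", "6/3/19", "6/4/19", "6/4/19"],
    "Time": ["7:05", "7:10", "7:15", "7:05", "7:10"],
    "High": [108.370, 108.345, 108.360, 108.335, 108.305],
})

# Several keys go in one list; as_index is a separate keyword argument.
per_date_time = df.groupby(["Date", "Time"], as_index=False)["High"].max()

# Each (Date, Time) pair occurs once, so the result has as many rows as df.
print(len(per_date_time), len(df))
```

So the syntactically valid version of that line runs, but it returns one row per 5-minute bar rather than one row per day.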
I would just like a dataframe that shows the Date, Time and High for the row where High was at its max on each day.
Date High TIME????????????????????
0 6/10/19 108.670
1 6/11/19 108.800
2 6/12/19 108.545
3 6/13/19 108.535
4 6/14/19 108.500
5 6/17/19 108.690
6 6/18/19 108.675
7 6/19/19 108.495
8 6/20/19 107.760
9 6/21/19 107.735
10 6/24/19 107.530
11 6/3/19 108.445
12 6/4/19 108.355
13 6/5/19 108.340
14 6/6/19 108.330
15 6/7/19 108.500
I changed the Date column a little to illustrate the groupby function:
Date Time Ticker Open High Low Close
0 6/3/19 7:05 USD/JPY 108.370 108.370 108.345 108.345
1 6/3/19 7:10 USD/JPY 108.345 108.345 108.325 108.325
2 6/3/19 7:15 USD/JPY 108.330 108.360 108.330 108.340
3 6/4/19 7:20 USD/JPY 108.335 108.335 108.295 108.305
4 6/4/19 7:25 USD/JPY 108.305 108.305 108.270 108.305
5 6/4/19 7:30 USD/JPY 108.300 108.300 108.250 108.260
6 6/5/19 7:35 USD/JPY 108.265 108.295 108.265 108.290
7 6/5/19 7:40 USD/JPY 108.275 108.290 108.250 108.290
8 6/5/19 7:45 USD/JPY 108.285 108.290 108.275 108.290
9 6/6/19 7:50 USD/JPY 108.295 108.350 108.295 108.350
10 6/6/19 7:55 USD/JPY 108.355 108.355 108.325 108.330
11 6/6/19 8:00 USD/JPY 108.335 108.360 108.325 108.350
You could try:
df.loc[df.groupby('Date')['High'].idxmax()]
which will give you:
Date Time Ticker Open High Low Close
0 6/3/19 7:05 USD/JPY 108.370 108.370 108.345 108.345
3 6/4/19 7:20 USD/JPY 108.335 108.335 108.295 108.305
6 6/5/19 7:35 USD/JPY 108.265 108.295 108.265 108.290
11 6/6/19 8:00 USD/JPY 108.335 108.360 108.325 108.350
Then drop any columns you don't want.
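As a self-contained sketch of the whole approach, the snippet below builds a small frame, picks the row with the highest High per Date via idxmax, and keeps only Date, Time and High. The sample values (in particular the 6/10/19 rows) and the name daily_high are made up for illustration. The final sort is optional and assumes the dates are month/day/two-digit-year strings; it is there because Date is text, so groups otherwise come out in string order ("6/10/19" before "6/3/19"), as in the question's output.

```python
import pandas as pd

# Hypothetical sample; the values are illustrative, not taken from the real file.
df = pd.DataFrame({
    "Date": ["6/3/19", "6/3/19", "6/3/19", "6/10/19", "6/10/19", "6/10/19"],
    "Time": ["7:05", "7:10", "7:15", "9:00", "9:05", "9:10"],
    "High": [108.370, 108.345, 108.360, 108.600, 108.670, 108.650],
})

# For each Date, idxmax gives the index label of the first row with the largest High.
idx = df.groupby("Date")["High"].idxmax()

# Select those rows and keep only the columns of interest.
daily_high = df.loc[idx, ["Date", "Time", "High"]]

# Optional: sort chronologically by parsing the text dates (assumed format m/d/yy).
daily_high = daily_high.sort_values(
    "Date", key=lambda s: pd.to_datetime(s, format="%m/%d/%y")
).reset_index(drop=True)

print(daily_high)
```

If two rows on the same day tie for the highest High, idxmax keeps the first one, so the earliest time is reported.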