Adding a column after creating a new dataframe that is grouped
I have a large dataframe (printed below) with Date, Time, High and Low columns. There is one row for every 5 minutes.
What I'm trying to do is find the max of the High column for each day and return the Date, Time and High. The sample below shows just a single day.
The first problem I had to solve was finding the 'High' for each day, since many rows share the same 'Date' but have different 'Time' and 'High' values. The solution I came to was to create another dataframe (more below).
Date Time Ticker Open High Low Close
0 6/3/19 7:05 USD/JPY 108.370 108.370 108.345 108.345
1 6/3/19 7:10 USD/JPY 108.345 108.345 108.325 108.325
2 6/3/19 7:15 USD/JPY 108.330 108.360 108.330 108.340
3 6/3/19 7:20 USD/JPY 108.335 108.335 108.295 108.305
4 6/3/19 7:25 USD/JPY 108.305 108.305 108.270 108.305
5 6/3/19 7:30 USD/JPY 108.300 108.300 108.250 108.260
6 6/3/19 7:35 USD/JPY 108.265 108.295 108.265 108.290
7 6/3/19 7:40 USD/JPY 108.275 108.290 108.250 108.290
8 6/3/19 7:45 USD/JPY 108.285 108.290 108.275 108.290
9 6/3/19 7:50 USD/JPY 108.295 108.350 108.295 108.350
10 6/3/19 7:55 USD/JPY 108.355 108.355 108.325 108.330
11 6/3/19 8:00 USD/JPY 108.335 108.360 108.325 108.350
I tried the groupby function to write to a new dataframe. First I grouped by Date and took the max. This gave me the max and showed me the date:
Date High
0 6/10/19 108.670
1 6/11/19 108.800
2 6/12/19 108.545
3 6/13/19 108.535
4 6/14/19 108.500
5 6/17/19 108.690
6 6/18/19 108.675
7 6/19/19 108.495
8 6/20/19 107.760
9 6/21/19 107.735
10 6/24/19 107.530
11 6/3/19 108.445
12 6/4/19 108.355
13 6/5/19 108.340
14 6/6/19 108.330
15 6/7/19 108.500
But I also want to see the 'Time' at which that max occurred on each date. How can I pass this in?
Example of desired output:
Date Time High
6/10/19 9:05 108.670
6/11/19 11:35 108.800
import pandas as pd
df = pd.read_csv("~/Downloads/file.csv", encoding="ISO-8859-1")
df2 = df.groupby('Date', as_index=False)['High'].max()
df2 = df.groupby('Date','Time' as_index= False)['High'].max()
But I receive this error:
df2 = df.groupby('Date','Time' as_index= False)['High'].max()
^
SyntaxError: invalid syntax
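For what it's worth, the caret lands on as_index because there is no comma after 'Time'. Multiple grouping keys are also normally passed to groupby as a list. The sketch below uses a tiny made-up frame (the values are illustrative, not the real file) to show the corrected call. It also shows why grouping by both columns would not help here: every (Date, Time) pair is unique, so nothing gets aggregated.

```python
import pandas as pd

# Tiny illustrative frame; the real data comes from the CSV file.
df = pd.DataFrame({
    "Date": ["6/3/19", "6/3/19", "6/4/19", "6/4/19"],
    "Time": ["7:05", "7:10", "7:20", "7:25"],
    "High": [108.370, 108.345, 108.335, 108.305],
})

# Multiple keys go in a list, with a comma before as_index.
by_date_time = df.groupby(["Date", "Time"], as_index=False)["High"].max()

# Each (Date, Time) pair occurs once, so the result has one row per input row.
print(len(by_date_time), len(df))
```

So the syntax can be fixed, but the result would still be every 5-minute row rather than one row per day.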
I would just like a dataframe that shows the Date, Time and High at which the High column reached its max for each day.
Date High TIME????????????????????
0 6/10/19 108.670
1 6/11/19 108.800
2 6/12/19 108.545
3 6/13/19 108.535
4 6/14/19 108.500
5 6/17/19 108.690
6 6/18/19 108.675
7 6/19/19 108.495
8 6/20/19 107.760
9 6/21/19 107.735
10 6/24/19 107.530
11 6/3/19 108.445
12 6/4/19 108.355
13 6/5/19 108.340
14 6/6/19 108.330
15 6/7/19 108.500
I changed the Date column a little to illustrate the groupby function:
Date Time Ticker Open High Low Close
0 6/3/19 7:05 USD/JPY 108.370 108.370 108.345 108.345
1 6/3/19 7:10 USD/JPY 108.345 108.345 108.325 108.325
2 6/3/19 7:15 USD/JPY 108.330 108.360 108.330 108.340
3 6/4/19 7:20 USD/JPY 108.335 108.335 108.295 108.305
4 6/4/19 7:25 USD/JPY 108.305 108.305 108.270 108.305
5 6/4/19 7:30 USD/JPY 108.300 108.300 108.250 108.260
6 6/5/19 7:35 USD/JPY 108.265 108.295 108.265 108.290
7 6/5/19 7:40 USD/JPY 108.275 108.290 108.250 108.290
8 6/5/19 7:45 USD/JPY 108.285 108.290 108.275 108.290
9 6/6/19 7:50 USD/JPY 108.295 108.350 108.295 108.350
10 6/6/19 7:55 USD/JPY 108.355 108.355 108.325 108.330
11 6/6/19 8:00 USD/JPY 108.335 108.360 108.325 108.350
You could try:
df.loc[df.groupby('Date')['High'].idxmax()]
which will give you:
Date Time Ticker Open High Low Close
0 6/3/19 7:05 USD/JPY 108.370 108.370 108.345 108.345
3 6/4/19 7:20 USD/JPY 108.335 108.335 108.295 108.305
6 6/5/19 7:35 USD/JPY 108.265 108.295 108.265 108.290
11 6/6/19 8:00 USD/JPY 108.335 108.360 108.325 108.350
Then drop any columns you don't want.
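Putting the two steps together, here is a minimal sketch that rebuilds the modified sample by hand (only the Date, Time and High columns, typed in rather than read from the CSV) and keeps just the three wanted columns. The name daily_high is only illustrative.

```python
import pandas as pd

# Sample rows copied from the modified table above (only the columns needed here).
df = pd.DataFrame({
    "Date": ["6/3/19"] * 3 + ["6/4/19"] * 3 + ["6/5/19"] * 3 + ["6/6/19"] * 3,
    "Time": ["7:05", "7:10", "7:15", "7:20", "7:25", "7:30",
             "7:35", "7:40", "7:45", "7:50", "7:55", "8:00"],
    "High": [108.370, 108.345, 108.360, 108.335, 108.305, 108.300,
             108.295, 108.290, 108.290, 108.350, 108.355, 108.360],
})

# For each Date, idxmax gives the index label of the row with the highest High.
idx = df.groupby("Date")["High"].idxmax()

# Select those rows, keep only the wanted columns, and renumber the index.
daily_high = df.loc[idx, ["Date", "Time", "High"]].reset_index(drop=True)
print(daily_high)
```

If several rows on the same day tie for the max, idxmax returns the first of them. Note also that Date is a string here, so the groups sort as text (which is why 6/3/19 appears after 6/24/19 in the question's output). Converting the column with pd.to_datetime before grouping should give chronological order, assuming the dates parse cleanly.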