Adding a column after creating a new dataframe that is grouped

I have a large dataframe (a sample is printed below) with Date, Time, High and Low columns, among others. There is one row for every 5 minutes.

What I'm trying to do is find the maximum of the High column for each day and return the Date, Time and High of that row. The sample below shows just a single day. The first problem I had to solve was finding the daily 'High', since there are many rows with the same 'Date' but different 'Time' and 'High' values. The solution I came up with was to create another dataframe (more below).

        Date   Time   Ticker     Open     High      Low    Close
0     6/3/19   7:05  USD/JPY  108.370  108.370  108.345  108.345
1     6/3/19   7:10  USD/JPY  108.345  108.345  108.325  108.325
2     6/3/19   7:15  USD/JPY  108.330  108.360  108.330  108.340
3     6/3/19   7:20  USD/JPY  108.335  108.335  108.295  108.305
4     6/3/19   7:25  USD/JPY  108.305  108.305  108.270  108.305
5     6/3/19   7:30  USD/JPY  108.300  108.300  108.250  108.260
6     6/3/19   7:35  USD/JPY  108.265  108.295  108.265  108.290
7     6/3/19   7:40  USD/JPY  108.275  108.290  108.250  108.290
8     6/3/19   7:45  USD/JPY  108.285  108.290  108.275  108.290
9     6/3/19   7:50  USD/JPY  108.295  108.350  108.295  108.350
10    6/3/19   7:55  USD/JPY  108.355  108.355  108.325  108.330
11    6/3/19   8:00  USD/JPY  108.335  108.360  108.325  108.350

I used groupby to build a new dataframe. First I grouped by Date and took the max of High. This gave me each day's max along with the date:

       Date     High
0   6/10/19  108.670
1   6/11/19  108.800
2   6/12/19  108.545
3   6/13/19  108.535
4   6/14/19  108.500
5   6/17/19  108.690
6   6/18/19  108.675
7   6/19/19  108.495
8   6/20/19  107.760
9   6/21/19  107.735
10  6/24/19  107.530
11   6/3/19  108.445
12   6/4/19  108.355
13   6/5/19  108.340
14   6/6/19  108.330
15   6/7/19  108.500
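Side note on the output above: 6/10/19 sorts ahead of 6/3/19 because read_csv leaves Date as strings, and groupby orders string keys lexicographically. A minimal sketch, assuming the Date strings are always month/day/two-digit-year, parses them with pd.to_datetime before grouping so the days come out in calendar order. The small frame below is hypothetical sample data, not the asker's file.

```python
import pandas as pd

# Hypothetical sample: Date stored as strings, as read_csv would leave them
df = pd.DataFrame({
    "Date": ["6/10/19", "6/10/19", "6/3/19", "6/3/19"],
    "Time": ["9:00", "9:05", "7:05", "7:10"],
    "High": [108.600, 108.670, 108.370, 108.345],
})

# Grouping on the raw strings sorts "6/10/19" before "6/3/19"
by_string = df.groupby("Date", as_index=False)["High"].max()
print(by_string["Date"].tolist())

# Parse first (assumed format: month/day/2-digit year), then group
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")
by_date = df.groupby("Date", as_index=False)["High"].max()
print(by_date)
```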

But I also want to see the 'Time' at which that max occurred on each date. How can I pass this in?

Example of desired output:

Date       Time     High
6/10/19    9:05     108.670
6/11/19    11:35    108.800

import pandas as pd

df = pd.read_csv("~/Downloads/file.csv", encoding="ISO-8859-1")

# High grouped by Date
df2 = df.groupby('Date', as_index=False)['High'].max()

I've also tried:

df2 = df.groupby('Date','Time' as_index= False)['High'].max()

But I receive this error:

df2 = df.groupby('Date','Time' as_index= False)['High'].max()
                                      ^

SyntaxError: invalid syntax
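The SyntaxError comes from the missing comma between 'Time' and as_index. Adding the comma alone is not enough, though: a second positional argument to groupby is not read as another grouping key, so several keys have to be passed as a list. A sketch on hypothetical sample data shows the corrected call, and also why it still does not answer the question: in 5-minute data every (Date, Time) pair is unique, so the max of High per group is just each row's own High.

```python
import pandas as pd

# Hypothetical 5-minute sample for a single day
df = pd.DataFrame({
    "Date": ["6/3/19", "6/3/19", "6/3/19"],
    "Time": ["7:05", "7:10", "7:15"],
    "High": [108.370, 108.345, 108.360],
})

# Multiple grouping keys go in a list, and the comma before as_index is required
df2 = df.groupby(["Date", "Time"], as_index=False)["High"].max()

# Each (Date, Time) pair occurs once, so nothing is collapsed:
# df2 has one row per original row rather than one row per day
print(df2)
print(len(df2) == len(df))
```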

I would just like to have a dataframe that shows the Date, Time and High of the row where High reached its maximum on each day.

      Date     High   TIME????????????????????
0   6/10/19  108.670
1   6/11/19  108.800
2   6/12/19  108.545
3   6/13/19  108.535
4   6/14/19  108.500
5   6/17/19  108.690
6   6/18/19  108.675
7   6/19/19  108.495
8   6/20/19  107.760
9   6/21/19  107.735
10  6/24/19  107.530
11   6/3/19  108.445
12   6/4/19  108.355
13   6/5/19  108.340
14   6/6/19  108.330
15   6/7/19  108.500

To illustrate the groupby approach, I changed the Date column slightly so the sample spans several days:

      Date  Time   Ticker     Open     High      Low    Close
0   6/3/19  7:05  USD/JPY  108.370  108.370  108.345  108.345
1   6/3/19  7:10  USD/JPY  108.345  108.345  108.325  108.325
2   6/3/19  7:15  USD/JPY  108.330  108.360  108.330  108.340
3   6/4/19  7:20  USD/JPY  108.335  108.335  108.295  108.305
4   6/4/19  7:25  USD/JPY  108.305  108.305  108.270  108.305
5   6/4/19  7:30  USD/JPY  108.300  108.300  108.250  108.260
6   6/5/19  7:35  USD/JPY  108.265  108.295  108.265  108.290
7   6/5/19  7:40  USD/JPY  108.275  108.290  108.250  108.290
8   6/5/19  7:45  USD/JPY  108.285  108.290  108.275  108.290
9   6/6/19  7:50  USD/JPY  108.295  108.350  108.295  108.350
10  6/6/19  7:55  USD/JPY  108.355  108.355  108.325  108.330
11  6/6/19  8:00  USD/JPY  108.335  108.360  108.325  108.350

You could try:

df.loc[df.groupby('Date')['High'].idxmax()]

which gives you:

      Date  Time   Ticker     Open     High      Low    Close
0   6/3/19  7:05  USD/JPY  108.370  108.370  108.345  108.345
3   6/4/19  7:20  USD/JPY  108.335  108.335  108.295  108.305
6   6/5/19  7:35  USD/JPY  108.265  108.295  108.265  108.290
11  6/6/19  8:00  USD/JPY  108.335  108.360  108.325  108.350

Then drop any columns you don't want.
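Putting the answer together: a minimal sketch, rebuilding the modified sample data inline, that selects only the Date, Time and High columns. idxmax returns the index label of the first row holding each day's maximum, so this assumes the index labels are unique (the default RangeIndex from read_csv is). If a day has two bars with the same High, only the earlier one is kept; the transform('max') variant at the end keeps every tied row instead. The names daily_high and daily_high_all are illustrative only.

```python
import pandas as pd

# Modified sample data from the answer, rebuilt so the sketch runs on its own
df = pd.DataFrame({
    "Date": ["6/3/19"] * 3 + ["6/4/19"] * 3 + ["6/5/19"] * 3 + ["6/6/19"] * 3,
    "Time": ["7:05", "7:10", "7:15", "7:20", "7:25", "7:30",
             "7:35", "7:40", "7:45", "7:50", "7:55", "8:00"],
    "High": [108.370, 108.345, 108.360, 108.335, 108.305, 108.300,
             108.295, 108.290, 108.290, 108.350, 108.355, 108.360],
})

# Index label of the first row with the highest High for each Date
idx = df.groupby("Date")["High"].idxmax()

# Pick those rows and keep only the columns of interest
daily_high = df.loc[idx, ["Date", "Time", "High"]].reset_index(drop=True)
print(daily_high)

# Variant that keeps every row tied for the day's maximum
mask = df["High"] == df.groupby("Date")["High"].transform("max")
daily_high_all = df.loc[mask, ["Date", "Time", "High"]]
print(daily_high_all)
```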
