Group by the same partial string of a pandas DataFrame column
I have several CSV files, each containing the prices of one stock for one month, with millions of rows. The raw CSV data looks like:
AA_Candy.csv
Index   CompanyName  Time          Price
1       AA Candy     030101090355  1.78
2       AA Candy     030101091533  1.79
.......
333498  AA Candy     031231145556  2.18
BB_Cookie.csv
1       BB Cookie    030101090225  3.20
2       BB Cookie    030101090845  3.14
.......
391373  BB Cookie    031231145958  3.88
I use Python and pandas to process the data. After loading and combining some of the data files, I now have a DataFrame like:
frame:
Index   CompanyName  Time          Price
1       AA Candy     030101090355  1.78
2       AA Candy     030101091533  1.79
.......
333498  AA Candy     031231145556  2.18
333499  BB Cookie    030101090225  3.20
333500  BB Cookie    030101090845  3.14
.......
712871  BB Cookie    031231145958  3.88
The time 031231145958 represents 2013-12-31 14:59:58.
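Since everything that follows depends on the leading characters of Time, the column has to stay a string when the files are loaded; by default pandas would read it as an integer and drop the leading zero. A minimal sketch of loading and combining the files, assuming comma-separated files with a header row (the in-memory samples below are hypothetical stand-ins for AA_Candy.csv and BB_Cookie.csv, and the Hour column name is made up for illustration):

```python
import io
import pandas as pd

# Hypothetical stand-ins for AA_Candy.csv and BB_Cookie.csv; the real
# files' delimiter and header layout are assumptions here.
aa_csv = io.StringIO(
    "CompanyName,Time,Price\n"
    "AA Candy,030101090355,1.78\n"
    "AA Candy,030101091533,1.79\n"
)
bb_csv = io.StringIO(
    "CompanyName,Time,Price\n"
    "BB Cookie,030101090225,3.20\n"
    "BB Cookie,030101090845,3.14\n"
)

# dtype=str for Time keeps the leading zero; without it the column
# would be parsed as an integer.
frames = [pd.read_csv(f, dtype={"Time": str}) for f in (aa_csv, bb_csv)]
frame = pd.concat(frames, ignore_index=True)

# The first 8 characters of Time identify one clock hour.
frame["Hour"] = frame["Time"].str[:8]
print(frame)
```

With real files, the StringIO objects would be replaced by the file paths.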
Now I want to get the highest price and the final price of each company for every hour, and produce an output file like:
range_start  AA Candy/Max  AA Candy/Close  BB Cookie/Max  BB Cookie/Close
0301010900   1.79          1.77            3.20           3.10
........
0312311400   2.24          2.18            3.88           3.88
Therefore I want to group by CompanyName and the first 8 characters of Time to get each company's data for one hour, then compute the maximum price and the final price for each company, and output the results that share the same start hour in one row, with companyName/Max and companyName/Close as the new column names.
Because I am really new to pandas and DataFrames, I have the following questions:
Thanks!!
Perform a groupby on the company name and the first 8 characters of your string timestamp (i.e. date plus hour). Then use agg on the price to apply a separate function for each statistic (first, max, min and last). Unstack the company names, swap the column levels so that the company names sit above open/high/low/close, and optionally sort your symbols.
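import pandas as pd

# Time must be a string column for .str slicing (e.g. read with dtype={'Time': str}).
# Named aggregation replaces the dict-renaming form, which pandas 1.0 removed.
gb = (df.groupby(['CompanyName', df.Time.str[:8]])
        .Price
        .agg(open='first', high='max', low='min', close='last')
        .unstack('CompanyName'))
# Move the company names to the top column level.
gb.columns = gb.columns.swaplevel(0, 1)
# sort_index replaces the removed sortlevel; sort_remaining=False keeps open/high/low/close order.
>>> gb.sort_index(level=0, axis=1, sort_remaining=False)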
CompanyName AA Candy                  BB Cookie
            open  high  low   close   open  high  low   close
Time
03010109    1.78  1.79  1.78  1.79    3.20  3.20  3.14  3.14
03123114    2.18  2.18  2.18  2.18    3.88  3.88  3.88  3.88
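To get from this two-level layout to the flat companyName/Max and companyName/Close columns the question asks for, one option is to join the two column levels into a single string. The following is only a sketch on a tiny hypothetical sample; the range_start name comes from the question's desired output, and the output file name is made up:

```python
import pandas as pd

# Tiny hypothetical sample in the shape of the combined frame.
df = pd.DataFrame({
    "CompanyName": ["AA Candy", "AA Candy", "AA Candy", "BB Cookie", "BB Cookie"],
    "Time": ["030101090355", "030101091533", "030101100210",
             "030101090225", "030101090845"],
    "Price": [1.78, 1.79, 1.77, 3.20, 3.14],
})

# Hour key: the first 8 characters of the string timestamp.
hour = df["Time"].str[:8].rename("range_start")

# Max and last price per company per hour.
# 'last' assumes the rows of each company are already in time order.
stats = (df.groupby(["CompanyName", hour])["Price"]
           .agg(Max="max", Close="last")
           .unstack("CompanyName"))

# Columns are (statistic, company) pairs; flatten them to "company/statistic".
stats.columns = [f"{company}/{stat}" for stat, company in stats.columns]

# Arrange the columns company by company, Max before Close.
ordered = [f"{company}/{stat}"
           for company in sorted(df["CompanyName"].unique())
           for stat in ("Max", "Close")]
stats = stats[ordered]

print(stats)
# stats.to_csv("hourly_summary.csv")  # hypothetical output file name
```

Hours in which a company has no trades show up as NaN in that company's columns.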
For your first question, you can use
df.groupby(df.Time.str[0:8])
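For your second question, unstack should be what you want. A groupby object has no unstack method of its own, so aggregate first (for example with max) and then unstack the company names:
df.groupby(['CompanyName', df.Time.str[0:8]]).Price.max().unstack('CompanyName')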
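As a rough illustration of these two steps, the sketch below first shows the hour keys produced by grouping on the sliced string, then aggregates before calling unstack, since a groupby object has to be reduced to a Series or DataFrame before it can be pivoted. The sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "CompanyName": ["AA Candy", "AA Candy", "BB Cookie", "BB Cookie"],
    "Time": ["030101090355", "030101101533", "030101090225", "030101090845"],
    "Price": [1.78, 1.79, 3.20, 3.14],
})

# Grouping by the sliced string alone yields one group per hour.
by_hour = df.groupby(df["Time"].str[0:8])
print(sorted(by_hour.groups))

# Aggregating turns the groups into a Series indexed by (hour, company);
# unstack() then pivots the last index level (CompanyName) into columns.
hourly_max = (df.groupby([df["Time"].str[0:8], "CompanyName"])["Price"]
                .max()
                .unstack())
print(hourly_max)
```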