Iterating through pandas DataFrame rows and performing an operation
I have a pandas DataFrame that looks like this:
Date SKU Balance
0 1/1/2017 X1 8
1 1/1/2017 X2 45
2 1/1/2017 X1 47
3 1/1/2017 X2 16
4 2/1/2017 X1 14
5 2/1/2017 X2 67
6 2/1/2017 X2 9
8 2/1/2017 X1 66
9 2/1/2017 X1 158
My first goal is to generate multiple DataFrames, one filtered for each day, for which I wrote:
df_1stjan = df.query("Date == \"1/1/2017\"")
This gave me the following result:
Date SKU Balance
0 1/1/2017 X1 8
1 1/1/2017 X2 45
2 1/1/2017 X1 47
3 1/1/2017 X2 16
My second goal is to group by SKU, for which I wrote:
df_1stjan_uSKU = df_1stjan.groupby(['SKU', 'Date'], as_index=False).agg({'Balance': 'sum'})
This gave me the following result:
Date SKU Balance
0 1/1/2017 X1 55
1 1/1/2017 X2 61
At the moment I can only generate a DataFrame for one date at a time, but I need a function or loop that automates this for every day of 2017.
Note that the Date column has string dtype.
I think you are making this too complicated for yourself. You have pretty much solved your own problem, but I would recommend doing your indexing after the initial groupby and agg.
Sample dataframe:
Balance Date SKU
0 8 1/1/2017 X1
1 45 1/1/2017 X2
2 47 1/1/2017 X1
3 16 1/1/2017 X2
4 22 1/2/2017 X3
5 24 1/2/2017 X3
6 25 1/3/2017 X4
7 3 1/3/2017 X4
groupby with agg
df1 = df.groupby(['Date', 'SKU'], as_index=False).agg({'Balance':'sum'})
Date SKU Balance
0 1/1/2017 X1 55
1 1/1/2017 X2 61
2 1/2/2017 X3 46
3 1/3/2017 X4 28
to_datetime to convert the Date column
df1['Date'] = pd.to_datetime(df1.Date, format='%m/%d/%Y')
date_range with all the days you would like to access
dr = pd.date_range('20170101','20170103')
loc with a loop to access the slice for each day
for d in dr:
    print(df1.loc[df1.Date.isin([d])])
Date SKU Balance
0 2017-01-01 X1 55
1 2017-01-01 X2 61
Date SKU Balance
2 2017-01-02 X3 46
Date SKU Balance
3 2017-01-03 X4 28
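The steps above can be put together in one self-contained sketch. This is only an illustration under some assumptions: the sample frame is rebuilt by hand from the table above, and the names daily_sku_totals and per_day are hypothetical, not part of the original answer. Instead of looping over a date_range, it iterates over a second groupby on Date, so only dates that actually occur in the data are returned.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the answer above.
df = pd.DataFrame({
    "Balance": [8, 45, 47, 16, 22, 24, 25, 3],
    "Date": ["1/1/2017"] * 4 + ["1/2/2017"] * 2 + ["1/3/2017"] * 2,
    "SKU": ["X1", "X2", "X1", "X2", "X3", "X3", "X4", "X4"],
})


def daily_sku_totals(frame):
    """Return a dict mapping each date to its per-SKU balance totals.

    Hypothetical helper: the name and the dict layout are assumptions.
    """
    # Sum Balance for every (Date, SKU) pair.
    totals = frame.groupby(["Date", "SKU"], as_index=False).agg({"Balance": "sum"})
    # Convert the string dates (month/day/year) to real timestamps.
    totals["Date"] = pd.to_datetime(totals["Date"], format="%m/%d/%Y")
    # Split the aggregated frame into one small frame per day.
    return {day: part.reset_index(drop=True) for day, part in totals.groupby("Date")}


per_day = daily_sku_totals(df)
for day, part in per_day.items():
    print(day.date())
    print(part)
```

Keeping the per-day frames in a dict keyed by timestamp means a single day can be looked up later with, for example, per_day[pd.Timestamp("2017-01-02")].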
If you first do:
df_group = df.groupby(['Date', 'C1', 'C2', 'C3', 'SKU']).sum()
Then you can create your per-date DataFrames like this:
for date in set(df['Date']):
    df_date = df_group.loc[date].reset_index()
    # Do whatever you need with df_date; for example, save each one in a list
    # to access later. df_group.loc[date].reset_index() alone may be enough
    # for what you need.
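As a rough, runnable illustration of this second approach, the sketch below uses a shortened, hand-typed subset of the question's data. It assumes only the Date, SKU and Balance columns exist, so the C1, C2 and C3 columns from the snippet above are left out, and the dict name frames_by_date is hypothetical.

```python
import pandas as pd

# Hand-typed subset of the question's data (Date is kept as a string).
df = pd.DataFrame({
    "Date": ["1/1/2017", "1/1/2017", "1/1/2017", "1/1/2017", "2/1/2017", "2/1/2017"],
    "SKU": ["X1", "X2", "X1", "X2", "X1", "X2"],
    "Balance": [8, 45, 47, 16, 14, 67],
})

# Group once; the result has a (Date, SKU) MultiIndex.
df_group = df.groupby(["Date", "SKU"]).sum()

# Selecting on the first index level returns the rows for a single date.
frames_by_date = {}
for date in df["Date"].unique():
    frames_by_date[date] = df_group.loc[date].reset_index()

print(frames_by_date["1/1/2017"])
```

Because the grouping happens only once, each per-date lookup is just an index selection on the already aggregated frame.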