
Iterating through pandas DataFrame rows and performing an operation

I have a pandas DataFrame that looks like this:

    Date          SKU     Balance
0   1/1/2017        X1       8
1   1/1/2017        X2      45
2   1/1/2017        X1      47
3   1/1/2017        X2      16
4   2/1/2017        X1      14
5   2/1/2017        X2      67
6   2/1/2017        X2       9
8   2/1/2017        X1      66
9   2/1/2017        X1     158

My first goal is to generate multiple DataFrames, one filtered to each day. For the first day I wrote:

df_1stjan = df.query("Date == \"1/1/2017\"")

This gave the following result:

    Date          SKU     Balance
0   1/1/2017        X1       8
1   1/1/2017        X2      45
2   1/1/2017        X1      47
3   1/1/2017        X2      16

My second goal is to group by SKU, for which I wrote:

df_1stjan_uSKU = df_1stjan.groupby(['SKU','Date'], \
                         as_index=False).agg({'Balance':'sum'})

This gave the following result:

    Date          SKU     Balance
0   1/1/2017        X1      55
1   1/1/2017        X2      61

At the moment my code generates the DataFrame for only one date at a time.

I need to write a function or loop that automates this for every day of 2017.

Note that the Date column has string dtype.
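For reference, the two steps above can be wrapped in a function and applied to each distinct date. This is only a minimal sketch: the names `sku_totals_for` and `results` are hypothetical, and the data is the sample table from the question with Date left as strings.

```python
import pandas as pd

# Sample data from the question; Date stays a plain string column.
df = pd.DataFrame(
    {
        "Date": ["1/1/2017"] * 4 + ["2/1/2017"] * 5,
        "SKU": ["X1", "X2", "X1", "X2", "X1", "X2", "X2", "X1", "X1"],
        "Balance": [8, 45, 47, 16, 14, 67, 9, 66, 158],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 8, 9],
)


def sku_totals_for(frame, date):
    """Filter to one date, then sum Balance per SKU (hypothetical helper)."""
    day = frame[frame["Date"] == date]
    return day.groupby(["SKU", "Date"], as_index=False).agg({"Balance": "sum"})


# One aggregated DataFrame per distinct date, keyed by the date string.
results = {date: sku_totals_for(df, date) for date in df["Date"].unique()}

print(results["1/1/2017"])
```

Each entry of `results` has the same shape as `df_1stjan_uSKU` above, so later code can look up a day by its date string.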

I think you are making this more complicated than it needs to be. You have pretty much solved your own problem, but I would recommend doing your indexing after the initial groupby and agg.

Sample DataFrame:

    Balance Date    SKU
0   8   1/1/2017    X1
1   45  1/1/2017    X2
2   47  1/1/2017    X1
3   16  1/1/2017    X2
4   22  1/2/2017    X3
5   24  1/2/2017    X3
6   25  1/3/2017    X4
7   3   1/3/2017    X4 

groupby with agg:

df1 = df.groupby(['Date', 'SKU'], as_index=False).agg({'Balance':'sum'})

    Date    SKU Balance
0   1/1/2017    X1  55
1   1/1/2017    X2  61
2   1/2/2017    X3  46
3   1/3/2017    X4  28

to_datetime to convert the Date column:

df1['Date'] = pd.to_datetime(df1.Date, format='%m/%d/%Y')
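The explicit format matters because a string such as 2/1/2017 is ambiguous. The short sketch below parses the same strings month-first and day-first; which reading is right for the data in the question is an assumption you need to check against its source.

```python
import pandas as pd

# Two date strings of the kind that appear in the question.
s = pd.Series(["1/1/2017", "2/1/2017"])

# Month first, as in the line above.
month_first = pd.to_datetime(s, format="%m/%d/%Y")
# Day first, the other plausible reading.
day_first = pd.to_datetime(s, format="%d/%m/%Y")

print(month_first.dt.strftime("%Y-%m-%d").tolist())  # ['2017-01-01', '2017-02-01']
print(day_first.dt.strftime("%Y-%m-%d").tolist())    # ['2017-01-01', '2017-01-02']
```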

date_range with all the days you would like to access:

dr = pd.date_range('20170101','20170103')

loc with a loop to access the slice for each day:

for d in dr:
    print(df1.loc[df1.Date.isin([d])])

        Date SKU  Balance
0 2017-01-01  X1       55
1 2017-01-01  X2       61

        Date SKU  Balance
2 2017-01-02  X3       46

        Date SKU  Balance
3 2017-01-03  X4       28
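The loop above visits every day in the date_range, so a day with no rows produces an empty slice. If you only want the days that occur in the data, iterating over a second groupby on Date is one alternative. This is a sketch rather than part of the original answer: the dict name `per_day` is hypothetical, and Date is left as strings for brevity.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame(
    {
        "Balance": [8, 45, 47, 16, 22, 24, 25, 3],
        "Date": ["1/1/2017"] * 4 + ["1/2/2017"] * 2 + ["1/3/2017"] * 2,
        "SKU": ["X1", "X2", "X1", "X2", "X3", "X3", "X4", "X4"],
    }
)

# Sum Balance per (Date, SKU) pair once, for the whole frame.
totals = df.groupby(["Date", "SKU"], as_index=False).agg({"Balance": "sum"})

# Split the aggregated frame into one DataFrame per date that occurs in the data.
per_day = {
    date: frame.reset_index(drop=True)
    for date, frame in totals.groupby("Date")
}

for date, frame in per_day.items():
    print(date)
    print(frame)
```

Keeping the pieces in a dict means each day's DataFrame can be retrieved later by its date instead of only being printed.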

If you first do:

df_group = df.groupby(['Date', 'C1', 'C2', 'C3', 'SKU']).sum()

then you can create your DataFrames like this:

for date in set(df['Date']):
    df_date = df_group.loc[date].reset_index()
    # and do whatever with df_date, you can save them in a list for example
    # to access them later but maybe the df_group.loc[date].reset_index() is enough for what you need
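A runnable version of this idea is sketched below. The sample data shown earlier has no C1, C2 or C3 columns, so the sketch groups on Date and SKU only. The dict name `frames` is hypothetical.

```python
import pandas as pd

# Sample data from the earlier answer.
df = pd.DataFrame(
    {
        "Balance": [8, 45, 47, 16, 22, 24, 25, 3],
        "Date": ["1/1/2017"] * 4 + ["1/2/2017"] * 2 + ["1/3/2017"] * 2,
        "SKU": ["X1", "X2", "X1", "X2", "X3", "X3", "X4", "X4"],
    }
)

# Group on Date first so it becomes the outer level of the MultiIndex.
df_group = df.groupby(["Date", "SKU"]).sum()

# Selecting one date with .loc drops the Date level and leaves SKU as the index;
# reset_index() turns SKU back into an ordinary column.
frames = {}
for date in df["Date"].unique():
    frames[date] = df_group.loc[date].reset_index()

print(frames["1/1/2017"])
```

Because `.loc[date]` removes the Date level, each per-day DataFrame has only the SKU and Balance columns; the date itself is kept as the dict key.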
