
Putting only the required data in a DataFrame in each iteration of a loop in pandas (Python)

My data has the following columns:

           |  Area_code  | ProductID  | Stock
Date       ----------------------------------
2016-04-01 |    920      | 100000135    2.000
2016-05-01 |    920      | 100000135    4.125
2016-06-01 |    920      | 100000135    7.375
2016-07-01 |    920      | 100000135    7.000
2016-08-01 |    920      | 100000135    4.500
2016-09-01 |    920      | 100000135    2.000
2016-10-01 |    920      | 100000135    6.175
2016-11-01 |    920      | 100000135    4.750
2016-12-01 |    920      | 100000135    2.625
2017-01-01 |    920      | 100000135    1.625
2017-02-01 |    920      | 100000135    4.500
2017-03-01 |    920      | 100000135    4.625
2017-04-01 |    920      | 100000135    1.000  
2016-04-01 |    920      | 100000136    0.100
2016-06-01 |    920      | 100000136    0.075
2016-07-01 |    920      | 100000136    0.200
2016-09-01 |    920      | 100000136    0.100
2017-03-01 |    920      | 100000136    0.050
2017-05-01 |    920      | 100000136    0.100
2017-06-01 |    920      | 100000136    0.025
2018-05-01 |    920      | 100000136    0.125
2018-08-01 |    920      | 100000136    0.200
2018-12-01 |    920      | 100000136    0.050
2019-02-01 |    920      | 100000136    0.100
2019-03-01 |    920      | 100000136    0.050

The data is in a pandas DataFrame indexed by the "Date" column. The requirement is to iterate over this DataFrame and, inside a loop, put into another DataFrame only those rows that share the same "Area_code" and "ProductID", to get the following result:

Say, in iteration 1 of the loop, for the (920, 100000135) pair, the DataFrame in the loop should be:

              Stock
Date          -----
2016-04-01 |  2.000
2016-05-01 |  4.125
. 
. 
.
2017-04-01 |  1.000

Then, in iteration 2 of the loop, for the (920, 100000136) pair, the DataFrame in the loop should be:

              Stock
Date          -----
2016-04-01 |  0.100
2016-06-01 |  0.075
. 
. 
.
2019-03-01 |  0.050

Also, if the DataFrame generated above [i.e., as the result of an (Area_code, ProductID) pair] has fewer than 12 records, I want to skip that iteration and move on to the next one.

Please help with this requirement. Kindly mention if anything in the question is unclear. Many thanks.

I would suggest something like the following:

import pandas as pd

# Small sample frame standing in for the real data
df = pd.DataFrame({'Date': ['10/02/2020', '27/01/2020', '27/04/2020', '26/03/2020', '21/02/2020', '07/06/2020',
                            '12/04/2020'],
                   'Area_code': [920, 920, 920, 920, 921, 921, 921],
                   'product_id': [13, 13, 13, 13, 16, 16, 16],
                   'stock': [1, 2, 3, 4, 6, 7, 8]})

# Minimum number of rows a pair must have; 4 suits this small sample, use 12 for the real data
MIN_ROWS = 4


def extract(ac, pid):
    # Keep only the rows for the given (area code, product id) pair, e.g. (920, 13)
    rslt_df = df[(df['Area_code'] == ac) & (df['product_id'] == pid)]

    # Return None when the pair has fewer than MIN_ROWS rows, so it can be dropped later
    return rslt_df[['Date', 'stock']] if len(rslt_df) >= MIN_ROWS else None


Area_code = [920, 921]
product_id = [13, 16]

# zip pairs the two lists element-wise: (920, 13), (921, 16)
append_data = [extract(a, b) for (a, b) in zip(Area_code, product_id)]

# Drop the pairs that were skipped (None)
all_report = [x for x in append_data if x is not None]
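
As an alternative to filtering the frame once per pair, the same loop could probably be written with `groupby`, which yields each (Area_code, ProductID) pair together with its rows. This is only a sketch: it assumes the column names from the question (`Area_code`, `ProductID`, `Stock`) and a `Date` index. The helper name `iter_stock_frames`, the `min_rows` parameter, and the sample values are made up for illustration and do not reproduce the full table from the question.

```python
import pandas as pd


def iter_stock_frames(df, min_rows=12):
    """Yield ((area_code, product_id), stock_frame) for each pair with enough rows."""
    # Grouping on two columns gives one (area_code, product_id) key per group
    for key, group in df.groupby(["Area_code", "ProductID"]):
        if len(group) < min_rows:
            continue  # skip pairs with fewer than min_rows records
        # Keep only the Stock column; the Date index comes along with it
        yield key, group[["Stock"]]


# Illustrative data: 13 monthly rows for one product, 3 rows for another
dates_a = pd.date_range("2016-04-01", periods=13, freq="MS")
dates_b = pd.to_datetime(["2016-04-01", "2016-06-01", "2016-07-01"])
sample = pd.DataFrame(
    {
        "Area_code": [920] * 16,
        "ProductID": [100000135] * 13 + [100000136] * 3,
        "Stock": [2.0, 4.125, 7.375, 7.0, 4.5, 2.0, 6.175, 4.75,
                  2.625, 1.625, 4.5, 4.625, 1.0, 0.1, 0.075, 0.2],
    },
    index=dates_a.append(dates_b),
)
sample.index.name = "Date"

# Collect the per-pair frames; pairs below the threshold never appear
frames = dict(iter_stock_frames(sample))
for (area, product), stock in frames.items():
    print(area, product, len(stock))
```

With this approach the list of pairs does not have to be written out by hand, and the `continue` statement does the "skip this iteration" step directly instead of returning `None` and filtering afterwards.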
