In a Pandas dataframe, how to filter a set of rows based on a start row and end row both satisfying different conditions?
Keep rows of a pandas dataframe based on both row and column conditions
Hello, I have a pandas dataframe that I want to clean. Here is an example:
IDBILL | IDBUYER | BILL | DATE |
---|---|---|---|
001 | 768787 | 45 | 1897-07-24 |
001 | 768787 | 67 | 1897-07-24 |
001 | 768787 | 98 | 1897-07-24 |
002 | 768787 | 30 | 1897-07-24 |
002 | 768787 | 15 | 1897-07-24 |
002 | 768787 | 12 | 1897-07-24 |
005 | 786545 | 45 | 1897-08-19 |
008 | 657676 | 89 | 1989-09-23 |
009 | 657676 | 42 | 1989-09-23 |
010 | 657676 | 18 | 1989-09-23 |
012 | 657676 | 51 | 1990-03-10 |
016 | 892354 | 73 | 1990-03-10 |
018 | 892354 | 48 | 1765-02-14 |
020 | 892354 | 62 | 1765-02-14 |
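I want to drop the highest bills and keep the lowest one when the bills are made on the same day, by the same IDBUYER, and their bill IDs follow each other (consecutive IDBILL values). The result should be:
IDBILL | IDBUYER | BILL | DATE |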
---|---|---|---|
002 | 768787 | 30 | 1897-07-24 |
002 | 768787 | 15 | 1897-07-24 |
002 | 768787 | 12 | 1897-07-24 |
005 | 786545 | 45 | 1897-08-19 |
010 | 657676 | 18 | 1989-09-23 |
012 | 657676 | 51 | 1990-03-10 |
016 | 892354 | 73 | 1990-03-10 |
018 | 892354 | 48 | 1765-02-14 |
020 | 892354 | 62 | 1765-02-14 |
Thanks in advance.
One solution:
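
    df = df.sort_values('BILL')
    # Label runs of consecutive IDBILL values within each (DATE, IDBUYER) pair.
    run = df.groupby(['DATE', 'IDBUYER'])['IDBILL'].transform(lambda x: x.diff().gt(1).cumsum())
    # Rank of each row inside its run, cheapest BILL first.
    cc = df.groupby(['DATE', 'IDBUYER', run]).cumcount()
    # Number of rows belonging to each IDBILL.
    cc2 = df.groupby(['DATE', 'IDBUYER', 'IDBILL'])['BILL'].transform('count')
    # Keep the first cc2 rows of each run, then restore the original row order.
    df.loc[cc < cc2].sort_index()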
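
A more explicit variant, offered as a minimal sketch rather than a definitive implementation. The chained expression above looks for consecutive IDBILL values after the frame has been sorted by BILL, so its result can depend on the order of the rows; the version below first reduces the data to one row per bill. It assumes that IDBILL is stored as an integer and that "lowest bill" means the bill whose smallest BILL value is lowest within a run of consecutive IDs (the sum could be used instead). The helper name `keep_lowest_in_runs` is made up for illustration.

```python
import pandas as pd

# Sample data from the question; IDBILL is stored as an integer here (assumption).
df = pd.DataFrame({
    "IDBILL":  [1, 1, 1, 2, 2, 2, 5, 8, 9, 10, 12, 16, 18, 20],
    "IDBUYER": [768787] * 6 + [786545] + [657676] * 4 + [892354] * 3,
    "BILL":    [45, 67, 98, 30, 15, 12, 45, 89, 42, 18, 51, 73, 48, 62],
    "DATE":    ["1897-07-24"] * 6 + ["1897-08-19"] + ["1989-09-23"] * 3
               + ["1990-03-10"] * 2 + ["1765-02-14"] * 2,
})


def keep_lowest_in_runs(df):
    """Hypothetical helper: keep only the cheapest bill of each run of consecutive IDBILLs."""
    keys = ["DATE", "IDBUYER", "IDBILL"]
    # One row per bill, holding its smallest BILL value (assumed ranking criterion).
    bills = df.groupby(keys, as_index=False)["BILL"].min().sort_values(keys)
    # Gap to the previous IDBILL of the same (DATE, IDBUYER); NaN for the first bill.
    gap = bills.groupby(["DATE", "IDBUYER"])["IDBILL"].diff()
    # Any gap other than exactly 1 (including NaN) starts a new run.
    bills["run"] = gap.ne(1).cumsum()
    # Pick the bill with the smallest BILL value in each run.
    winners = bills.loc[bills.groupby("run")["BILL"].idxmin(), keys]
    # Keep every row of the winning bills; an inner merge preserves the left row order.
    return df.merge(winners, on=keys, how="inner")


result = keep_lowest_in_runs(df)
print(result)
```

On the sample data this keeps bills 2, 5, 10, 12, 16, 18 and 20, which matches the expected table. Note that `merge` returns a fresh 0..n-1 index rather than the original row labels.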