Keep rows of a pandas DataFrame based on both row and column conditions
Hello, I have a pandas DataFrame that I want to clean. Here is an example:

IDBILL | IDBUYER | BILL | DATE |
---|---|---|---|
001 | 768787 | 45 | 1897-07-24 |
001 | 768787 | 67 | 1897-07-24 |
001 | 768787 | 98 | 1897-07-24 |
002 | 768787 | 30 | 1897-07-24 |
002 | 768787 | 15 | 1897-07-24 |
002 | 768787 | 12 | 1897-07-24 |
005 | 786545 | 45 | 1897-08-19 |
008 | 657676 | 89 | 1989-09-23 |
009 | 657676 | 42 | 1989-09-23 |
010 | 657676 | 18 | 1989-09-23 |
012 | 657676 | 51 | 1990-03-10 |
016 | 892354 | 73 | 1990-03-10 |
018 | 892354 | 48 | 1765-02-14 |
020 | 892354 | 62 | 1765-02-14 |

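
To reproduce the example, the table above can be built as a DataFrame roughly as follows. This is only a sketch: the post does not state the dtypes, so it assumes `IDBILL`, `IDBUYER` and `BILL` are stored as integers (the leading zeros of the bill IDs are therefore lost) and `DATE` as plain strings.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: IDs are integers, so "001" becomes 1; DATE stays a string.
rows = [
    (1, 768787, 45, "1897-07-24"),
    (1, 768787, 67, "1897-07-24"),
    (1, 768787, 98, "1897-07-24"),
    (2, 768787, 30, "1897-07-24"),
    (2, 768787, 15, "1897-07-24"),
    (2, 768787, 12, "1897-07-24"),
    (5, 786545, 45, "1897-08-19"),
    (8, 657676, 89, "1989-09-23"),
    (9, 657676, 42, "1989-09-23"),
    (10, 657676, 18, "1989-09-23"),
    (12, 657676, 51, "1990-03-10"),
    (16, 892354, 73, "1990-03-10"),
    (18, 892354, 48, "1765-02-14"),
    (20, 892354, 62, "1765-02-14"),
]
df = pd.DataFrame(rows, columns=["IDBILL", "IDBUYER", "BILL", "DATE"])
```

Keeping `IDBILL` numeric matters for any approach that compares neighbouring IDs with `diff()`, which does not work on strings such as `"001"`.
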
I want to delete the highest bills and keep the lowest when the bills are made on the same day, by the same IDBUYER, and have consecutive bill IDs. The expected result is:

IDBILL | IDBUYER | BILL | DATE |
---|---|---|---|
002 | 768787 | 30 | 1897-07-24 |
002 | 768787 | 15 | 1897-07-24 |
002 | 768787 | 12 | 1897-07-24 |
005 | 786545 | 45 | 1897-08-19 |
010 | 657676 | 18 | 1989-09-23 |
012 | 657676 | 51 | 1990-03-10 |
016 | 892354 | 73 | 1990-03-10 |
018 | 892354 | 48 | 1765-02-14 |
020 | 892354 | 62 | 1765-02-14 |

Thank you in advance.
One solution:
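
    # IDBILL must be numeric (e.g. int) for diff() to work.
    df = df.sort_values('BILL')
    # Label runs of bill IDs within each (DATE, IDBUYER) group; a jump > 1 starts a new run.
    run = df.groupby(['DATE', 'IDBUYER'])['IDBILL'].transform(lambda x: x.diff().gt(1).cumsum())
    # Position of each row inside its run, in ascending BILL order.
    cc = df.groupby(['DATE', 'IDBUYER', run]).cumcount()
    # Number of rows that share the same IDBILL.
    cc2 = df.groupby(['DATE', 'IDBUYER', 'IDBILL'])['BILL'].transform('count')
    # Keep the first cc2 rows of each run (same as ~(cc // cc2).astype(bool)), then restore the row order.
    result = df.loc[cc < cc2].sort_index()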
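
The same idea can also be written step by step. The sketch below is an alternative, not the code from the answer above. It assumes `IDBILL` and `BILL` are numeric and that the frame has a unique, ascending index such as the default one. It also interprets the rule as: within each run of consecutive bill IDs for one buyer on one day, keep only the rows of the `IDBILL` that contains the smallest `BILL`. Runs are detected on data sorted by `IDBILL` rather than by `BILL`, so a jump is always measured between neighbouring IDs. The function name `keep_lowest_bill_per_run` is hypothetical.

```python
import pandas as pd


def keep_lowest_bill_per_run(df: pd.DataFrame) -> pd.DataFrame:
    """Keep, per run of consecutive IDBILLs, the IDBILL holding the lowest BILL."""
    keys = ["DATE", "IDBUYER"]

    # Sort by ID so that diff() compares neighbouring bill IDs.
    ordered = df.sort_values(keys + ["IDBILL"])

    # Within each (DATE, IDBUYER) group, a jump of more than 1 starts a new run.
    run = (
        ordered.groupby(keys)["IDBILL"]
        .transform(lambda ids: ids.diff().gt(1).cumsum())
        .rename("run")
    )

    # Lowest BILL in each run, and lowest BILL of each individual IDBILL.
    run_min = ordered.groupby(keys + [run])["BILL"].transform("min")
    id_min = ordered.groupby(keys + ["IDBILL"])["BILL"].transform("min")

    # An IDBILL survives only if it holds the lowest BILL of its run.
    keep = id_min.eq(run_min)

    # sort_index() restores the original row order (assumes an ascending index).
    return ordered[keep].sort_index()


# Small demo: IDs 8, 9, 10 form one run; 18 and 20 are not consecutive.
demo = pd.DataFrame(
    {
        "IDBILL": [8, 9, 10, 18, 20],
        "IDBUYER": [657676, 657676, 657676, 892354, 892354],
        "BILL": [89, 42, 18, 48, 62],
        "DATE": ["1989-09-23"] * 3 + ["1765-02-14"] * 2,
    }
)
print(keep_lowest_bill_per_run(demo))  # keeps the rows with IDBILL 10, 18 and 20
```

If two bill IDs in the same run tie for the lowest `BILL`, this version keeps both, which may or may not be what is wanted.
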