
Keep rows of a pandas dataframe based on both row and column conditions

Hello, I have a pandas dataframe that I want to clean. Here is an example:

IDBILL  IDBUYER  BILL  DATE
001     768787   45    1897-07-24
001     768787   67    1897-07-24
001     768787   98    1897-07-24
002     768787   30    1897-07-24
002     768787   15    1897-07-24
002     768787   12    1897-07-24
005     786545   45    1897-08-19
008     657676   89    1989-09-23
009     657676   42    1989-09-23
010     657676   18    1989-09-23
012     657676   51    1990-03-10
016     892354   73    1990-03-10
018     892354   48    1765-02-14
020     892354   62    1765-02-14

I want to delete the highest bills (and keep the lowest) when the bills were made on the same day, by the same IDBUYER, and their bill IDs follow each other. The expected result is:

IDBILL  IDBUYER  BILL  DATE
002     768787   30    1897-07-24
002     768787   15    1897-07-24
002     768787   12    1897-07-24
005     786545   45    1897-08-19
010     657676   18    1989-09-23
012     657676   51    1990-03-10
016     892354   73    1990-03-10
018     892354   48    1765-02-14
020     892354   62    1765-02-14

Thank you in advance.

One solution:

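df = df.sort_values('BILL')
# Label runs of IDBILL values within each (DATE, IDBUYER) group; a jump greater than 1 starts a new run
run = df.groupby(['DATE', 'IDBUYER'])['IDBILL'].transform(lambda x: x.diff().gt(1).cumsum())
# cc: position of each row inside its run (rows are ordered by BILL); cc2: number of rows sharing the same IDBILL
cc = df.groupby(['DATE', 'IDBUYER', run]).cumcount()
cc2 = df.groupby(['DATE', 'IDBUYER', 'IDBILL'])['BILL'].transform('count')
# Keep only the first cc2 rows of each run, then restore the original row order
df.loc[cc < cc2].sort_index()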
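For readers who want something they can run end to end, here is a more explicit sketch of the same idea. It is not the code from the solution above, and it rests on two assumptions that the question does not state: IDBILL is stored as an integer (so 001 becomes 1), and "lowest" means the bill ID whose BILL values sum to the smallest total within a run of consecutive IDs. Runs are detected on rows sorted by IDBILL, and the names keys, step, run, total and best are only illustrative.

```python
import pandas as pd

# Example data from the question; IDBILL is assumed to be an integer column.
df = pd.DataFrame({
    "IDBILL": [1, 1, 1, 2, 2, 2, 5, 8, 9, 10, 12, 16, 18, 20],
    "IDBUYER": [768787] * 6 + [786545] + [657676] * 4 + [892354] * 3,
    "BILL": [45, 67, 98, 30, 15, 12, 45, 89, 42, 18, 51, 73, 48, 62],
    "DATE": (["1897-07-24"] * 6 + ["1897-08-19"] + ["1989-09-23"] * 3
             + ["1990-03-10"] * 2 + ["1765-02-14"] * 2),
})

keys = ["DATE", "IDBUYER", "IDBILL"]

# Sort so that bills of the same day and buyer sit next to each other, ordered by ID.
s = df.sort_values(keys, kind="stable")

# Difference to the previous IDBILL within the same (DATE, IDBUYER) group.
# 0 means same bill, 1 means the next consecutive bill; anything else
# (including NaN on the first row of a group) starts a new run.
step = s.groupby(["DATE", "IDBUYER"])["IDBILL"].diff()
run = (~step.isin([0, 1])).cumsum()

# Total amount of each bill, broadcast to its rows (assumed meaning of "lowest").
total = s.groupby(keys)["BILL"].transform("sum")

# Smallest bill total inside each run of consecutive IDs.
best = total.groupby(run).transform("min")

# Keep the rows of the cheapest bill in each run; ties keep every tied bill.
result = s[total.eq(best)].sort_index()
print(result)
```

On the example data this keeps bills 2, 5, 10, 12, 16, 18 and 20, which matches the expected table in the question. If "lowest" should instead mean the smallest single BILL value, replacing "sum" with "min" in the transform would be the corresponding change.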
