Pandas DataFrame: Dropping rows after meeting conditions in columns

Question

I have a large panel dataset in a pandas DataFrame:

import pandas as pd 

df = pd.read_csv('Qs_example_data.csv')

df.head()

ID      Year    DOB  status YOD
223725  1991    1975.0  No  2021
223725  1992    1975.0  No  2021
223725  1993    1975.0  No  2021
223725  1994    1975.0  No  2021
223725  1995    1975.0  No  2021

I want to drop rows based on the following condition: if the value in YOD matches the value in Year, or if a Yes is observed in the status column, then all rows after that matching row for that ID are dropped.

For example, in the DataFrame, ID 68084329 has the value 2012 in both the Year and YOD columns on row 221930. All rows after 221930 for 68084329 should be dropped.

df.loc[df['ID'] == 68084329]

          ID        Year     DOB  status YOD
221910  68084329    1991    1942.0  No  2012
221911  68084329    1992    1942.0  No  2012
221912  68084329    1993    1942.0  No  2012
221913  68084329    1994    1942.0  No  2012
221914  68084329    1995    1942.0  No  2012
221915  68084329    1996    1942.0  No  2012
221916  68084329    1997    1942.0  No  2012
221917  68084329    1998    1942.0  No  2012
221918  68084329    1999    1942.0  No  2012
221919  68084329    2000    1942.0  No  2012
221920  68084329    2001    1942.0  No  2012
221921  68084329    2002    1942.0  No  2012
221922  68084329    2003    1942.0  No  2012
221923  68084329    2004    1942.0  No  2012
221924  68084329    2005    1942.0  No  2012
221925  68084329    2006    1942.0  No  2012
221926  68084329    2007    1942.0  No  2012
221927  68084329    2008    1942.0  No  2012
221928  68084329    2010    1942.0  No  2012
221929  68084329    2011    1942.0  No  2012
221930  68084329    2012    1942.0  Yes 2012
221931  68084329    2013    1942.0  No  2012
221932  68084329    2014    1942.0  No  2012
221933  68084329    2015    1942.0  No  2012
221934  68084329    2016    1942.0  No  2012
221935  68084329    2017    1942.0  No  2012

I have a lot of IDs that have rows which need to be dropped in accordance with the above condition. How do I do this?

Answer 1

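The following code should also work. For each ID it copies rows in order and stops after the first row that meets the stopping condition: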

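result = df[0:0]      # empty frame with the same columns as df
ids = df.ID.unique()  # distinct IDs, in order of first appearance

for k in ids:
    temp = df[df.ID == k]
    for j in range(len(temp)):
        # Keep the current row, then stop once the end condition is met.
        result = pd.concat([result, temp.iloc[j:j+1, :]])
        row = temp.iloc[j]
        # Stop on a 'Yes' status or, as the question also asks, when Year equals YOD.
        if row['status'] == 'Yes' or row['Year'] == row['YOD']:
            break

print(result)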
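
The row-by-row loop can be slow on a large panel. As a rough sketch of a vectorized alternative (not the code from this answer), the version below flags rows where Year equals YOD or status is 'Yes', then keeps each ID's rows up to and including its first flagged row. The function name `drop_after_match` and the small `sample` frame are made up for illustration, and it assumes the rows are already in chronological order within each ID and that Year and YOD have comparable dtypes.

```python
import pandas as pd


def drop_after_match(df):
    """Keep each ID's rows up to and including its first matching row."""
    # A row matches if Year equals YOD or status is 'Yes'.
    hit = ((df['Year'] == df['YOD']) | (df['status'] == 'Yes')).astype(int)
    # Running count of matches within each ID, minus the current row's own flag,
    # gives the number of matches strictly before each row.
    hits_before = hit.groupby(df['ID']).cumsum() - hit
    return df[hits_before == 0]


# Hypothetical data: ID 1 matches on Year == YOD, ID 2 never matches,
# ID 3 matches on status == 'Yes'.
sample = pd.DataFrame({
    'ID':     [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Year':   [2010, 2011, 2012, 2013, 2010, 2011, 2012, 2010, 2011, 2012],
    'status': ['No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No'],
    'YOD':    [2012] * 4 + [2021] * 3 + [2021] * 3,
})

trimmed = drop_after_match(sample)
print(trimmed)
```

IDs with no matching row are kept in full, and the original index labels are preserved so the result can be compared against the input.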

Answer 2

This should do it. From your wording, it wasn't clear whether you need to drop all the rows after the first Yes for that ID, or just the rows that contain a Yes. I assumed the former. Note that the function below also drops the first Yes row itself, and that the example uses a column named Status, whereas your data uses status.

import pandas as pd


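def __get_nos__(df):
    # Keep the rows before the first 'Yes'; if the group has no 'Yes', keep it all
    # (argmax/argmin alone would return 0 here and drop the whole group).
    is_yes = (df['Status'] == 'Yes').values
    return df.iloc[:is_yes.argmax(), :] if is_yes.any() else df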


df = pd.DataFrame()
df['ID'] = [12345678]*10 + [13579]*10
df['Year'] = list(range(2000, 2010))*2
df['DOB'] = list(range(2000, 2010))*2
df['YOD'] = list(range(2000, 2010))*2
df['Status'] = ['No']*5 + ['Yes']*5 + ['No']*7 + ['Yes']*3
""" df
          ID  Year   DOB   YOD Status
0   12345678  2000  2000  2000     No
1   12345678  2001  2001  2001     No
2   12345678  2002  2002  2002     No
3   12345678  2003  2003  2003     No
4   12345678  2004  2004  2004     No
5   12345678  2005  2005  2005    Yes
6   12345678  2006  2006  2006    Yes
7   12345678  2007  2007  2007    Yes
8   12345678  2008  2008  2008    Yes
9   12345678  2009  2009  2009    Yes
10     13579  2000  2000  2000     No
11     13579  2001  2001  2001     No
12     13579  2002  2002  2002     No
13     13579  2003  2003  2003     No
14     13579  2004  2004  2004     No
15     13579  2005  2005  2005     No
16     13579  2006  2006  2006     No
17     13579  2007  2007  2007    Yes
18     13579  2008  2008  2008    Yes
19     13579  2009  2009  2009    Yes
"""
df.groupby('ID').apply(__get_nos__).reset_index(drop=True)
""" Output
          ID  Year   DOB   YOD Status
0      13579  2000  2000  2000     No
1      13579  2001  2001  2001     No
2      13579  2002  2002  2002     No
3      13579  2003  2003  2003     No
4      13579  2004  2004  2004     No
5      13579  2005  2005  2005     No
6      13579  2006  2006  2006     No
7   12345678  2000  2000  2000     No
8   12345678  2001  2001  2001     No
9   12345678  2002  2002  2002     No
10  12345678  2003  2003  2003     No
11  12345678  2004  2004  2004     No
"""
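
If the other reading is the one you want, where the row with the first Yes is kept and only the rows after it are dropped, something along these lines might work. This is only a sketch under that assumption: the helper name `keep_through_first_yes`, its `status_col` parameter, and the `example` frame are invented for illustration, and rows are assumed to be sorted by Year within each ID.

```python
import pandas as pd


def keep_through_first_yes(df, status_col='Status'):
    """Keep each ID's rows up to and including its first 'Yes' row."""
    flags = df[status_col].eq('Yes').astype(int)
    # Within each ID, shift the flags down one row and take a running max:
    # the result is 1 for every row that comes after the first 'Yes'.
    after_first_yes = flags.groupby(df['ID']).transform(
        lambda s: s.shift(fill_value=0).cummax()
    )
    return df[after_first_yes == 0]


# Hypothetical data: ID 1 has a 'Yes' in its second row, ID 2 has none.
example = pd.DataFrame({
    'ID':     [1, 1, 1, 2, 2, 2],
    'Year':   [2000, 2001, 2002, 2000, 2001, 2002],
    'Status': ['No', 'Yes', 'Yes', 'No', 'No', 'No'],
})

kept = keep_through_first_yes(example)
print(kept)
```

Passing `status_col='status'` would adapt it to the column name used in the question. IDs that never have a Yes are returned unchanged.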

solution1
1 2020-09-03 10:05:26

solution2
1 2020-09-03 10:16:28
