
Replacing values in column based on conditions in pandas DataFrame

I have the following data in a pandas DataFrame:

import pandas as pd

df = pd.read_csv('example_data_file.csv')
df.head()

ID  Year    status
223725  1991    No
223725  1992    No
223725  1993    No
223725  1994    No
223725  1995    No

I want to replace the values in the status column, which contains Yes and No, according to the following condition: if an ID has at least one Yes in status, then all observations for that ID (including those with No) should be replaced with Yes. Otherwise, the values remain unchanged.

For example, in the DataFrame below, ID 844272365 has Yes in status in the last row, so all earlier status values for 844272365 should be replaced with Yes.

ID          Year    status
844272365   1991    No
844272365   1992    No
844272365   1993    No
844272365   1994    No
844272365   1995    No
844272365   1996    No
844272365   1997    No
844272365   1998    No
844272365   1999    No
844272365   2000    No
844272365   2001    No
844272365   2002    No
844272365   2003    No
844272365   2004    No
844272365   2005    No
844272365   2006    No
844272365   2007    No
844272365   2008    No
844272365   2010    No
844272365   2011    No
844272365   2012    No
844272365   2013    Yes

How do I make these replacements for many IDs in a DataFrame in accordance with the above condition?

You can use groupby with transform, which broadcasts one value per ID back to every row of that group:

df['new_status'] = (df
                    .groupby('ID')['status']
                    .transform(lambda x: 'Yes' if x.str.contains('Yes').any() else 'No'))
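
To try this on a small case, here is a minimal sketch using made-up data (the IDs 1 and 2 and their rows are hypothetical, not taken from the question). ID 1 contains a single Yes and ID 2 contains none, so only ID 1 should end up entirely Yes. It compares with == rather than str.contains, which is an assumption that the column holds exactly the strings Yes and No.

```python
import pandas as pd

# Hypothetical sample data: ID 1 has one 'Yes', ID 2 has none.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Year": [1991, 1992, 1993, 1991, 1992],
    "status": ["No", "No", "Yes", "No", "No"],
})

# For each ID, return one scalar; transform broadcasts it to every row of the group.
df["new_status"] = (
    df.groupby("ID")["status"]
      .transform(lambda x: "Yes" if (x == "Yes").any() else "No")
)

print(df)
```

Writing the result to a new column keeps the original status available for comparison; assign to df["status"] instead if you want to overwrite it.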

You can also use transform with max, which works because 'Yes' compares greater than 'No' as a string:

'Yes'>'No' # this is why max works
Out[433]: True
df['new_status'] = df.groupby('ID')['status'].transform('max')
df
Out[435]: 
           ID  Year status new_status
0   844272365  1991     No        Yes
1   844272365  1992     No        Yes
2   844272365  1993     No        Yes
3   844272365  1994     No        Yes
4   844272365  1995     No        Yes
5   844272365  1996     No        Yes
6   844272365  1997     No        Yes
7   844272365  1998     No        Yes
8   844272365  1999     No        Yes
9   844272365  2000     No        Yes
10  844272365  2001     No        Yes
11  844272365  2002     No        Yes
12  844272365  2003     No        Yes
13  844272365  2004     No        Yes
14  844272365  2005     No        Yes
15  844272365  2006     No        Yes
16  844272365  2007     No        Yes
17  844272365  2008     No        Yes
18  844272365  2010     No        Yes
19  844272365  2011     No        Yes
20  844272365  2012     No        Yes
21  844272365  2013    Yes        Yes
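
Note that max relies on the lexicographic order of the two labels, so it would pick the wrong value if the "positive" label happened to sort before the other one (for example 'Active' versus 'No'). A sketch of a variant that does not depend on string ordering is shown below; the sample IDs 10 and 20 are hypothetical, and the approach assumes the positive label is exactly 'Yes'.

```python
import pandas as pd

# Hypothetical sample data: ID 10 has a 'Yes', ID 20 does not.
df = pd.DataFrame({
    "ID": [10, 10, 20, 20],
    "status": ["No", "Yes", "No", "No"],
})

# Boolean per row: does this row's ID have at least one 'Yes'?
has_yes = df["status"].eq("Yes").groupby(df["ID"]).transform("any")

# Replace status with 'Yes' wherever the flag is True; keep it otherwise.
df["new_status"] = df["status"].mask(has_yes, "Yes")

print(df)
```

Because rows whose ID has no Yes keep their original value, this also leaves any other labels in the column untouched.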

The following should work:

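s = set(df.loc[df['status'] == 'Yes', 'ID'])  # IDs with at least one 'Yes'
status_col = df.columns.get_loc('status')
for i in range(len(df)):
    if df['ID'].iloc[i] in s:
        # positional assignment avoids chained indexing (df.status[i] = ...)
        df.iloc[i, status_col] = 'Yes'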
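
The same idea can probably be expressed without an explicit loop by combining isin with a single .loc assignment. The sketch below uses hypothetical data with a non-default index to show that the assignment does not depend on row labels being 0, 1, 2, and so on.

```python
import pandas as pd

# Hypothetical sample data with a non-default index.
df = pd.DataFrame(
    {"ID": [1, 1, 2, 2, 3], "status": ["No", "Yes", "No", "No", "Yes"]},
    index=[10, 11, 12, 13, 14],
)

# IDs that have at least one 'Yes'.
ids_with_yes = df.loc[df["status"] == "Yes", "ID"].unique()

# Set status to 'Yes' for every row belonging to one of those IDs.
df.loc[df["ID"].isin(ids_with_yes), "status"] = "Yes"

print(df)
```

This modifies status in place with one vectorized assignment, which is usually much faster than iterating row by row on large DataFrames.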
