Replacing values in column based on conditions in pandas DataFrame

I have the following data in a pandas DataFrame:

import pandas as pd

df = pd.read_csv('example_data_file.csv')
df.head()

ID  Year    status
223725  1991    No
223725  1992    No
223725  1993    No
223725  1994    No
223725  1995    No

I want to replace the values in the column status, which holds Yes or No for each ID, based on the following condition: if an ID has at least one Yes in status, then all observations in status for that ID (including those that are No) are replaced with Yes. Otherwise, they remain unchanged.

For example, in the DataFrame below, 844272365 has Yes in status in the last row, so all previous observations in status for 844272365 should be replaced with Yes.

ID          Year    status
844272365   1991    No
844272365   1992    No
844272365   1993    No
844272365   1994    No
844272365   1995    No
844272365   1996    No
844272365   1997    No
844272365   1998    No
844272365   1999    No
844272365   2000    No
844272365   2001    No
844272365   2002    No
844272365   2003    No
844272365   2004    No
844272365   2005    No
844272365   2006    No
844272365   2007    No
844272365   2008    No
844272365   2010    No
844272365   2011    No
844272365   2012    No
844272365   2013    Yes

How do I make these replacements for many IDs in a DataFrame in accordance with the above condition?

You can use transform:

df['new_status'] = (df
                    .groupby('ID')['status']
                    .transform(lambda x: 'Yes' if x.str.contains('Yes').any() else 'No'))
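
To see the effect on more than one ID, here is a minimal self-contained sketch of the same idea. The sample data (IDs 1 and 2) is made up for illustration, and x.eq('Yes') is used in place of x.str.contains('Yes') as an exact-match comparison; it assumes status only ever holds Yes or No.

```python
import pandas as pd

# Hypothetical sample data: ID 1 has one 'Yes', ID 2 has none
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'Year': [1991, 1992, 1993, 1991, 1992],
    'status': ['No', 'No', 'Yes', 'No', 'No'],
})

# For each ID, broadcast 'Yes' to every row if any row is 'Yes', else 'No'
df['new_status'] = (df
                    .groupby('ID')['status']
                    .transform(lambda x: 'Yes' if x.eq('Yes').any() else 'No'))

print(df)
```

Every row of ID 1 gets Yes, while ID 2 stays No. To overwrite the original column instead of adding a new one, assign the result to df['status'].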

Check transform with max:

'Yes' > 'No'  # strings compare lexicographically, which is why max works
Out[433]: True
df['new_status'] = df.groupby('ID')['status'].transform('max')
df
Out[435]: 
           ID  Year status new_status
0   844272365  1991     No        Yes
1   844272365  1992     No        Yes
2   844272365  1993     No        Yes
3   844272365  1994     No        Yes
4   844272365  1995     No        Yes
5   844272365  1996     No        Yes
6   844272365  1997     No        Yes
7   844272365  1998     No        Yes
8   844272365  1999     No        Yes
9   844272365  2000     No        Yes
10  844272365  2001     No        Yes
11  844272365  2002     No        Yes
12  844272365  2003     No        Yes
13  844272365  2004     No        Yes
14  844272365  2005     No        Yes
15  844272365  2006     No        Yes
16  844272365  2007     No        Yes
17  844272365  2008     No        Yes
18  844272365  2010     No        Yes
19  844272365  2011     No        Yes
20  844272365  2012     No        Yes
21  844272365  2013    Yes        Yes
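
The max trick depends on 'Yes' sorting after every other value in the column. If that is not guaranteed, a more explicit variant is to build a boolean mask per ID and overwrite only the matching rows. The sketch below uses made-up data, and the 'Maybe' value is purely hypothetical, added to show that rows of IDs without a Yes are left as they are.

```python
import pandas as pd

# Hypothetical sample data; 'Maybe' is not in the original question
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'status': ['No', 'No', 'Yes', 'No', 'Maybe'],
})

# True for every row whose ID has at least one 'Yes'
has_yes = df['status'].eq('Yes').groupby(df['ID']).transform('any')

# Overwrite only those rows; all other rows keep their original value
df['new_status'] = df['status'].mask(has_yes, 'Yes')

print(df)
```

With transform('max'), the rows of ID 2 would all become No here, because 'No' sorts after 'Maybe'; the mask version leaves them unchanged.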

The following should work:

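# IDs that have at least one 'Yes'
s = set(df.loc[df['status'] == 'Yes', 'ID'])
# Set status to 'Yes' on every row of those IDs in a single .loc assignment
# (avoids chained assignment and mixing positions with index labels)
df.loc[df['ID'].isin(s), 'status'] = 'Yes'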
