Replacing values in a column based on conditions in a pandas DataFrame
I have the following data in a pandas DataFrame:
import pandas as pd
df = pd.read_csv('example_data_file.csv')
df.head()
ID Year status
223725 1991 No
223725 1992 No
223725 1993 No
223725 1994 No
223725 1995 No
I want to replace the values in the column status, which holds the values Yes and No, for each ID based on the following condition: if an ID has at least one Yes in the column status, then all observations (including No) in the column status for that ID are replaced with Yes. Otherwise, they remain unchanged.
For example, in the DataFrame below, ID 844272365 has Yes in status in the last row, so all previous observations in status in the rows for 844272365 should be replaced with Yes.
ID Year status
844272365 1991 No
844272365 1992 No
844272365 1993 No
844272365 1994 No
844272365 1995 No
844272365 1996 No
844272365 1997 No
844272365 1998 No
844272365 1999 No
844272365 2000 No
844272365 2001 No
844272365 2002 No
844272365 2003 No
844272365 2004 No
844272365 2005 No
844272365 2006 No
844272365 2007 No
844272365 2008 No
844272365 2010 No
844272365 2011 No
844272365 2012 No
844272365 2013 Yes
How do I make these replacements for many IDs in a DataFrame according to the condition above?
Use groupby with transform('max'):
'Yes' > 'No'  # this is the reason why max works
Out[433]: True
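max on string values compares them lexicographically by character code, so 'Yes' beats 'No' because 'Y' sorts after 'N'. A small sketch of that ordering, and of its main pitfall: lowercase letters have larger codes than uppercase ones, so a value such as 'no' would beat 'Yes'. The trick therefore assumes the column contains exactly 'Yes' and 'No' (the messy list below is made-up illustration data).

```python
# Python compares strings character by character using code points.
values = ['No', 'No', 'Yes']
print(max(values))  # 'Yes', because ord('Y') == 89 > ord('N') == 78

# Pitfall (hypothetical messy data): ord('n') == 110 > ord('Y') == 89,
# so a lowercase 'no' wins over 'Yes'.
messy = ['no', 'Yes']
print(max(messy))  # 'no'
```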
df['new_status'] = df.groupby('ID')['status'].transform('max')
df
Out[435]:
ID Year status new_status
0 844272365 1991 No Yes
1 844272365 1992 No Yes
2 844272365 1993 No Yes
3 844272365 1994 No Yes
4 844272365 1995 No Yes
5 844272365 1996 No Yes
6 844272365 1997 No Yes
7 844272365 1998 No Yes
8 844272365 1999 No Yes
9 844272365 2000 No Yes
10 844272365 2001 No Yes
11 844272365 2002 No Yes
12 844272365 2003 No Yes
13 844272365 2004 No Yes
14 844272365 2005 No Yes
15 844272365 2006 No Yes
16 844272365 2007 No Yes
17 844272365 2008 No Yes
18 844272365 2010 No Yes
19 844272365 2011 No Yes
20 844272365 2012 No Yes
21 844272365 2013 Yes Yes
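The answer above writes the result to a new column, new_status, while the question asks to replace status itself; that only requires assigning the transformed Series back. Below is a sketch on hypothetical sample data (not the asker's CSV) showing both the max version and an alternative that tests for an exact 'Yes' with any, so it does not depend on how strings sort.

```python
import pandas as pd

# Hypothetical sample data: ID 1 has a 'Yes', ID 2 does not.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'Year': [1991, 1992, 1993, 1991, 1992],
    'status': ['No', 'No', 'Yes', 'No', 'No'],
})

# Option 1: per-ID maximum of the strings ('Yes' > 'No').
by_max = df.groupby('ID')['status'].transform('max')

# Option 2: flag every row whose ID has at least one exact 'Yes',
# then replace status with 'Yes' only on those rows.
has_yes = df['status'].eq('Yes').groupby(df['ID']).transform('any')
by_any = df['status'].mask(has_yes, 'Yes')

# Overwrite the original column instead of adding new_status.
df['status'] = by_any
print(df)
```

With clean Yes/No data both options give the same result; the any version is the safer choice if the column might contain other values.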
The following should work:
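s = set(df.loc[df['status'] == 'Yes', 'ID'])  # IDs with at least one 'Yes'
status_col = df.columns.get_loc('status')
for i in range(len(df)):
    if df['ID'].iloc[i] in s:
        # positional write; avoids chained assignment like df.status[i] = ...
        df.iloc[i, status_col] = 'Yes'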
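The loop above updates one row at a time, which gets slow on large frames. The same set-of-IDs idea can be expressed without a loop using Series.isin and a single .loc assignment. A sketch, assuming the same ID/status columns and using hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data: IDs 10 and 30 have a 'Yes', ID 20 does not.
df = pd.DataFrame({
    'ID': [10, 10, 20, 20, 30],
    'Year': [2012, 2013, 2012, 2013, 2013],
    'status': ['No', 'Yes', 'No', 'No', 'Yes'],
})

# IDs that have at least one 'Yes' anywhere in status.
yes_ids = df.loc[df['status'] == 'Yes', 'ID'].unique()

# Boolean mask of all rows belonging to those IDs, then one
# vectorized assignment (no chained indexing, no Python loop).
df.loc[df['ID'].isin(yes_ids), 'status'] = 'Yes'
print(df)
```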