New column in DataFrame based on conditions
I have a DataFrame like this:
+------------+---------------+-------------+---------------------+-------------------+
| SK_ID_CURR | CREDIT_ACTIVE | DAYS_CREDIT | DAYS_CREDIT_ENDDATE | DAYS_ENDDATE_FACT |
+------------+---------------+-------------+---------------------+-------------------+
| 436084 | Sold | -2835 | -2094.0 | -2436.0 |
| 436084 | Active | -987 | -438.0 | NaN |
| 436084 | Sold | -1875 | -1494.0 | -1494.0 |
| 436084 | Active | -1135 | -951.0 | NaN |
| 436084 | Bad debt | -986 | NaN | NaN |
| 436084 | Active | -968 | -845.0 | NaN |
| 436084 | Active | -987 | -803.0 | NaN |
+------------+---------------+-------------+---------------------+-------------------+
I would like to create a new column, CREDIT_LENGTH_IN_DAYS, using the following rules:
def func(x):
    if x[x['CREDIT_ACTIVE'] == 'Active']:
        return x['DAYS_CREDIT_ENDDATE'] - x['DAYS_CREDIT']
    elif x[x['CREDIT_ACTIVE'] == 'Closed'] | x[x['CREDIT_ACTIVE'] == 'Sold']:
        return x['DAYS_ENDDATE_FACT'] - x['DAYS_CREDIT']
    elif x[x['CREDIT_ACTIVE'] == 'Bad debt']:
        return x['DAYS_CREDIT']
Then I use:
df_bureau['CREDIT_LENGTH_IN_DAYS'] = df_bureau.apply(func, axis=1)
However, in the case x[x['CREDIT_ACTIVE'] == 'Bad debt'], I get strange values instead of the actual value of x['DAYS_CREDIT'] for each row.
Use numpy.select:
import numpy as np

# Boolean masks, one per rule
m1 = df_bureau['CREDIT_ACTIVE'] == 'Active'
m2 = df_bureau['CREDIT_ACTIVE'].isin(['Closed', 'Sold'])
m3 = df_bureau['CREDIT_ACTIVE'] == 'Bad debt'
# Value to use where the corresponding mask is True
v1 = df_bureau['DAYS_CREDIT_ENDDATE'] - df_bureau['DAYS_CREDIT']
v2 = df_bureau['DAYS_ENDDATE_FACT'] - df_bureau['DAYS_CREDIT']
v3 = df_bureau['DAYS_CREDIT']
# Rows matching none of the masks get NaN
df_bureau['CREDIT_LENGTH_IN_DAYS'] = np.select([m1, m2, m3], [v1, v2, v3], default=np.nan)
print(df_bureau)
SK_ID_CURR CREDIT_ACTIVE DAYS_CREDIT DAYS_CREDIT_ENDDATE \
0 436084 Sold -2835 -2094.0
1 436084 Active -987 -438.0
2 436084 Sold -1875 -1494.0
3 436084 Active -1135 -951.0
4 436084 Bad debt -986 NaN
5 436084 Active -968 -845.0
6 436084 Active -987 -803.0
DAYS_ENDDATE_FACT CREDIT_LENGTH_IN_DAYS
0 -2436.0 399.0
1 NaN 549.0
2 -1494.0 381.0
3 NaN 184.0
4 NaN -986.0
5 NaN 123.0
6 NaN 184.0
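To make the behaviour of numpy.select a little more concrete, here is a minimal sketch on made-up data. The array x and the two overlapping conditions are purely illustrative and not part of the question. It shows that, for each element, the value comes from the first condition that is True, and that the default is used when no condition matches.

```python
import numpy as np

# Illustrative data only (not from the question)
x = np.array([1, 5, 10, 20])

# The conditions overlap on purpose: x == 1 satisfies both
conditions = [x < 3, x < 15]
choices = [x * 10, x * 100]

# For each element, take the choice of the first True condition,
# otherwise fall back to the default
result = np.select(conditions, choices, default=-1)
print(result)
```

In the answer above the three masks are mutually exclusive, so their order does not matter there; any CREDIT_ACTIVE value other than the four listed would receive the default NaN.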
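Your function is applied to each row separately, so no filtering is needed; you also need to change | to or, because you are working with scalars: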
def func(x):
    if x['CREDIT_ACTIVE'] == 'Active':
        return x['DAYS_CREDIT_ENDDATE'] - x['DAYS_CREDIT']
    elif (x['CREDIT_ACTIVE'] == 'Closed') or (x['CREDIT_ACTIVE'] == 'Sold'):
        return x['DAYS_ENDDATE_FACT'] - x['DAYS_CREDIT']
    elif x['CREDIT_ACTIVE'] == 'Bad debt':
        return x['DAYS_CREDIT']

df_bureau['CREDIT_LENGTH_IN_DAYS'] = df_bureau.apply(func, axis=1)
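As a rough check that the two approaches agree, the sketch below rebuilds the seven sample rows from the question and computes the column both ways. The variable names by_apply and by_select are introduced here for illustration only, and the frame is typed in by hand from the table above.

```python
import numpy as np
import pandas as pd

# Sample rows copied by hand from the table in the question
df_bureau = pd.DataFrame({
    "SK_ID_CURR": [436084] * 7,
    "CREDIT_ACTIVE": ["Sold", "Active", "Sold", "Active",
                      "Bad debt", "Active", "Active"],
    "DAYS_CREDIT": [-2835, -987, -1875, -1135, -986, -968, -987],
    "DAYS_CREDIT_ENDDATE": [-2094.0, -438.0, -1494.0, -951.0,
                            np.nan, -845.0, -803.0],
    "DAYS_ENDDATE_FACT": [-2436.0, np.nan, -1494.0, np.nan,
                          np.nan, np.nan, np.nan],
})


def func(x):
    # x is a single row, so every comparison yields a plain bool
    if x["CREDIT_ACTIVE"] == "Active":
        return x["DAYS_CREDIT_ENDDATE"] - x["DAYS_CREDIT"]
    elif x["CREDIT_ACTIVE"] in ("Closed", "Sold"):
        return x["DAYS_ENDDATE_FACT"] - x["DAYS_CREDIT"]
    elif x["CREDIT_ACTIVE"] == "Bad debt":
        return x["DAYS_CREDIT"]


# Row-wise version
by_apply = df_bureau.apply(func, axis=1).astype(float)

# Vectorized version
status = df_bureau["CREDIT_ACTIVE"]
by_select = np.select(
    [status == "Active", status.isin(["Closed", "Sold"]), status == "Bad debt"],
    [df_bureau["DAYS_CREDIT_ENDDATE"] - df_bureau["DAYS_CREDIT"],
     df_bureau["DAYS_ENDDATE_FACT"] - df_bureau["DAYS_CREDIT"],
     df_bureau["DAYS_CREDIT"]],
    default=np.nan,
)

# True when both versions produce the same column
print(np.allclose(by_apply.to_numpy(), by_select, equal_nan=True))
```

On a frame this small the difference is negligible. On a large frame the numpy.select version is usually much faster, since apply with axis=1 calls the Python function once per row.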