New column in DataFrame based on conditions

I have a DataFrame like this:

+------------+---------------+-------------+---------------------+-------------------+
| SK_ID_CURR | CREDIT_ACTIVE | DAYS_CREDIT | DAYS_CREDIT_ENDDATE | DAYS_ENDDATE_FACT |
+------------+---------------+-------------+---------------------+-------------------+
|     436084 | Sold          |       -2835 | -2094.0             | -2436.0           |
|     436084 | Active        |        -987 | -438.0              | NaN               |
|     436084 | Sold          |       -1875 | -1494.0             | -1494.0           |
|     436084 | Active        |       -1135 | -951.0              | NaN               |
|     436084 | Bad debt      |        -986 | NaN                 | NaN               |
|     436084 | Active        |        -968 | -845.0              | NaN               |
|     436084 | Active        |        -987 | -803.0              | NaN               |
+------------+---------------+-------------+---------------------+-------------------+

I would like to create a new column, CREDIT_LENGTH_IN_DAYS, using the following rules:

def func(x):
    if x[x['CREDIT_ACTIVE'] == 'Active']:
        return x['DAYS_CREDIT_ENDDATE'] - x['DAYS_CREDIT']
    elif x[x['CREDIT_ACTIVE'] == 'Closed'] | x[x['CREDIT_ACTIVE'] == 'Sold']:
        return x['DAYS_ENDDATE_FACT'] - x['DAYS_CREDIT']
    elif x[x['CREDIT_ACTIVE'] == 'Bad debt']:
        return x['DAYS_CREDIT']

Then I use:

df_bureau['CREDIT_LENGTH_IN_DAYS'] = df_bureau.apply(func, axis=1)

However, in the case x[x['CREDIT_ACTIVE'] == 'Bad debt'] I get strange values instead of the actual x['DAYS_CREDIT'] value of each row.

Use numpy.select:

import numpy as np

# Boolean masks, one per rule
m1 = df_bureau['CREDIT_ACTIVE'] == 'Active'
m2 = df_bureau['CREDIT_ACTIVE'].isin(['Closed', 'Sold'])
m3 = df_bureau['CREDIT_ACTIVE'] == 'Bad debt'

# Values to use where the corresponding mask is True
v1 = df_bureau['DAYS_CREDIT_ENDDATE'] - df_bureau['DAYS_CREDIT']
v2 = df_bureau['DAYS_ENDDATE_FACT'] - df_bureau['DAYS_CREDIT']
v3 = df_bureau['DAYS_CREDIT']

# The first matching mask decides the value; rows matching no mask get NaN
df_bureau['CREDIT_LENGTH_IN_DAYS'] = np.select([m1, m2, m3], [v1, v2, v3], default=np.nan)
print(df_bureau)
   SK_ID_CURR CREDIT_ACTIVE  DAYS_CREDIT  DAYS_CREDIT_ENDDATE  \
0      436084          Sold        -2835              -2094.0   
1      436084        Active         -987               -438.0   
2      436084          Sold        -1875              -1494.0   
3      436084        Active        -1135               -951.0   
4      436084      Bad debt         -986                  NaN   
5      436084        Active         -968               -845.0   
6      436084        Active         -987               -803.0   

   DAYS_ENDDATE_FACT  CREDIT_LENGTH_IN_DAYS  
0            -2436.0                  399.0  
1                NaN                  549.0  
2            -1494.0                  381.0  
3                NaN                  184.0  
4                NaN                 -986.0  
5                NaN                  123.0  
6                NaN                  184.0  
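
To reproduce the result above without the original data, here is a minimal self-contained sketch. It rebuilds the seven sample rows from the question by hand (constructing df_bureau this way is an assumption made only so the example runs on its own; the names conditions and choices are illustrative) and applies the same numpy.select call.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; building the frame by hand is an
# assumption, the real df_bureau presumably comes from a file.
df_bureau = pd.DataFrame({
    'SK_ID_CURR': [436084] * 7,
    'CREDIT_ACTIVE': ['Sold', 'Active', 'Sold', 'Active',
                      'Bad debt', 'Active', 'Active'],
    'DAYS_CREDIT': [-2835, -987, -1875, -1135, -986, -968, -987],
    'DAYS_CREDIT_ENDDATE': [-2094.0, -438.0, -1494.0, -951.0,
                            np.nan, -845.0, -803.0],
    'DAYS_ENDDATE_FACT': [-2436.0, np.nan, -1494.0, np.nan,
                          np.nan, np.nan, np.nan],
})

# One boolean mask per rule, checked in order
conditions = [
    df_bureau['CREDIT_ACTIVE'] == 'Active',
    df_bureau['CREDIT_ACTIVE'].isin(['Closed', 'Sold']),
    df_bureau['CREDIT_ACTIVE'] == 'Bad debt',
]

# The value taken where the mask at the same position is True
choices = [
    df_bureau['DAYS_CREDIT_ENDDATE'] - df_bureau['DAYS_CREDIT'],
    df_bureau['DAYS_ENDDATE_FACT'] - df_bureau['DAYS_CREDIT'],
    df_bureau['DAYS_CREDIT'],
]

# Rows that match none of the conditions fall back to NaN
df_bureau['CREDIT_LENGTH_IN_DAYS'] = np.select(conditions, choices, default=np.nan)
print(df_bureau['CREDIT_LENGTH_IN_DAYS'].tolist())
```

Because all masks and values are computed on whole columns, this avoids calling a Python function once per row, which tends to matter on large frames.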

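Your solution works on each row separately, so filtering is not needed; you also need to change | to or because you are working with scalars: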

def func(x):
    if x['CREDIT_ACTIVE'] == 'Active':
        return  x['DAYS_CREDIT_ENDDATE'] - x['DAYS_CREDIT']
    elif (x['CREDIT_ACTIVE'] == 'Closed') or (x['CREDIT_ACTIVE'] == 'Sold'):
        return x['DAYS_ENDDATE_FACT'] - x['DAYS_CREDIT']
    elif x['CREDIT_ACTIVE'] == 'Bad debt':
        return x['DAYS_CREDIT']

df_bureau['CREDIT_LENGTH_IN_DAYS'] = df_bureau.apply(func, axis=1)
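
To see why no filtering is needed, it can help to look at what apply actually passes to the function. The following is a minimal sketch on a small hand-built frame (the names df, seen_types and credit_length are hypothetical, not from the question): with axis=1 each call receives a single row as a Series, so row['CREDIT_ACTIVE'] is a plain string and the comparisons are ordinary Python booleans.

```python
import numpy as np
import pandas as pd

# Three sample rows taken from the question, one per rule
df = pd.DataFrame({
    'CREDIT_ACTIVE': ['Sold', 'Active', 'Bad debt'],
    'DAYS_CREDIT': [-2835, -987, -986],
    'DAYS_CREDIT_ENDDATE': [-2094.0, -438.0, np.nan],
    'DAYS_ENDDATE_FACT': [-2436.0, np.nan, np.nan],
})

seen_types = set()  # records the type of the status value in each call

def credit_length(row):
    status = row['CREDIT_ACTIVE']          # a scalar, not a column
    seen_types.add(type(status).__name__)
    if status == 'Active':
        return row['DAYS_CREDIT_ENDDATE'] - row['DAYS_CREDIT']
    elif status in ('Closed', 'Sold'):     # same as chaining == with "or"
        return row['DAYS_ENDDATE_FACT'] - row['DAYS_CREDIT']
    elif status == 'Bad debt':
        return row['DAYS_CREDIT']
    return np.nan                          # explicit fallback for other statuses

df['CREDIT_LENGTH_IN_DAYS'] = df.apply(credit_length, axis=1)
print(seen_types)
print(df['CREDIT_LENGTH_IN_DAYS'].tolist())
```

The explicit `return np.nan` at the end is an addition in this sketch; without it the function returns None for any status the rules do not cover.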

