Trying to make a new column in a pandas DataFrame by filtering another column using an if statement

I'm trying to add a column named loan_status_is_great to my pandas DataFrame. It should contain the integer 1 if loan_status is "Current" or "Fully Paid", and the integer 0 otherwise.

I'm using https://resources.lendingclub.com/LoanStats_2018Q4.csv.zip as my dataset.

My problem code is:

def loan_great():
   if (df['loan_status']).any == 'Current' or (df['loan_status']).any == 'Fully Paid':
     return 1
   else:
     return 0

df['loan_status_is_great']=df['loan_status'].apply(loan_great())

TypeError Traceback (most recent call last) in ()
----> 1 df['loan_status_is_great']=df['loan_status'].apply(loan_great())

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4043         else:
   4044             values = self.astype(object).values
-> 4045             mapped = lib.map_infer(values, f, convert=convert_dtype)
   4046
   4047         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

TypeError: 'int' object is not callable
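
For context, the error seems to follow from two details in the code above. Writing loan_great() calls the function immediately, so apply receives its integer return value instead of a function and then fails when it tries to call that integer. Also, .any without parentheses is a bound method, which never equals a string, so the condition is always false. A minimal sketch of a row-wise version that does work, using a small made-up DataFrame as a stand-in for the LendingClub file (the sample rows are an assumption, not taken from the dataset):

```python
import pandas as pd

# Hypothetical sample rows standing in for the LendingClub CSV.
df = pd.DataFrame(
    {"loan_status": ["Current", "Fully Paid", "Late (31-120 days)", "Charged Off"]}
)


def loan_great(status):
    # apply hands over one cell value at a time, so compare that value directly.
    if status == "Current" or status == "Fully Paid":
        return 1
    return 0


# Pass the function object itself (no parentheses) so apply can call it per row.
df["loan_status_is_great"] = df["loan_status"].apply(loan_great)
print(df)
```

This keeps the original if/else shape, but it runs a Python function once per row, so on a large file it will likely be slower than a vectorized expression.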

Let's try a different approach using isin to create a boolean series and convert to integer:

df['loan_status_is_great'] = df['loan_status'].isin(['Current', 'Fully Paid']).astype(int)
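
To see what the isin step produces before the integer conversion, here is a small self-contained sketch. The sample rows, including the missing value, are made up for illustration and are not from the LendingClub data:

```python
import pandas as pd

# Hypothetical sample rows; None stands in for a missing status.
df = pd.DataFrame({"loan_status": ["Current", "Fully Paid", "Charged Off", None]})

# isin returns a boolean Series: True where the value is in the list.
mask = df["loan_status"].isin(["Current", "Fully Paid"])

# Casting booleans to int maps True -> 1 and False -> 0.
df["loan_status_is_great"] = mask.astype(int)
print(df)
```

Note that a missing status is simply not in the list, so it ends up as 0 rather than raising an error.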

I find that NumPy's where function is a good choice for simple derived columns like this, and it stays fast. Something like the below should work:

import numpy as np
# Parenthesize each comparison: | binds more tightly than == in Python.
df['loan_status_is_great'] = np.where((df['loan_status'] == 'Current') |
                                      (df['loan_status'] == 'Fully Paid'),
                                      1,
                                      0)
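
One thing to watch with this pattern: in Python, | binds more tightly than ==, so each comparison has to be evaluated on its own before the two are combined. A sketch that makes this explicit by naming each mask first, again on made-up sample rows (the variable names is_current and is_paid are just illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for the LendingClub CSV.
df = pd.DataFrame({"loan_status": ["Current", "Late (16-30 days)", "Fully Paid"]})

# Build each boolean mask separately so operator precedence cannot bite.
is_current = df["loan_status"] == "Current"
is_paid = df["loan_status"] == "Fully Paid"

# np.where(condition, value_if_true, value_if_false) works element-wise.
df["loan_status_is_great"] = np.where(is_current | is_paid, 1, 0)
print(df)
```

Naming the masks is optional; wrapping each comparison in parentheses inside the np.where call has the same effect.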
