
Iterating through two columns in a pandas dataframe

I'm trying to iterate over two columns in a dataframe and create a dummy column for a statsmodels analysis that flags whether the client has always renewed their contract. I do this by looking for contracts from this year (data.Year_Season == '2014-2015') where the client has renewed more than once (data.Rank_output > 1). See the code below:

def make_always_renewed_column(data):
    for i, row in data.iterrows():  
        if row.Year_Season and row.Rank_output > 1:
            return 1
        else:
            return 0 


data['alwaysRenewed'] = make_always_renewed_column(data)

But when I look at what was returned with:

data.groupby(['alwaysRenewed'])[['lead_id']].count()

All rows in the new column returned 0.

I tried this on one row that met the conditions with .iloc and it returned True.

Any ideas?

Update

Just tried it like this to no avail:

def make_always_renewed_column(data):
    for row in data.itertuples():
        if row[8] == '2014-2015' and row[10] > 1:
            return 1
        else:
            return 0
    

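Your function returns as soon as it reaches the first row, so it produces a single value (0 or 1), and assigning that scalar to data['alwaysRenewed'] fills every row with it. In the first version, row.Year_Season is also only tested for truthiness rather than compared with '2014-2015'. There's no need to loop through individual rows for this kind of test. Operations like +, -, == etc. on pandas columns are vectorised, i.e. they are automatically applied to each element of the column. Your test should just look like: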

data['alwaysRenewed'] = (data['Year_Season'] == '2014-2015') & (data['Rank_output'] > 1)

This will create a boolean column, i.e. a column of True/False values. These will act like 0/1 for the purposes of sums, means etc., but you can convert to 0/1 explicitly using:

data['alwaysRenewed'] = data['alwaysRenewed'].astype(int)
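To see the two steps working together, here is a minimal sketch on a small made-up frame. The column names Year_Season, Rank_output and lead_id come from the question; the values are invented purely for illustration, so your real data will look different.

```python
import pandas as pd

# Hypothetical sample data: only the column names are taken from the question.
data = pd.DataFrame({
    "lead_id": [101, 102, 103, 104],
    "Year_Season": ["2014-2015", "2014-2015", "2013-2014", "2013-2014"],
    "Rank_output": [3, 1, 2, 1],
})

# Each comparison is evaluated element-wise and yields a boolean Series;
# & combines them row by row (the parentheses are required).
mask = (data["Year_Season"] == "2014-2015") & (data["Rank_output"] > 1)

# Convert True/False to 1/0 so the column can be used as a dummy variable.
data["alwaysRenewed"] = mask.astype(int)

# Same check as in the question: count rows per value of the new column.
counts = data.groupby("alwaysRenewed")["lead_id"].count()
print(data)
print(counts)
```

Only the first row satisfies both conditions here, so the groupby count should show a mix of 0s and 1s instead of a single constant value.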
