Iterating through pandas df with multiple row conditions

I'm working with a large dataframe (1M rows) in which I need to flag high (1) and low (0) profiles. I wrote this function, but it takes a long time to go through all the columns and rows. How could I improve it? I've heard about vectorisation but don't know how to set it up.

Many thanks

# x is a single row of the dataframe (apply is called with axis=1)
def flag_low(x):
    if x['EAN'] in list1:
        if (x['local_weekday'] >= 5 ):
            return 1
        elif ((x['local_hour'] <= 6) | (23 <= x['local_hour'])):
            return 1
        elif ((x['local_hour'] == 7) & ( x['local_minute'] < 30 )):
            return 1
        elif ((x['local_hour'] == 22) & ( 30 <= x['local_minute'] )):
            return 1
    elif x['EAN'] in list2:
        if (x['local_weekday'] >= 5 ):
            return 1
        elif ((x['local_hour'] <= 6) | (23 <= x['local_hour'])):
            return 1
    elif x['EAN'] in list3:
        if (x['local_weekday'] >= 5 ):
            return 1
        elif ((x['local_hour'] <= 6) | (22 <= x['local_hour'])):
            return 1
    elif x['EAN'] in  list4:
        if (x['local_weekday'] >= 5 ):
            return 1
        elif ((x['local_hour'] <= 6) | (22 <= x['local_hour'])):
            return 1
    elif x['EAN'] in list5:
        if (x['local_weekday'] >= 5 ):
            return 1
        elif ((x['local_hour'] <= 6) | (22 <= x['local_hour'])):
            return 1
    elif x['EAN'] in list6:
        if (x['local_weekday'] >= 5 ):
            return 1
        elif (x['local_time'] in be_holidays):
            return 1
        elif ((x['local_hour'] <= 5) | (21 <= x['local_hour'])):
            return 1
    elif x['EAN'] in list7:
        if (x['local_weekday'] >= 5 ):
            return 1
        elif (x['local_time'] in be_holidays):
            return 1
        elif ((x['local_hour'] <= 6) | (22 <= x['local_hour'])):
            return 1
    elif x['EAN'] in list8:
        if (x['local_weekday'] >= 5 ):
            return 1
        elif (x['local_time'] in be_holidays):
            return 1
        elif ((x['local_hour'] <= 6) | (22 <= x['local_hour'])):
            return 1
    elif x['EAN'] in list9:
        if (x['local_weekday'] >= 5 ):
            return 1
        elif (x['local_time'] in be_holidays):
            return 1
        elif ((x['local_hour'] <= 6) | (22 <= x['local_hour'])):
            return 1
    else:
        return 0
dataframe['BinLow'] = dataframe.apply(flag_low, axis = 1)

Next steps

I did what @Ade_1 explained but can't verify if it works due to a problem with:

TypeError: Cannot convert type '<class 'pandas.core.series.Series'>' to date.

on the line

(x['local_time'] in be_holidays)

How could I solve it?
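
The error message suggests that, in the vectorised version, `x['local_time']` is a whole column (a Series) rather than a single timestamp, so `in be_holidays` cannot convert it to a date. A minimal sketch of an element-wise check follows. It assumes `local_time` is a datetime column; `holiday_dates` is a hypothetical stand-in for `be_holidays` (a plain set of dates), since the question does not show how `be_holidays` is built.

```python
import datetime as dt
import pandas as pd

# Hypothetical stand-in for be_holidays: a plain set of datetime.date objects.
holiday_dates = {dt.date(2021, 1, 1), dt.date(2021, 7, 21)}

df = pd.DataFrame({
    "local_time": pd.to_datetime([
        "2021-01-01 10:00",  # New Year's Day
        "2021-01-04 10:00",  # ordinary Monday
        "2021-07-21 23:15",  # Belgian National Day
    ])
})

# Reduce each timestamp to its calendar date, then test the dates one by one.
is_holiday = df["local_time"].dt.date.map(lambda d: d in holiday_dates)
```

The resulting boolean Series can be combined with other conditions using `&` and `|`. If `be_holidays` supports `date in be_holidays` for single dates, as the original row-wise function implies, the same `map` call should work with it in place of `holiday_dates`.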

Since you have a lot of if/else conditions, you need np.select() to vectorise them.

For your nested ifs, you have to chain the conditions together with & and |.

The syntax is as follows:

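import numpy as np

conditions = [
    # one entry per branch; wrap each comparison in parentheses before chaining
    (dataframe['EAN'].isin(list1)) & (dataframe['local_weekday'] >= 5),
    # continue the conditions
]
choices = [
    1,
    # continue
]

dataframe['BinLow'] = np.select(conditions, choices, default=0)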

Note: the order of the conditions must match the order of the choices, and for each row np.select() uses the first condition that is true. Also, the default in np.select() corresponds to your last else statement.
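
To make the pattern concrete, here is a small sketch that applies the list1 and list3 rules from the question to a toy dataframe. The EAN values and the contents of `list1` and `list3` are made up for illustration, and the names `weekend`, `low1` and `low3` are not from the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical EAN groups standing in for list1 and list3 from the question.
list1 = ["A"]
list3 = ["B"]

df = pd.DataFrame({
    "EAN":           ["A", "A", "A", "B", "B", "C"],
    "local_weekday": [5,   1,   1,   2,   2,   6],
    "local_hour":    [12,  7,   12,  22,  12,  3],
    "local_minute":  [0,   15,  0,   0,   0,   0],
})

weekend = df["local_weekday"] >= 5
hour, minute = df["local_hour"], df["local_minute"]

# list1 rule: weekend, or before 07:30, or from 22:30 onwards
low1 = (
    weekend
    | (hour <= 6) | (hour >= 23)
    | ((hour == 7) & (minute < 30))
    | ((hour == 22) & (minute >= 30))
)
# list3 rule: weekend, or before 07:00, or from 22:00 onwards
low3 = weekend | (hour <= 6) | (hour >= 22)

# One condition per EAN group, in the same order as the choices.
conditions = [
    df["EAN"].isin(list1) & low1,
    df["EAN"].isin(list3) & low3,
]
choices = [1, 1]

df["BinLow"] = np.select(conditions, choices, default=0)
```

One difference from the row-wise function is worth noting: there, a row whose EAN is in a list but matches none of the inner conditions falls through without a return and gets None, whereas here it gets the default of 0.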

