List Comprehension & Speed Optimization

Question

I have a pandas DataFrame with two columns (Series) that I want to combine into a new column. I already have a for loop that does what I need, but I'd rather write it as a list comprehension and can't figure out how. My code also takes a considerable amount of time to execute. I've read that list comprehensions run faster; is there a quicker way?

If a value in 'lead_owner' matches any of the distinct values in 'agent_final', use that value; otherwise, use the value from 'agent_final' in the same row.

my_list = []
for x, y in zip(list(df['lead_owner']), list(df['agent_final'])):
    if x in set(df['agent_final']):  # the set is rebuilt on every iteration
        my_list.append(x)
    else:
        my_list.append(y)

Answer 1

Here is how to do this with a list comprehension:

my_list = [x if x in set(df['agent_final']) else y for (x,y) in zip(list(df['lead_owner']), list(df['agent_final']))]

It's hard to say why your code is running slowly without knowing how large your data is.

One sure way to speed up your code is to avoid constructing the set every time you check whether x is in it; rebuilding it for every row makes the loop quadratic in the number of rows. Construct the set once, outside the for loop or list comprehension:

agent_final_set = set(df['agent_final'])
my_list = [x if x in agent_final_set else y for (x,y) in zip(list(df['lead_owner']), list(df['agent_final']))]
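To check the claim about rebuilding the set, here is a minimal sketch that times both list-comprehension variants with the standard-library timeit module. The synthetic DataFrame (2,000 rows drawn from a hypothetical pool of 1,000 names) is an assumption for illustration, not the asker's data, and absolute timings will depend on your machine.

```python
import random
import timeit

import pandas as pd

# Hypothetical synthetic data: 2,000 rows of owner names (not the asker's real data)
random.seed(0)
names = [f"agent_{i}" for i in range(1000)]
df = pd.DataFrame({
    "lead_owner": random.choices(names, k=2000),
    "agent_final": random.choices(names, k=2000),
})

def set_inside_loop(df):
    # Rebuilds the set for every row: O(n) work per row, O(n^2) overall
    return [x if x in set(df['agent_final']) else y
            for x, y in zip(df['lead_owner'], df['agent_final'])]

def set_outside_loop(df):
    # Builds the set once: O(n) overall, with O(1) average membership tests
    agent_final_set = set(df['agent_final'])
    return [x if x in agent_final_set else y
            for x, y in zip(df['lead_owner'], df['agent_final'])]

# Both variants must produce identical results before comparing speed
assert set_inside_loop(df) == set_outside_loop(df)

print("set inside loop: ", timeit.timeit(lambda: set_inside_loop(df), number=3))
print("set outside loop:", timeit.timeit(lambda: set_outside_loop(df), number=3))
```

The only difference between the two functions is where the set is built, so any gap in the printed timings comes from that change alone.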

Answer 2

I removed some unnecessary code (zip can iterate over the columns directly, so the list() calls are not needed) and moved the creation of the set out of the main loop. Let's see if this runs faster:

agents = set(df['agent_final'])
data = zip(df['lead_owner'], df['agent_final'])
result = [x if x in agents else y for x, y in data]
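Since the question asks for a new column, here is a minimal sketch that wraps this approach in a helper and assigns the result back to the DataFrame, using the sample data from Answer 3. The helper name combine_owner and the column name result are hypothetical. Note that zip returns a single-use iterator, so data above can only be consumed once; building it inside the function avoids accidentally reusing an exhausted iterator.

```python
import pandas as pd

def combine_owner(df: pd.DataFrame) -> list:
    """Hypothetical helper: keep lead_owner when it appears anywhere in
    agent_final, otherwise fall back to that row's agent_final value."""
    agents = set(df['agent_final'])                   # built once per call
    data = zip(df['lead_owner'], df['agent_final'])   # single-use iterator
    return [x if x in agents else y for x, y in data]

df = pd.DataFrame({
    'lead_owner': ['a', 'b', 'c', 'e'],
    'agent_final': ['1', '2', 'a', 'c'],
})

# A list with one entry per row can be assigned directly as a new column
df['result'] = combine_owner(df)
print(df['result'].tolist())  # ['a', '2', 'c', 'c']
```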

Answer 3

A numpy.where one-liner:

import numpy as np
my_list = np.where(df.lead_owner.isin(df.agent_final), df.lead_owner, df.agent_final)

Simple example:

In [284]: df
Out[284]: 
  lead_owner agent_final
0          a           1
1          b           2
2          c           a
3          e           c

In [285]: np.where(df.lead_owner.isin(df.agent_final), df.lead_owner, df.agent_final)
Out[285]: array(['a', '2', 'c', 'c'], dtype=object)
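If you prefer to stay within pandas and keep the DataFrame's index, Series.where expresses the same choice: it keeps values where the condition is True and substitutes the other argument where it is False. This is a swapped-in alternative rather than part of this answer; the sketch reuses the sample frame above and checks that it agrees with np.where.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'lead_owner': ['a', 'b', 'c', 'e'],
    'agent_final': ['1', '2', 'a', 'c'],
})

# Boolean mask: does each lead_owner appear anywhere in agent_final?
mask = df['lead_owner'].isin(df['agent_final'])

# Series.where keeps lead_owner where mask is True, else takes agent_final
result = df['lead_owner'].where(mask, df['agent_final'])

# Same values as the np.where one-liner, but returned as an indexed Series
assert result.tolist() == np.where(mask, df['lead_owner'], df['agent_final']).tolist()
print(result.tolist())  # ['a', '2', 'c', 'c']
```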

Answer 4

I would suggest you try pandas apply and share how it performs:

agents = set(df['agent_final'])
df['result'] = df.apply(lambda x: x['lead_owner'] if x['lead_owner'] in agents else x['agent_final'], axis=1)

and call .to_list() on the result if you need a plain list.
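Since the question is about speed, here is a rough benchmark sketch comparing the list comprehension, np.where, and row-wise apply on the same synthetic data. The data size and name pool are assumptions, and results vary by machine and pandas version. In general, apply with axis=1 calls a Python function once per row and is usually the slowest of these, while isin plus np.where operates on whole columns at once.

```python
import random
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data (not the asker's): 20,000 rows drawn from 1,000 names
random.seed(0)
names = [f"agent_{i}" for i in range(1000)]
df = pd.DataFrame({
    'lead_owner': random.choices(names, k=20_000),
    'agent_final': random.choices(names, k=20_000),
})

def with_comprehension(df):
    agents = set(df['agent_final'])
    return [x if x in agents else y
            for x, y in zip(df['lead_owner'], df['agent_final'])]

def with_np_where(df):
    mask = df['lead_owner'].isin(df['agent_final'])
    return np.where(mask, df['lead_owner'], df['agent_final']).tolist()

def with_apply(df):
    agents = set(df['agent_final'])
    return df.apply(lambda row: row['lead_owner'] if row['lead_owner'] in agents
                    else row['agent_final'], axis=1).to_list()

# All three approaches should agree before comparing their speed
expected = with_comprehension(df)
assert with_np_where(df) == expected
assert with_apply(df) == expected

for fn in (with_comprehension, with_np_where, with_apply):
    seconds = timeit.timeit(lambda: fn(df), number=3)
    print(f"{fn.__name__:20s} {seconds:.4f} s")
```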

4 answers

Answer 1: 2 votes, accepted, 2019-09-30 12:43:45
Answer 2: 1 vote, 2019-09-30 12:49:14
Answer 3: 0 votes, 2019-09-30 12:51:42
Answer 4: 0 votes, 2019-09-30 12:58:54