List Comprehension & Speed Optimization

I have a pandas DataFrame with two columns that I want to combine into a new column. I already have a for loop that does what I need, but I would rather use a list comprehension and cannot figure out how to write it. My code also takes a considerable amount of time to execute. I read that list comprehensions run faster; is there a quicker way?

If a value from 'lead_owner' appears among the unique values of 'agent_final', use that value. Otherwise, use the value from 'agent_final' in the same row.

my_list = []
for x, y in zip(list(df['lead_owner']), list(df['agent_final'])):
    if x in set(df['agent_final']):
        my_list.append(x)
    else:
        my_list.append(y)

The way to do this using list comprehension:

my_list = [x if x in set(df['agent_final']) else y for (x,y) in zip(list(df['lead_owner']), list(df['agent_final']))]

It is hard to say why your code runs slowly without knowing the size of your data.

One sure way to speed it up is to stop constructing the set every time you check whether x is in it. Construct the set once, outside the for loop or list comprehension:

agent_final_set = set(df['agent_final'])
my_list = [x if x in agent_final_set else y for (x,y) in zip(list(df['lead_owner']), list(df['agent_final']))]

I removed some unnecessary code and extracted the creation of the set outside of the main loop. Let's see if this runs faster:

agents = set(df['agent_final'])
data = zip(df['lead_owner'], df['agent_final'])
result = [x if x in agents else y for x, y in data]
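To see how much building the set once matters, here is a rough timing sketch. The sample data and the two function names are made up for illustration, and the numbers will vary with your machine and data size:

```python
import timeit

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "lead_owner": ["a", "b", "c", "e"] * 250,
    "agent_final": ["1", "2", "a", "c"] * 250,
})


def set_rebuilt_each_row():
    # Rebuilds set(df['agent_final']) for every row, so each row costs O(n).
    return [x if x in set(df["agent_final"]) else y
            for x, y in zip(df["lead_owner"], df["agent_final"])]


def set_built_once():
    # Builds the set a single time, then each membership test is O(1) on average.
    agents = set(df["agent_final"])
    return [x if x in agents else y
            for x, y in zip(df["lead_owner"], df["agent_final"])]


slow = timeit.timeit(set_rebuilt_each_row, number=1)
fast = timeit.timeit(set_built_once, number=1)
print(f"set rebuilt per row: {slow:.4f}s, set built once: {fast:.4f}s")
```

Both functions return the same list; only the amount of repeated work differs, and the gap grows roughly quadratically with the number of rows.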

A one-liner with numpy.where (assuming numpy is imported as np; note that the result is a NumPy array, not a list):

my_list = np.where(df.lead_owner.isin(df.agent_final), df.lead_owner, df.agent_final)

Simple example:

In [284]: df
Out[284]: 
  lead_owner agent_final
0          a           1
1          b           2
2          c           a
3          e           c

In [285]: np.where(df.lead_owner.isin(df.agent_final), df.lead_owner, df.agent_final)
Out[285]: array(['a', '2', 'c', 'c'], dtype=object)
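The session above can be reproduced as a standalone script. This sketch assumes the values in agent_final are strings (the '2' in the output suggests so) and stores the result in a new column named result, which is a name chosen here for illustration:

```python
import numpy as np
import pandas as pd

# Same four rows as the example above, with agent_final assumed to hold strings.
df = pd.DataFrame({
    "lead_owner": ["a", "b", "c", "e"],
    "agent_final": ["1", "2", "a", "c"],
})

# True where the row's lead_owner appears anywhere in the agent_final column.
mask = df["lead_owner"].isin(df["agent_final"])

# Pick lead_owner where the mask is True, otherwise agent_final from the same row.
df["result"] = np.where(mask, df["lead_owner"], df["agent_final"])

print(df["result"].tolist())  # ['a', '2', 'c', 'c']
```

Splitting out the mask makes it easy to inspect which rows matched before combining the columns.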

I would suggest you try pandas apply and share how it performs:

agents = set(df['agent_final'])
df['result'] = df.apply(lambda x: x['lead_owner'] if x['lead_owner'] in agents else x['agent_final'], axis=1)

Call to_list() on the resulting column if you need a plain list.
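A minimal sketch of the apply approach on the small example data, followed by the conversion to a list. The sample rows are illustrative, and the list comprehension is included only to check that both approaches agree:

```python
import pandas as pd

# Illustrative data using the column names from the question.
df = pd.DataFrame({
    "lead_owner": ["a", "b", "c", "e"],
    "agent_final": ["1", "2", "a", "c"],
})

# Build the lookup set once, outside the row-wise function.
agents = set(df["agent_final"])

# axis=1 passes each row to the lambda as a Series.
df["result"] = df.apply(
    lambda row: row["lead_owner"] if row["lead_owner"] in agents else row["agent_final"],
    axis=1,
)

# Convert the new column to a plain Python list.
result_list = df["result"].tolist()

# Cross-check against the list comprehension from the earlier answers.
expected = [x if x in agents else y
            for x, y in zip(df["lead_owner"], df["agent_final"])]
print(result_list == expected)
```

Row-wise apply calls a Python function per row, so on large frames it is usually worth timing it against the list comprehension and numpy.where versions before settling on one.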
