Python: how to get rid of nested loops?

I have two nested for loops and I want to get rid of them to speed up my code. My pandas dataframe looks like this (the column headers are companies, the rows are users, and 1 means that the user accessed that company, 0 otherwise):

   100  200  300  400
0    1    1    0    1
1    1    1    1    0
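The question does not show create_the_matrix. As a minimal sketch, assuming the raw data is a long-format access log with hypothetical columns user and company (one row per visit), a 0/1 matrix like the one above could be built with pd.crosstab, capping the visit counts at 1:

```python
import pandas as pd

# Hypothetical long-format access log: one row per (user, company) visit.
# The column names "user" and "company" are assumptions for illustration.
log = pd.DataFrame({
    "user":    [0, 0, 0, 1, 1, 1, 1],
    "company": [100, 200, 400, 100, 200, 300, 100],
})

# Count visits per (user, company) pair, then cap at 1 so the matrix only
# records whether the user accessed the company at all.
df_matrix = pd.crosstab(log["user"], log["company"]).clip(upper=1)
print(df_matrix)
```

User 1 visits company 100 twice in this made-up log; clipping turns that count of 2 back into a 1.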

I want to compare each pair of companies in my dataset, so I created a list that contains all the company ids. The code takes each company in the list as the base and pairs it with every other company (the peer), hence the second for loop. My code is the following:

# df, create_the_matrix and list_of_companies are defined elsewhere in my code
def calculate_scores():
    df_matrix = create_the_matrix(df)
    print(df_matrix)
    for base in list_of_companies:
        for peer in list_of_companies:
            if base == peer:
                # A firm is never compared with itself
                continue
            # Calculate the denominator first, because it slices the big matrix
            # down to the users who accessed the base firm
            denominator_df = df_matrix.loc[(df_matrix[base] == 1)]
            denominator = denominator_df.sum(axis=1).values.tolist()
            denominator = sum(denominator) - len(denominator)

            # Calculate the numerator second, because it slices the dataframe
            # above further, keeping only the users who accessed
            # both the base and the peer firm
            numerator_df = denominator_df.loc[(denominator_df[base] == 1) & (denominator_df[peer] == 1)]
            numerator = len(numerator_df.index)
            annual_search_fraction = numerator / denominator
            print("Base: {} and Peer: {} ==> {}".format(base, peer, annual_search_fraction))
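In the loop above, denominator_df and denominator depend only on base, yet they are recomputed for every peer. A minimal sketch of hoisting that work out of the inner loop, assuming the matrix is passed in as a parameter (unlike the original signature) and using the sample data from the question:

```python
import pandas as pd

# Sample matrix from the question: rows are users, columns are companies.
df_matrix = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 1, 1, 0]],
    columns=[100, 200, 300, 400],
)

def calculate_scores(df_matrix):
    scores = {}
    for base in df_matrix.columns:
        # Users who accessed the base firm: independent of peer, so this
        # slice is built once per base instead of once per (base, peer) pair.
        base_users = df_matrix[df_matrix[base] == 1]
        # Each such user contributes (firms accessed - 1) links to other firms.
        denominator = (base_users.sum(axis=1) - 1).sum()
        for peer in df_matrix.columns:
            if peer == base:
                continue
            # Users who accessed both firms (base is already guaranteed).
            numerator = (base_users[peer] == 1).sum()
            scores[(base, peer)] = numerator / denominator
    return scores

for (base, peer), score in calculate_scores(df_matrix).items():
    print("Base: {} and Peer: {} ==> {}".format(base, peer, score))
```

This keeps both loops, so it only removes redundant work; it does not change the number of (base, peer) pairs visited.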

EDIT 1 (added code explanation):

The metric is the following:

[Image: formula of the metric]

1) The metric tells me how often two companies are searched together, relative to all the searches that pair the base company with any other company.

2) The code first selects all the users who accessed the base firm (the denominator_df = df_matrix.loc[(df_matrix[base] == 1)] line). It then calculates the denominator, which counts the unique links between the base firm and every other firm those users searched: since I can count the number of firms each user accessed, I subtract 1 per user (the base firm itself) to get the number of links between the base firm and the other firms.

3) Next, the code filters denominator_df further, keeping only the users who accessed both the base and the peer firm. Since the numerator is the number of such users, numerator = len(numerator_df.index) counts the remaining rows and gives me the numerator.
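The image with the formula did not survive. Reading the metric back from the code instead (so this notation is an assumption, not the original image), with M_{u,c} = 1 when user u accessed company c and 0 otherwise, the score for base b and peer p is:

```latex
\text{score}(b, p) = \frac{\sum_{u} M_{u,b}\, M_{u,p}}{\sum_{u} M_{u,b} \left( \sum_{c} M_{u,c} - 1 \right)}, \qquad p \neq b

\sum_{p \neq b} \sum_{u} M_{u,b}\, M_{u,p}
  = \sum_{u} M_{u,b} \left( \sum_{c} M_{u,c} - M_{u,b} \right)
  = \sum_{u} M_{u,b} \left( \sum_{c} M_{u,c} - 1 \right)
```

The second line shows that the numerators for a fixed base add up to its denominator (the last step uses M_{u,b}^2 = M_{u,b} for 0/1 entries), which is why the scores for each base firm sum to 1.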

The expected output from the dataframe at the top is the following:

Base: 100 and Peer: 200 ==> 0.5
Base: 100 and Peer: 300 ==> 0.25
Base: 100 and Peer: 400 ==> 0.25
Base: 200 and Peer: 100 ==> 0.5
Base: 200 and Peer: 300 ==> 0.25
Base: 200 and Peer: 400 ==> 0.25
Base: 300 and Peer: 100 ==> 0.5
Base: 300 and Peer: 200 ==> 0.5
Base: 300 and Peer: 400 ==> 0.0
Base: 400 and Peer: 100 ==> 0.5
Base: 400 and Peer: 200 ==> 0.5
Base: 400 and Peer: 300 ==> 0.0

4) Sanity check: for each base firm, the metrics against all the other peer firms have to sum to 1, and they do with the code I posted.
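A different technique (swapped in here, not the asker's code) removes both Python loops with matrix products. As a sketch, assuming the whole 0/1 matrix fits in memory as a NumPy array M: entry [b, p] of M.T @ M counts the users who accessed both b and p (the numerator), and M.T @ (row sums - 1) gives every base firm's denominator at once.

```python
import numpy as np
import pandas as pd

# Sample matrix from the question: rows are users, columns are companies.
df_matrix = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 1, 1, 0]],
    columns=[100, 200, 300, 400],
)

def calculate_scores_vectorized(df_matrix):
    m = df_matrix.to_numpy()
    # co[b, p] = number of users who accessed both company b and company p.
    co = m.T @ m
    # Each user adds (firms accessed - 1) to the denominator of every base
    # firm they accessed; m.T @ (...) sums that over each base's users.
    denominator = m.T @ (m.sum(axis=1) - 1)
    scores = co / denominator[:, None]
    # A firm is never its own peer, so blank out the diagonal.
    np.fill_diagonal(scores, np.nan)
    # Rows are base firms, columns are peer firms.
    return pd.DataFrame(scores, index=df_matrix.columns, columns=df_matrix.columns)

scores = calculate_scores_vectorized(df_matrix)
print(scores)
print(scores.sum(axis=1))  # sanity check: each base row sums to 1
```

The result needs n_companies x n_companies floats, which may matter for a very large number of companies. A base firm whose users accessed no other firm has a zero denominator; NumPy then produces nan with a RuntimeWarning, where the original loop would raise ZeroDivisionError.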

Any suggestions or tips on which direction to go will be appreciated!

You might be looking for itertools.product(). Here is an example that is similar to what you seem to want to do:

import itertools

a = [ 'one', 'two', 'three' ]

for b in itertools.product( a, a ):
    print( b )

The output from the above code snippet is:

('one', 'one')
('one', 'two')
('one', 'three')
('two', 'one')
('two', 'two')
('two', 'three')
('three', 'one')
('three', 'two')
('three', 'three')

Or you could do this:

for u,v in itertools.product( a, a ):
    print( "%s %s"%(u, v) )

The output is then:

one one
one two
one three
two one
two two
two three
three one
three two
three three

If you would like a list, you could do this:

alist = list(itertools.product(a, a))

print( alist )

And the output is:

[('one', 'one'), ('one', 'two'), ('one', 'three'), ('two', 'one'), ('two', 'two'), ('two', 'three'), ('three', 'one'), ('three', 'two'), ('three', 'three')]
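The snippets above use product(), which also yields self-pairs such as ('one', 'one') that the question's code skips. itertools.permutations(a, 2) yields only ordered pairs of distinct items; order matters here because the metric's denominator depends on which firm is the base. Either way the loop body still runs once per pair, so on its own this flattens the two loops into one rather than reducing the work. A short sketch, where list_of_companies holds the sample ids from the question:

```python
import itertools

a = ['one', 'two', 'three']

# permutations(a, 2) yields every ordered pair of distinct items, so the
# self-pairs that product(a, a) would produce are skipped.
pairs = list(itertools.permutations(a, 2))
print(pairs)

# Applied to the question's company ids, one flat loop replaces the two
# nested ones and the "if base == peer" check.
list_of_companies = [100, 200, 300, 400]
for base, peer in itertools.permutations(list_of_companies, 2):
    print(base, peer)
```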

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM