
Compare two lists of combinations in Pandas

I have a list of 60,000 lottery draws (5 numbers between 1 and 36) and would like to compare them against every possible combination (376,992 combinations of 36 elements taken 5 at a time) and summarize the statistics of each outcome, i.e. for each possible combination, obtain the number of draws with 0 matches, the number with exactly 1 match, and so on.

So I'm starting with a Pandas DataFrame of all possible combinations, which I obtain with this command:

pd.DataFrame(itertools.combinations(range(1, 37), 5))

And I'd like to add 6 more columns to show how many times (against the 60,000 draws) each row (combination) would have got 0, 1, 2, 3, 4 or 5 matches. I realize it's an extremely heavy calculation, so I'd like to know how you would approach this problem for maximum speed (and whether it's too much anyway and should be done in much smaller chunks, maybe 1,000 draws at a time or something). The list of draws could be a list, a DataFrame itself, or whatever else you think is better. I understand from similar questions that the fastest way to get the number of matching elements between two lists may be:

common_elements = len(set(list1).intersection(list2))

But I can't get much further than this. Thanks!

You could first create a list of all the possible combinations using the itertools.combinations function, and then use for loops to compare each combination against the list of lottery draws.

import itertools
import pandas as pd

# Placeholder data (an assumption): replace with the real list of 60,000 draws
draws = [(1, 2, 3, 4, 5), (1, 2, 3, 10, 20), (6, 7, 8, 9, 10)]

# Create a list of all the possible combinations
combinations = list(itertools.combinations(range(1, 37), 5))

# Pre-convert the draws to sets so each intersection is cheap
draw_sets = [set(draw) for draw in draws]

# Names of the counter columns, indexed by number of matches
match_columns = ['0_matches', '1_match', '2_matches',
                 '3_matches', '4_matches', '5_matches']

# Iterate over combinations, and for each one over all draws.
# Note: this is a plain Python double loop, so it is only practical for small inputs.
rows = []
for combination in combinations:
    counts = [0] * 6
    for draw_set in draw_sets:
        # Find the common elements between the draw and the combination
        common_elements = len(draw_set.intersection(combination))
        # Increment the appropriate counter
        counts[common_elements] += 1
    rows.append(counts)

# Create a DF with one row per combination: the 5 numbers (n1..n5) plus the 6 counters
df = pd.DataFrame(combinations, columns=['n1', 'n2', 'n3', 'n4', 'n5'])
df = df.join(pd.DataFrame(rows, columns=match_columns))

# The df DataFrame now contains the summary statistics for each combination
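
Since the question asks for maximum speed and mentions working in chunks, here is a rough sketch of a vectorized alternative using NumPy. It is only one possible approach, not a benchmarked solution. The idea is to encode every combination and every draw as a 0/1 row of length 36, so that a matrix product gives the number of shared numbers for each (combination, draw) pair; the combinations are processed in chunks to keep memory bounded. The helper names one_hot and match_counts, the chunk_size value, and the three sample draws are all assumptions made for illustration.

```python
import itertools
import numpy as np
import pandas as pd

def one_hot(rows, n_numbers=36):
    """Return a (len(rows), n_numbers) 0/1 matrix marking which numbers each row contains."""
    rows = np.asarray(rows)
    out = np.zeros((rows.shape[0], n_numbers), dtype=np.float32)
    # Numbers are 1-based, columns are 0-based
    out[np.arange(rows.shape[0])[:, None], rows - 1] = 1
    return out

def match_counts(combos, draws, n_numbers=36, chunk_size=500):
    """For each combination, count how many draws share 0, 1, ..., k numbers with it."""
    combos = np.asarray(combos)
    k = combos.shape[1]
    draw_hot = one_hot(draws, n_numbers)
    counts = np.zeros((combos.shape[0], k + 1), dtype=np.int64)
    for start in range(0, combos.shape[0], chunk_size):
        chunk_hot = one_hot(combos[start:start + chunk_size], n_numbers)
        # (chunk, n_draws) matrix: shared numbers for each pair.
        # Values are small whole numbers, so float32 represents them exactly.
        matches = chunk_hot @ draw_hot.T
        for m in range(k + 1):
            counts[start:start + chunk_size, m] = (matches == m).sum(axis=1)
    return counts

# Hypothetical sample data: replace with the real 60,000 draws
draws = [(1, 2, 3, 4, 5), (1, 2, 3, 10, 20), (6, 7, 8, 9, 10)]
combos = list(itertools.combinations(range(1, 37), 5))

counts = match_counts(combos, draws)

num_cols = ['n1', 'n2', 'n3', 'n4', 'n5']
match_cols = ['0_matches', '1_match', '2_matches',
              '3_matches', '4_matches', '5_matches']
df = pd.concat([pd.DataFrame(combos, columns=num_cols),
                pd.DataFrame(counts, columns=match_cols)], axis=1)
```

With 60,000 draws, each chunk of 500 combinations produces a 500 x 60,000 float32 matrix (about 120 MB), so chunk_size can be lowered or raised to fit the available memory.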

Does this help?

(Maybe a bit late for the New Year's tombola, haha!)

