简体   繁体   中英

Create list with all unique possible combination based on condition in dataframe in Python

I have the following dataset:

d = {
'Company':['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
'Individual': [1,2,3,4,1,5,6,7,1,8,9,10,10,11,12,13]
}

Now, I need to create a list in Python of all pairs of elements of 'Company', that correspond to the values in 'Individual'.

Eg The output for above should be as follows for the dataset above: ((A,B),(A,C),(B,C),(C,D)).The first three tuples, since Individual 1 is affiliated with A,B and C and the last one since, Individual 10 is affiliated with C and D .

Further Explanation - If individual =1, the above dataset has 'A','B' and 'C' values. Now, I want to create all unique combination of these three values (tuple), therefore it should create a list with the tuples (A,B),(A,C) and (B,C). The next is Individual=2. Here is only has the value 'A' therefore there is no tuple to append to the list. For next individuals there's only one corresponding company each, hence no further pairs. The only other tuple that has to be added is for Individual=10, since it has values 'C' and 'D' - and should therefore add the tuple (C,D) to the list.

One solution is to use pandas :

import pandas as pd

d = {'Company':['A','A','A','B','B','B','C','C','C'],'Individual': [1,2,3,1,4,5,3,6,7]}

df = pd.DataFrame(d).groupby('Individual')['Company'].apply(list).reset_index()
companies = df.loc[df['Company'].map(len)>1, 'Company'].tolist()

# [['A', 'B'], ['A', 'C']]

This isn't the most efficient way, but it may be intuitive.

Try this,

temp=df[df.duplicated(subset=['Individual'],keep=False)]
print temp.groupby(['Individual'])['Company'].unique()

>>>1    [A, B]
>>>3    [A, C]

Here is a solution to your refined question:

from collections import defaultdict
from itertools import combinations

data = {'Company':['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
        'Individual': [1,2,3,4,1,5,6,7,1,8,9,10,10,11,12,13]}

d = defaultdict(set)

for i, j in zip(data['Individual'], data['Company']):
    d[i].add(j)

res = {k: sorted(map(sorted, combinations(v, 2))) for k, v in d.items()}

# {1: [['A', 'B'], ['A', 'C'], ['B', 'C']],
#  2: [],
#  3: [],
#  4: [],
#  5: [],
#  6: [],
#  7: [],
#  8: [],
#  9: [],
#  10: [['C', 'D']],
#  11: [],
#  12: [],
#  13: []}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM