
Create a list of all unique possible combinations based on a condition in a DataFrame in Python

I have the following dataset:

d = {
'Company':['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
'Individual': [1,2,3,4,1,5,6,7,1,8,9,10,10,11,12,13]
}

Now I need to create a list in Python of all pairs of 'Company' values that share the same value in 'Individual'.

E.g., the output for the dataset above should be: ((A,B),(A,C),(B,C),(C,D)). The first three tuples arise because Individual 1 is affiliated with A, B and C, and the last one because Individual 10 is affiliated with C and D.

Further explanation: for Individual = 1, the dataset above has the 'Company' values 'A', 'B' and 'C'. I want to create all unique combinations of these three values as tuples, so the list should contain (A,B), (A,C) and (B,C). Next is Individual = 2, which only has the value 'A', so there is no tuple to append to the list. Each of the following individuals also has only one corresponding company, hence no further pairs. The only other tuple to add is for Individual = 10, which has the values 'C' and 'D', so the tuple (C,D) should be added to the list.

One solution is to use pandas:

import pandas as pd

d = {'Company':['A','A','A','B','B','B','C','C','C'],'Individual': [1,2,3,1,4,5,3,6,7]}

df = pd.DataFrame(d).groupby('Individual')['Company'].apply(list).reset_index()
companies = df.loc[df['Company'].map(len)>1, 'Company'].tolist()

# [['A', 'B'], ['A', 'C']]

This isn't the most efficient way, but it may be intuitive.
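The snippet above stops at one list of companies per individual. A minimal sketch of how those lists could be turned into the pairs asked for in the question, assuming `itertools.combinations` is acceptable and using the same sample data (the names `grouped` and `pairs` are illustrative, not from the original answer):

```python
from itertools import combinations

import pandas as pd

# Same sample data as in the answer above.
d = {'Company': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
     'Individual': [1, 2, 3, 1, 4, 5, 3, 6, 7]}

# Collect the distinct companies of each individual.
grouped = pd.DataFrame(d).groupby('Individual')['Company'].unique()

# Build every 2-company combination per individual; the set drops
# pairs that would otherwise be repeated across individuals.
pairs = sorted({
    pair
    for companies in grouped
    for pair in combinations(sorted(companies), 2)
})

print(pairs)  # [('A', 'B'), ('A', 'C')]
```

Individuals with a single company contribute nothing, because `combinations` of a one-element sequence taken two at a time is empty.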

Try this:

# Assumes df is the original DataFrame, e.g. df = pd.DataFrame(d)
temp = df[df.duplicated(subset=['Individual'], keep=False)]
print(temp.groupby(['Individual'])['Company'].unique())

# 1    [A, B]
# 3    [A, C]

Here is a solution to your refined question:

from collections import defaultdict
from itertools import combinations

data = {'Company':['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
        'Individual': [1,2,3,4,1,5,6,7,1,8,9,10,10,11,12,13]}

d = defaultdict(set)

for i, j in zip(data['Individual'], data['Company']):
    d[i].add(j)

res = {k: sorted(map(sorted, combinations(v, 2))) for k, v in d.items()}

# {1: [['A', 'B'], ['A', 'C'], ['B', 'C']],
#  2: [],
#  3: [],
#  4: [],
#  5: [],
#  6: [],
#  7: [],
#  8: [],
#  9: [],
#  10: [['C', 'D']],
#  11: [],
#  12: [],
#  13: []}
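The dictionary above keeps one entry per individual, including the empty ones. If a single flat list of unique pairs is wanted, as in the question's expected output, the same grouping could be flattened roughly as follows (a sketch only; `companies_by_individual` and `unique_pairs` are names chosen here for illustration):

```python
from collections import defaultdict
from itertools import combinations

data = {'Company': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
                    'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
        'Individual': [1, 2, 3, 4, 1, 5, 6, 7, 1, 8, 9, 10, 10, 11, 12, 13]}

# Map each individual to the set of companies they are affiliated with.
companies_by_individual = defaultdict(set)
for individual, company in zip(data['Individual'], data['Company']):
    companies_by_individual[individual].add(company)

# Sorting each group first makes every pair come out in a canonical
# order, so the set comprehension removes duplicates across individuals.
unique_pairs = sorted({
    pair
    for companies in companies_by_individual.values()
    for pair in combinations(sorted(companies), 2)
})

print(unique_pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```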
