简体   繁体   中英

Pythonic way to create pairs of values in a column in dataframe

I have a Dataframe that looks like this:

OwnerID    Value
1            A
1            B
1            C
1            D

This is the shortened version, I have thousands of values for OwnerID . I'd like to create pairs for the Value column where each Value is paired with every other Value , and have the result as list of pairs.

For example, for the OwnerID 1, the resultset should be the following lists:

[A,B]

[A,C]

[A,D]

[B,C]

[B,D]

[C,D]

I could write 2 for loops to achieve this, but that wouldn't be very efficient or pythonic. Would someone know a better way to achieve this?

Any help would be much appreciated.

Pandas solution (using .merge() and .query() methods):

Data:

In [10]: df
Out[10]:
   OwnerID Value
0        1     A
1        1     B
2        1     C
3        1     D
4        2     X
5        2     Y
6        2     Z

Solution:

In [9]: pd.merge(df, df, on='OwnerID', suffixes=['','2']).query("Value != Value2")
Out[9]:
    OwnerID Value Value2
1         1     A      B
2         1     A      C
3         1     A      D
4         1     B      A
6         1     B      C
7         1     B      D
8         1     C      A
9         1     C      B
11        1     C      D
12        1     D      A
13        1     D      B
14        1     D      C
17        2     X      Y
18        2     X      Z
19        2     Y      X
21        2     Y      Z
22        2     Z      X
23        2     Z      Y

If you need only lists:

In [17]: pd.merge(df, df, on='OwnerID', suffixes=['','2']) \
           .query("Value != Value2") \
           .filter(like='Value').values
Out[17]:
array([['A', 'B'],
       ['A', 'C'],
       ['A', 'D'],
       ['B', 'A'],
       ['B', 'C'],
       ['B', 'D'],
       ['C', 'A'],
       ['C', 'B'],
       ['C', 'D'],
       ['D', 'A'],
       ['D', 'B'],
       ['D', 'C'],
       ['X', 'Y'],
       ['X', 'Z'],
       ['Y', 'X'],
       ['Y', 'Z'],
       ['Z', 'X'],
       ['Z', 'Y']], dtype=object)
import itertools as iter
df2 = df.groupby('OwnerID').Value.apply(lambda x: list(iter.combinations(x, 2)))

will return the desired output for each unique owner id

OwnerID
1    [(A, B), (A, C), (A, D), (B, C), (B, D), (C, D)]

itertools is all you need.

Depending on how you want to combine them, try either permutations or combinations , for example.

try itertools

import itertools

list(itertools.combinations(['a','b','c','d'], 2))

#result: [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
val = df['Value'].values
length = len(val)
pairs = [[val[i],val[j]] for i in xrange(length) for j in xrange(length) if i!=j]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM