Let's say I have a series/dataframe A that looks like
A = [3,2,1,5,4,...
A could also be sorted, as it doesn't matter to me. I want to create a new series that keeps track of possible pairs. That is, I want the result to look like
B = [3_1, 3_2, 3_4, ..., 2_1, 2_4, ..., 1_4, 1_5,...
In other words, I want to exclude 2_3, since 3_2 already exists. I figure I could create each element in B using something like
B = []
for i in A:
    for j in A:
        # i and j are the values themselves, not positions
        s = str(i) + '_' + str(j)
        B.append(s)
But I'm not sure how to make sure the (i,j) pairing doesn't already exist, such as making sure 2_3 doesn't get added, as I mentioned above.
What is the most efficient way to deal with this?
import pandas as pd
from itertools import combinations
s = pd.Series([1, 2, 3, 4])
s2 = pd.Series("_".join([str(a), str(b)]) for a, b in combinations(s, 2))
>>> s2
0    1_2
1    1_3
2    1_4
3    2_3
4    2_4
5    3_4
dtype: object
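If the input can contain repeated values, combinations will pair them up by position and emit the same label more than once. Here is a minimal sketch of one way around that, using plain lists so it runs without pandas. The helper name unique_pairs is made up for illustration, and it assumes you want the pairs in first-seen order rather than sorted.

```python
from itertools import combinations


def unique_pairs(values):
    """Return 'a_b' labels, one per unordered pair of distinct values."""
    # dict.fromkeys drops repeated values while keeping first-seen order.
    distinct = list(dict.fromkeys(values))
    # combinations yields each pair of positions once, so '2_3' is never
    # produced after '3_2' has already been emitted.
    return [f"{a}_{b}" for a, b in combinations(distinct, 2)]


print(unique_pairs([3, 2, 1, 5, 4]))
print(unique_pairs([1, 1, 2]))
```

The resulting list can be passed to pd.Series(...) if a series is needed at the end.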
I don't think this really has much to do with pandas, except for the values originating (and possibly ending) in a series. Instead, I'd use itertools.
Say you have an iterable a of values. Then
import itertools
set((str(i) + '_' + str(j)) for (i, j) in itertools.product(a, a) if i < j)
will create a set of pairs where the integer before the _ is smaller than the one after it, removing duplicates.
Example
import itertools
>>> set((str(i) + '_' + str(j)) for (i, j) in itertools.product(a, a) if i < j)
{'1_2',
'1_3',
'1_4',
'1_6',
'1_7',
'2_3',
'2_4',
'2_6',
'2_7',
'3_4',
'3_6',
'3_7',
'4_6',
'4_7',
'6_7'}
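One caveat: a set has no guaranteed order, and sorting the finished strings would compare them character by character (so '10_2' would come before '2_3'). A possible workaround, sketched below, is to collect the pairs as tuples, sort those numerically, and only then build the labels. The input list here is an assumption chosen for illustration; it is not taken from the example above.

```python
import itertools

# Hypothetical input, with a repeated value to show the deduplication.
a = [7, 1, 6, 2, 4, 3, 1]

# Keep the pairs as integer tuples so they sort numerically.
pairs = {(i, j) for i, j in itertools.product(a, a) if i < j}

# Sort the tuples first, then format them as 'i_j' strings.
labels = [f"{i}_{j}" for i, j in sorted(pairs)]
print(labels)
```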
This can be done via a list comprehension:
>>> a = [3, 2, 1, 5, 4]
>>> [(str(x)+'_'+str(y)) for x in a for y in a if y>x]
['3_5', '3_4', '2_3', '2_5', '2_4', '1_3', '1_2', '1_5', '1_4', '4_5']
Note that the members of each pair in the result are in ascending order because of the y>x condition, which is why we have '1_3' in our output instead of '3_1'.
While importing itertools and using combinations is a correct way to do this, I usually prefer not to import a library if I only need one or two things from it that can also be accomplished easily by direct means.
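As a quick sanity check, the sketch below compares the list-comprehension filter with combinations on the sample list. It assumes the values are distinct, and it sorts the input before calling combinations so that both approaches put the smaller number first. The variable names are only for illustration.

```python
from itertools import combinations

a = [3, 2, 1, 5, 4]

# Nested loops with a filter: looks at all n * n pairs, keeps those with y > x.
by_filter = {f"{x}_{y}" for x in a for y in a if y > x}

# combinations on the sorted values: generates only the n * (n - 1) / 2 pairs.
by_combos = {f"{x}_{y}" for x, y in combinations(sorted(a), 2)}

n = len(a)
print(by_filter == by_combos)
print(len(by_filter) == n * (n - 1) // 2)
```

Both approaches give the same pairs here; the difference is that the filter inspects every (x, y) combination, while combinations never generates the discarded half.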