
Sorting consecutive pairs of items in a Python list

The data that I have is actually contained in a pandas dataframe (in a column), but for the sake of this post we extract it to get to the nub of the problem.

Suppose we have a dataframe df with a column col1, which we store as a list: L = df.col1.tolist(). I have about 2000 of these columns/lists, and on average they have a length of about 300-400, so there is no massive need for performance here.

Back to our MWE list: it is structured roughly like this:

L = [1,2,2,1,3,3,4,4,5,5,6,6,1,2,1,2,7,7,8,8]

The items in the list should be arranged as consecutive pairs (but for data-collection reasons, they are not). So here is the sorted list we are aiming for:

L = [1,1,2,2,3,3,4,4,5,5,6,6,1,1,2,2,7,7,8,8]

Here is the same list written as tuples, just for clarity:

L = [(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(1,1),(2,2),(7,7),(8,8)]

This is the problem: the columns contain almost sequential pairs of items (the numbers in the above example), but some of the items are out of order and have to be moved back to their partner (see above).

A few things to observe:

  • The above list contains numbers; in actuality, we are dealing with strings
  • The data typically lives in a column of a pandas dataframe (not sure if this helps, but it may)
  • Performance is not really a problem, since each column only needs to be sorted once
  • The out-of-order pattern is not consistent and things move around a lot in each column; what is important is that each item is mapped back to its partner.

I am looking for a method that can sort these lists/columns into the required pair-sequential order. Thanks!

OK, since you can guarantee that the items are always paired, I'd just keep a running count per item. You basically need to build a list of the elements in the order in which the first item of each pair is encountered (that is, when the count for that item is zero), and reset the count for that item once its partner arrives (when the count gets to 2). Then just "explode" this list of first elements into a list of pairs. Quick and dirty:

In [1]: L = [1,2,2,1,3,3,4,4,5,5,6,6,1,2,1,2,7,7,8,8]

In [2]: from collections import Counter

In [3]: counts = Counter()

In [4]: order = []

In [5]: for x in L:
   ...:     if counts[x] == 0:
   ...:         # first item of a pair: remember where it appeared
   ...:         order.append(x)
   ...:         counts[x] = 1
   ...:     else:
   ...:         # second item: the pair is complete, so reset the count
   ...:         counts[x] = 0
   ...:

In [6]: order
Out[6]: [1, 2, 3, 4, 5, 6, 1, 2, 7, 8]

In [7]: result = []

In [8]: for x in order:
   ...:     result.append(x)
   ...:     result.append(x)
   ...:

In [9]: result
Out[9]: [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 1, 1, 2, 2, 7, 7, 8, 8]

Of course, you should make a function to do this.
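As a rough sketch of what such a function might look like (the name pair_up is made up for illustration, and a set of "waiting" items stands in for the Counter above): it assumes the items are hashable, which holds for strings, and it raises an error if some item is left without a partner.

```python
def pair_up(items):
    """Regroup items into consecutive pairs, ordered by the position
    of the first member of each pair.

    pair_up is a hypothetical helper name, not part of any library.
    Assumes items are hashable and every item has exactly one partner.
    """
    waiting = set()   # items whose partner has not been seen yet
    result = []
    for x in items:
        if x in waiting:
            # second member of the pair: already emitted, just close it
            waiting.remove(x)
        else:
            # first member of the pair: emit both copies at this position
            waiting.add(x)
            result.extend((x, x))
    if waiting:
        raise ValueError(f"unpaired items: {waiting!r}")
    return result


L = [1, 2, 2, 1, 3, 3, 4, 4, 5, 5, 6, 6, 1, 2, 1, 2, 7, 7, 8, 8]
print(pair_up(L))
```

Since the result has the same length as the input, something like df['col1'] = pair_up(df['col1'].tolist()) should be enough to write it back to the column, assuming the column really does contain only complete pairs.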
