My data actually lives in a column of a pandas DataFrame, but for the sake of this post I'll extract it to get to the nub of the problem. Suppose we have a DataFrame df with a column col1, which we store as a list: L = df.col1.tolist(). I have about 2000 of these columns/lists, and on average each has about 300-400 items, so there is no great need for performance here.
Back to our MWE list, it is structured with items like this (ish):
L = [1,2,2,1,3,3,4,4,5,5,6,6,1,2,1,2,7,7,8,8]
The items in the list should form consecutive pairs, but for data-collection reasons they do not. Here is the sorted list we are aiming for:
L = [1,1,2,2,3,3,4,4,5,5,6,6,1,1,2,2,7,7,8,8]
Here is the same list written as tuples, just for clarity:
L = [(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(1,1),(2,2),(7,7),(8,8)]
This is the problem: the columns contain almost sequential pairs of items (the numbers in the above example), but some items are out of order and have to be moved back next to their partner (see above).
A few things to observe:
I am looking for a method that can sort these lists/columns into the required pair-sequential order. Thanks!
OK, since you can guarantee that the items are always paired, I'd just keep a running count per item. You basically need to build a list of the elements in the order in which the first item of each pair is encountered (that is, when that item's count is zero); when the count gets to 2, the pair is complete and you reset the count for that item. Then just "explode" this list of first elements into a list of pairs. Quick and dirty:
In [1]: L = [1,2,2,1,3,3,4,4,5,5,6,6,1,2,1,2,7,7,8,8]
In [2]: from collections import Counter
In [3]: counts = Counter()
In [4]: order = []
In [5]: for x in L:
   ...:     if counts[x] == 0:
   ...:         # first item of a new pair: remember where the pair starts
   ...:         order.append(x)
   ...:     counts[x] += 1
   ...:     if counts[x] == 2:
   ...:         # pair complete: reset so the next x starts a new pair
   ...:         counts[x] = 0
   ...:
In [6]: order
Out[6]: [1, 2, 3, 4, 5, 6, 1, 2, 7, 8]
In [7]: result = []
In [8]: for x in order:
   ...:     result.append(x)
   ...:     result.append(x)
   ...:
In [9]: result
Out[9]: [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 1, 1, 2, 2, 7, 7, 8, 8]
Of course, you should make a function to do this.
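One way to wrap this in a function is sketched below. The name pair_up and the ValueError for leftover items are my own choices, not part of the original answer. It uses a set of "waiting" values instead of a Counter, which amounts to the same bookkeeping when every value occurs an even number of times.

```python
def pair_up(items):
    """Regroup items so that each value sits next to its partner.

    Pairs are emitted in the order in which their first member appears.
    Assumes (as in the question) that every value occurs an even number
    of times; raises ValueError otherwise.
    """
    waiting = set()  # values whose partner has not been seen yet
    order = []       # first member of each pair, in encounter order
    for x in items:
        if x in waiting:
            # second member of the pair: the pair is now complete
            waiting.remove(x)
        else:
            # first member of a new pair
            waiting.add(x)
            order.append(x)
    if waiting:
        raise ValueError(f"unpaired items: {waiting}")
    # "explode" each first member into a pair
    return [x for x in order for _ in range(2)]


L = [1, 2, 2, 1, 3, 3, 4, 4, 5, 5, 6, 6, 1, 2, 1, 2, 7, 7, 8, 8]
result = pair_up(L)
```

With the column from the question, something like df["col1"] = pair_up(df["col1"].tolist()) should write the regrouped values back, since the result has the same length as the input.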