I have a list of tuples representing points (x, y) and want to order them such that if x_i of a point p_i is equal to y_j of another point p_j, then p_i comes directly after p_j.
No x or y value repeats between the points, e.g. given the point (1, 2), the points (1, y) or (x, 2) are not allowed for any x and y. For example:
points = [(1, 5), (3, 4), (5, 3), (4, 1), (7,2), (2, 6)] # valid points
should be ordered as [(1, 5), (5, 3), (3, 4), (4, 1), (7, 2), (2, 6)]
Here is the code I wrote to do this:
N = len(points)
for i in range(N):
    for j in range(i + 1, N):
        if points[i][1] == points[j][0]:
            points.insert(i + 1, points.pop(j))
            break
Unfortunately the complexity of this is O(N^2) and for a big list of points it is slow. Is there a way to do this faster?
If you think of your unordered list as describing a directed graph in which every node belongs to exactly one chain, you can find the chains like this:
points = [(1, 5), (3, 4), (5, 3), (4, 1), (7,2), (2, 6)]
# Create the graph and initialize the list of chains
graph, chains, seen = dict(points), [], set()
# Find the chains in the graph
for node, target in graph.items():
    while node not in seen:
        seen.add(node)
        chains.append((node, target))
        node = target
        try:
            target = graph[target]
        except KeyError:
            break
# chains : [(1, 5), (5, 3), (3, 4), (4, 1), (7, 2), (2, 6)]
Each point is visited once and dictionary lookups take O(1) time on average, so this gives us an algorithm that runs in O(n).
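One caveat with the loop above: each walk starts at whichever point comes first in the input, so if that point sits in the middle of a chain, the earlier points of that chain end up after it. A minimal sketch of one way around this, assuming the question's guarantee that no x and no y repeats, is to start walks only at chain heads (points whose x is not the y of any other point) and to handle closed cycles afterwards. The name `order_chains` is made up for this illustration.

```python
def order_chains(points):
    """Order points so each chain is contiguous, starting from its head.

    Assumes no x value and no y value repeats (as stated in the question).
    """
    graph = dict(points)            # x -> y
    targets = set(graph.values())   # every y value
    ordered, seen = [], set()

    def walk(node):
        # follow node -> graph[node] until the chain ends or loops back
        while node in graph and node not in seen:
            seen.add(node)
            ordered.append((node, graph[node]))
            node = graph[node]

    # first pass: chain heads, i.e. x values that are nobody's y
    for node in graph:
        if node not in targets:
            walk(node)
    # second pass: whatever is left belongs to closed cycles
    for node in graph:
        walk(node)
    return ordered


print(order_chains([(5, 3), (1, 5), (7, 2), (2, 6)]))
# [(1, 5), (5, 3), (7, 2), (2, 6)]
```

Chains come out heads first, so for the question's example the (7, 2), (2, 6) chain is emitted before the 1, 5, 3, 4 cycle. Each chain is still contiguous.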
You can convert your searches to O(1) time by caching lists of points that have the same first term. (And the caching is O(N) time.) The code to do this gets a little tricky, mainly keeping track of which items have already been processed, but it should work pretty quickly. Here's an example:
from collections import defaultdict, deque
points = [(1, 5), (3, 4), (5, 3), (4, 1), (1,6), (7,2), (3,4), (2,3)]
# make a dictionary of lists of points, grouped by first element
cache = defaultdict(deque)
for i, p in enumerate(points):
    cache[p[0]].append(i)
# keep track of all points that will be processed
points_to_process = set(range(len(points)))
i = 0
next_idx = i
ordered_points = []
while i < len(points):
    # get the next point to be added to the ordered list
    cur_point = points[next_idx]
    ordered_points.append(cur_point)
    # remove this point from the cache (with popleft())
    # note: it will always be the first one in the corresponding list;
    # the popleft() is kept outside the assert so it still runs under python -O
    popped_idx = cache[cur_point[0]].popleft()
    assert next_idx == popped_idx
    points_to_process.discard(next_idx)
    # find the next item to add to the list
    try:
        # get the first remaining point that matches this
        next_idx = cache[cur_point[1]][0]
    except IndexError:
        # no matching point; advance to the next unprocessed one
        while i < len(points):
            if i in points_to_process:
                next_idx = i
                break
            else:
                i += 1
ordered_points
# [(1, 5), (5, 3), (3, 4), (4, 1), (1, 6), (7, 2), (2, 3), (3, 4)]
You can avoid creating the points_to_process set to save memory (and maybe time), but the code gets more complex:
from collections import defaultdict, deque
points = [(1, 5), (3, 4), (5, 3), (4, 1), (1,6), (7,2), (3,4), (2,3)]
# make a dictionary of lists of points, grouped by first element
cache = defaultdict(deque)
for i, p in enumerate(points):
    cache[p[0]].append(i)
i = 0
next_idx = i
ordered_points = []
while i < len(points):
    # get the next point to be added to the ordered list
    cur_point = points[next_idx]
    ordered_points.append(cur_point)
    # remove this point from the cache
    # note: it will always be the first one in the corresponding list;
    # the popleft() is kept outside the assert so it still runs under python -O
    popped_idx = cache[cur_point[0]].popleft()
    assert next_idx == popped_idx
    # find the next item to add to the list
    try:
        next_idx = cache[cur_point[1]][0]
    except IndexError:
        # advance to the next unprocessed point
        while i < len(points):
            # see if i points to an unprocessed point (will always be first in list)
            candidates = cache[points[i][0]]
            if candidates and candidates[0] == i:
                next_idx = i
                break
            # no longer available, move on to next point
            i += 1
ordered_points
# [(1, 5), (5, 3), (3, 4), (4, 1), (1, 6), (7, 2), (2, 3), (3, 4)]
Thanks everyone for the help. Here is my own solution using numpy and a while loop (a lot slower than the solution by Matthias Fripp, but faster than using two for-loops as in the question's code):
import numpy as np
# example of points
points = [(1, 5), (17, 2),(3, 4), (5, 3), (4, 1), (6, 8), (9, 7), (2, 6)]
points = np.array(points)
x, y = points[:,0], points[:,1]
N = points.shape[0]
i = 0
idx = [0]
remaining = set(range(1, N))
while len(idx) < N:
    try:
        # index of the point whose x equals the current point's y
        i = np.where(x == y[i])[0][0]
        if i in remaining:
            remaining.remove(i)
        else:
            # chain closed on an already-used point; start a new chain
            # (set.pop() removes an arbitrary element)
            i = remaining.pop()
    except IndexError:
        # no point continues the chain; start a new chain
        i = remaining.pop()
    idx.append(i)
list(zip(points[idx][:,0], points[idx][:,1]))
# [(1, 5), (5, 3), (3, 4), (4, 1), (17, 2), (2, 6), (6, 8), (9, 7)]
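A likely reason this version is slower is that `np.where(x == y[i])` scans the whole array on every iteration, so the loop is still O(N^2) overall. Below is a minimal sketch of the same idea in plain Python, with that scan replaced by a dictionary lookup. It assumes unique x values as in the question, and `order_indices` and `index_of_x` are names invented here.

```python
def order_indices(points):
    """Return indices of points in chain order using O(1) lookups."""
    # map each x value to the index of its point (x values assumed unique)
    index_of_x = {x: k for k, (x, _) in enumerate(points)}
    remaining = set(range(len(points)))
    idx = []
    for start in range(len(points)):
        i = start
        # follow the chain from `start` until it ends or hits a used point
        while i is not None and i in remaining:
            remaining.remove(i)
            idx.append(i)
            i = index_of_x.get(points[i][1])
    return idx


points = [(1, 5), (17, 2), (3, 4), (5, 3), (4, 1), (6, 8), (9, 7), (2, 6)]
print([points[k] for k in order_indices(points)])
# [(1, 5), (5, 3), (3, 4), (4, 1), (17, 2), (2, 6), (6, 8), (9, 7)]
```

New chains are started in input order rather than via `set.pop()`, so the result does not depend on set iteration order.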
A recursive divide-and-conquer approach may have a better runtime. Since this isn't really a straightforward sorting problem, you can't just throw together a modified quicksort or whatever. I think a good solution would be a merge algorithm. Here's some pseudocode that may help.
let points = [(1, 5), (3, 4), (5, 3), (4, 1), (1,6), (7,2), (3,4), (2,3)];
function tupleSort(tupleList):
    if length(tupleList) <= 1:
        return tupleList
    if length(tupleList) == 2:
        //Trivial solution. Only two tuples in the list. They are either
        //swapped or left in place
        if tupleList[0].x == tupleList[1].y:
            return reverse(tupleList)
        else:
            return tupleList
    else:
        let length = length(tupleList)
        //slices are half-open: start inclusive, end exclusive
        let firstHalf = tupleSort(tupleList[0 -> length/2])
        let secondHalf = tupleSort(tupleList[length/2 -> length])
        return merge(firstHalf, secondHalf)
function merge(firstList, secondList):
    indexOfUnsorted = getNotSorted(firstList)
    if indexOfUnsorted > -1:
        //iterate through the second list and find an x item
        //that matches the y of the first list and put the
        //second list into the first list at that position
        return mergedLists
    else:
        return append(firstList, secondList)
function getNotSorted(list):
    //iterate once through the list and return -1 if sorted
    //otherwise return the index of the first item whose y value
    //is not equal to the next item's x value
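As an illustration of the last helper, here is a minimal Python sketch of `getNotSorted` as its comments describe it. The snake_case name `get_not_sorted` and the use of plain (x, y) tuples are choices made for this example, not part of the pseudocode.

```python
def get_not_sorted(points):
    """Return the index of the first point whose y differs from the
    next point's x, or -1 if every adjacent pair links up."""
    for k in range(len(points) - 1):
        if points[k][1] != points[k + 1][0]:
            return k
    return -1


print(get_not_sorted([(1, 5), (5, 3), (3, 4)]))  # -1
print(get_not_sorted([(1, 5), (5, 3), (7, 2)]))  # 1
```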