Ordering a list of tuples by equality of the 1st element of one tuple and the 2nd element of another tuple

I have a list of tuples representing points (x, y) and want to order them such that a point p_i comes immediately after another point p_j whenever x_i is equal to y_j. No x or y value repeats between the points, e.g. given the point (1, 2), the points (1, y) or (x, 2) are not allowed for any x and y. For example:

points = [(1, 5), (3, 4), (5, 3), (4, 1), (7,2), (2, 6)]  # valid points

should be ordered as [(1, 5), (5, 3), (3, 4), (4, 1), (7, 2), (2, 6)]
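
One way to make the ordering rule concrete is to count how many adjacent pairs are "linked", that is, pairs where the y of one point equals the x of the next. The helper below is only an illustrative sketch (the name count_links is hypothetical, not part of the question); it can be used to compare the input with a candidate ordering.

```python
def count_links(ordered):
    """Count adjacent pairs (a, b) where a's y equals b's x."""
    return sum(a[1] == b[0] for a, b in zip(ordered, ordered[1:]))


points = [(1, 5), (3, 4), (5, 3), (4, 1), (7, 2), (2, 6)]
wanted = [(1, 5), (5, 3), (3, 4), (4, 1), (7, 2), (2, 6)]

print(count_links(points))  # 1: only (7, 2) -> (2, 6) is already adjacent
print(count_links(wanted))  # 4: every pair except (4, 1) -> (7, 2)
```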

Here is the code I wrote to do this:

N = len(points)
for i in range(N):
    for j in range(i + 1, N):
        if points[i][1] == points[j][0]:
            points.insert(i + 1, points.pop(j))
            break

Unfortunately the complexity of this is O(N^2) and for a big list of points it is slow. Is there a way to do this faster?

Think of your unordered list as the edge list of a directed graph in which every node belongs to exactly one chain. A dictionary that maps each x to its y then lets you follow every chain directly:

points = [(1, 5), (3, 4), (5, 3), (4, 1), (7,2), (2, 6)]

# Create the graph and initialize the list of chains
graph, chains, seen = dict(points), [], set()

# Find the chains in the graph
for node, target in graph.items():
    while node not in seen:
        seen.add(node)
        chains.append((node, target))
        node = target
        try:
            target = graph[target]
        except KeyError:
            break

# chains : [(1, 5), (5, 3), (3, 4), (4, 1), (7, 2), (2, 6)]

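This gives us an algorithm that runs in O(n): each point is added to seen and appended to chains exactly once, and every dictionary or set lookup takes O(1) time on average.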
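
One caveat with the loop above: it starts each chain at whichever of its points comes first in the dictionary, so an open chain whose head appears later in the input is emitted in two unlinked pieces (for example, [(5, 3), (1, 5)] comes back unchanged). The following is a minimal sketch of one possible variant, assuming, as the question states, that no x or y value repeats. The function name order_chains and the pred dictionary are introduced here for illustration and are not part of the original answer. It rewinds to the head of a chain before walking it forward, and still visits each point a constant number of times.

```python
def order_chains(points):
    """Order points so that each chain is emitted from its head onward."""
    succ = dict(points)                  # x -> y (assumes unique x)
    pred = {y: x for x, y in points}     # y -> x (assumes unique y)
    ordered, seen = [], set()

    for start, _ in points:
        if start in seen:
            continue
        # Rewind to the head of the chain that contains `start`.
        head = start
        while head in pred:
            head = pred[head]
            if head == start:            # came full circle: a closed cycle
                break
        # Walk forward from the head, emitting each point once.
        node = head
        while node in succ and node not in seen:
            seen.add(node)
            ordered.append((node, succ[node]))
            node = succ[node]
    return ordered


print(order_chains([(1, 5), (3, 4), (5, 3), (4, 1), (7, 2), (2, 6)]))
# [(1, 5), (5, 3), (3, 4), (4, 1), (7, 2), (2, 6)]
print(order_chains([(5, 3), (1, 5)]))
# [(1, 5), (5, 3)]
```

For a closed cycle the walk starts at the first point of the cycle that appears in the input, which is why the question's example keeps (1, 5) at the front.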

You can convert your searches to O(1) time by caching lists of points that have the same first term. (And the caching is O(N) time.) The code to do this gets a little tricky, mainly keeping track of which items have already been processed, but it should work pretty quickly. Here's an example:

from collections import defaultdict, deque

points = [(1, 5), (3, 4), (5, 3), (4, 1), (1,6), (7,2), (3,4), (2,3)]

# make a dictionary of lists of points, grouped by first element
cache = defaultdict(deque)
for i, p in enumerate(points):
    cache[p[0]].append(i)

# keep track of all points that will be processed
points_to_process = set(range(len(points)))

i = 0
next_idx = i
ordered_points = []
while i < len(points):
    # get the next point to be added to the ordered list
    cur_point = points[next_idx]
    ordered_points.append(cur_point)
    # remove this point from the cache (with popleft())
    # note: it will always be the first one in the corresponding list;
    # popleft() is called outside the assert so it still runs under python -O
    popped_idx = cache[cur_point[0]].popleft()
    assert popped_idx == next_idx
    points_to_process.discard(next_idx)
    # find the next item to add to the list
    try:
        # get the first remaining point that matches this
        next_idx = cache[cur_point[1]][0]
    except IndexError:
        # no matching point; advance to the next unprocessed one
        while i < len(points):
            if i in points_to_process:
                next_idx = i
                break
            else:
                i += 1

ordered_points
# [(1, 5), (5, 3), (3, 4), (4, 1), (1, 6), (7, 2), (2, 3), (3, 4)]

You can avoid creating the points_to_process set to save memory (and maybe time), but the code gets more complex:

from collections import defaultdict, deque

points = [(1, 5), (3, 4), (5, 3), (4, 1), (1,6), (7,2), (3,4), (2,3)]

# make a dictionary of lists of points, grouped by first element
cache = defaultdict(deque)
for i, p in enumerate(points):
    cache[p[0]].append(i)

i = 0
next_idx = i
ordered_points = []
while i < len(points):
    # get the next point to be added to the ordered list
    cur_point = points[next_idx]
    ordered_points.append(cur_point)
    # remove this point from the cache
    # note: it will always be the first one in the corresponding list;
    # popleft() is called outside the assert so it still runs under python -O
    popped_idx = cache[cur_point[0]].popleft()
    assert popped_idx == next_idx
    # find the next item to add to the list
    try:
        next_idx = cache[cur_point[1]][0]
    except IndexError:
        # advance to the next unprocessed point
        while i < len(points):
            # i is unprocessed only if it is still first in its list
            # (an explicit check, so it also works when asserts are disabled)
            same_x = cache[points[i][0]]
            if same_x and same_x[0] == i:
                next_idx = i
                break
            # no longer available, move on to next point
            i += 1

ordered_points
# [(1, 5), (5, 3), (3, 4), (4, 1), (1, 6), (7, 2), (2, 3), (3, 4)]

Thanks everyone for the help. Here is my own solution using numpy and a while loop (a lot slower than the solution by Matthias Fripp, but faster than using two for-loops as in the question's code):

import numpy as np

# example of points
points = [(1, 5), (17, 2), (3, 4), (5, 3), (4, 1), (6, 8), (9, 7), (2, 6)]

points = np.array(points)
x, y = points[:,0], points[:,1]

N = points.shape[0]
i = 0
idx = [0]
remaining = set(range(1, N))
while len(idx) < N: 
    try:
        i = np.where(x == y[i])[0][0]
        if i in remaining:
            remaining.remove(i)
        else:
            i = remaining.pop()
    except IndexError:
        i = remaining.pop()

    idx.append(i)

list(zip(points[idx][:,0], points[idx][:,1]))
# [(1, 5), (5, 3), (3, 4), (4, 1), (17, 2), (2, 6), (6, 8), (9, 7)]
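
Note that np.where(x == y[i]) compares y[i] against every x on each iteration, so the loop above is still quadratic overall. A possible refinement, sketched here in plain Python under the question's assumption that x values are unique, is to build the x-to-index lookup once. The names order_by_lookup, index_of_x, visited and scan are illustrative only. Unlike remaining.pop(), which removes an arbitrary element of the set, this sketch always restarts from the lowest unvisited index.

```python
def order_by_lookup(points):
    """Order points by following y -> x links through a precomputed lookup."""
    n = len(points)
    # Assumption: every x is unique, so each value maps to a single index.
    index_of_x = {x: k for k, (x, _) in enumerate(points)}
    visited = [False] * n
    order = []     # indices into points, in output order
    scan = 0       # lowest index that might still be unvisited
    nxt = None     # index suggested by the previous point's y, if any
    while len(order) < n:
        if nxt is None or visited[nxt]:
            # No usable link: restart from the lowest unvisited index.
            while visited[scan]:
                scan += 1
            nxt = scan
        visited[nxt] = True
        order.append(nxt)
        nxt = index_of_x.get(points[nxt][1])
    return [points[k] for k in order]


points = [(1, 5), (17, 2), (3, 4), (5, 3), (4, 1), (6, 8), (9, 7), (2, 6)]
print(order_by_lookup(points))
# [(1, 5), (5, 3), (3, 4), (4, 1), (17, 2), (2, 6), (6, 8), (9, 7)]
```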

A recursive divide-and-conquer approach may have a better runtime. Since this isn't really a straightforward sorting problem, you can't just throw together a modified quicksort or whatever. I think a good solution would be a merge algorithm. Here's some pseudocode that may help.

let points = [(1, 5), (3, 4), (5, 3), (4, 1), (1,6), (7,2), (3,4), (2,3)];
function tupleSort(tupleList):
    if length(tupleList) <= 1:
        return tupleList
    if length(tupleList) == 2:
        //Trivial solution. Only two tuples in the list. They are either
        //swapped or left in place
        if tupleList[0].x == tupleList[1].y:
            return reverse(tupleList)
        else:
            return tupleList
    else:
        //ranges are half-open, so the two halves cover every element once
        let n = length(tupleList)
        let firstHalf = tupleSort(tupleList[0 -> n/2])
        let secondHalf = tupleSort(tupleList[n/2 -> n])
        return merge(firstHalf, secondHalf)

function merge(firstList, secondList):
    indexOfUnsorted = getNotSorted(firstList)
    if indexOfUnsorted > -1:
        //iterate through the second list and find an item whose x
        //matches the y of the first list and put the
        //second list into the first list at that position
        return mergedLists
    else:
        return append(firstList, secondList)

function getNotSorted(list):
     //iterate once through the list and return -1 if sorted
     //otherwise return the index of the first item whose y value
     //is not equal to the next item's x value

