Merging items in list given condition

Question

Let's say I have ['A B', 'B C', 'X Y', 'C D', 'Y Z', 'D E', 'C G'].

If the second word of an element is the same as the first word of any other element in the list, the two should be merged into one item. Order matters as well.

['AB C DE G', 'XY Z'] should be the final product.

Letters will not form a cycle, i.e., ['A B', 'B C', 'C A'] is not possible.

As for why G is added at the end: say we are at 'AB C' and we see 'C D', and later in the list we have 'C G'. We add 'C D' since it comes first, giving 'AB C D', and then if there is 'D E' we merge that to get 'AB C D E'. Now, at the end of the chain, we add 'C G', giving 'AB C DE G'. If there had been a 'B H', since B comes first, the result would be 'AB C DEH G'.

Since order matters, ['A B', 'C A'] is still ['A B', 'C A'] since B does not connect to C.

Another example is:

If we have ['A B', 'A C'], then at step 1 we just have AB; then we see 'A C' and merge it into AB, giving AB C.

I have tried many things, such as dissecting each element into two lists, then indexing and removing etc. It is way too complex and I feel there should be a more intuitive way to solve this in Python. Alas, I can't seem to figure it out.

Answer 1

A simple algorithm solving this appears to be:

initialize results as an empty list
repeat for each pair in the input list:
- repeat for each sublist R in results:
  - if R contains the first item of pair, append the second item to R and continue with the next pair
- if no sublist R contained the first item of pair, append pair as a new sublist to results

The implementation in Python is straightforward and should be self-explanatory:

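def merge_pairs(pairs):
    results = []  # each entry is a list of letters forming one merged chain
    for pair in pairs:
        first, second = pair.split()
        for result in results:
            if first in result:
                # first already belongs to a chain: extend that chain
                result.append(second)
                break
        else:
            # no chain contains first: start a new one
            results.append([first, second])
    return [' '.join(result) for result in results]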

The only extra steps are the conversions between space-separated letters and lists of letters, done with .split() and ' '.join().

Note the else clause of the for statement, which is executed only if the loop finishes without encountering a break.
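To illustrate the for/else behaviour on its own, here is a minimal sketch. The function name find_first_even is made up for this illustration and is not part of the answer's code.

```python
def find_first_even(numbers):
    """Return the first even number, or None if there is none."""
    for n in numbers:
        if n % 2 == 0:
            found = n
            break  # leaving via break skips the else clause
    else:
        # Reached only when the loop ran to completion without break.
        found = None
    return found


print(find_first_even([1, 3, 4, 5]))
print(find_first_even([1, 3]))
```

In merge_pairs the same pattern is what lets "no sublist matched" be handled without a separate flag variable.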

Some examples:

>>> merge_pairs(['A B', 'B C', 'X Y', 'C D', 'Y Z', 'D E', 'C G'])
['A B C D E G', 'X Y Z']
>>> merge_pairs(['A B', 'A C'])
['A B C']
>>> merge_pairs(['B C', 'A B'])
['B C', 'A B']
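For long inputs, the inner loop over results could become the bottleneck, since every pair may scan every sublist. A possible variant (my own sketch, not part of the answer above) keeps a dict from each letter to the index of the earliest sublist containing it, which is the sublist the inner loop would have found. The names merge_pairs_indexed, group_index and remember are hypothetical.

```python
def merge_pairs_indexed(pairs):
    results = []
    group_index = {}  # letter -> index of the earliest sublist containing it

    def remember(letter, index):
        # Keep the lowest index, mirroring the first-match scan of merge_pairs.
        if index < group_index.get(letter, index + 1):
            group_index[letter] = index

    for pair in pairs:
        first, second = pair.split()
        index = group_index.get(first)
        if index is None:
            # No sublist contains first: start a new one.
            index = len(results)
            results.append([first])
            remember(first, index)
        results[index].append(second)
        remember(second, index)
    return [' '.join(result) for result in results]


print(merge_pairs_indexed(['A B', 'B C', 'X Y', 'C D', 'Y Z', 'D E', 'C G']))
```

This is intended to return the same lists as merge_pairs for the examples shown, trading a little bookkeeping for a dict lookup per pair.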

Answer 2

A bit messy, but this works.

a = ['AB', 'BC', 'XY', 'CD', 'YZ', 'DE', 'CG']

used_item = []  # indices of pairs already merged into some result
result = []
for f, fword in enumerate(a):
    for s, sword in enumerate(a):
        if f == s:
            continue
        # sword continues fword when fword's second letter starts sword
        if fword[1] == sword[0]:
            if f in used_item or s in used_item:
                # extend the existing chain that already contains the shared letter
                idx = [i for i, w in enumerate(result) if fword[1] in w][0]
                result = [r + sword[1] if i == idx else r for i, r in enumerate(result)]
            else:
                # start a new chain from the two pairs
                result.append(fword + sword[1])
            used_item.append(f)
            used_item.append(s)

print(result)

The output is ['ABCDGE', 'XYZ']; you could sort 'ABCDGE' if necessary.
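If sorting is wanted, one way to do it is sketched below. Note the assumption: alphabetical sorting only reproduces the chain order when the letters happen to be chained alphabetically, as in this example. The name sorted_result is just for illustration.

```python
result = ['ABCDGE', 'XYZ']  # output of the snippet above

# sorted() turns each string into a sorted list of characters;
# ''.join() glues them back into a string.
sorted_result = [''.join(sorted(word)) for word in result]
print(sorted_result)  # ['ABCDEG', 'XYZ']
```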

Answer 3

I appreciate all the above answers and wanted to attempt the question using Graph Theory.

This question is a classic example of connected components, where we assign a unique identifier to each node (the nodes are the distinct characters in the list: A, B, C, ..., Z).

Also, as we need to maintain the order, DFS is the correct choice.

Algorithm:
    1. Treat the characters A, B, C, ..., Z as 26 distinct nodes.
    2. Perform DFS on each node.
            2.1 If the node has not been assigned a unique identifier:
                    2.1.1. Assign a new unique identifier to the node.
                    2.1.2. dfs(node)

            2.2 Else:
                    do nothing

The function dfs() groups the nodes according to the unique identifier. While performing DFS, the child gets the same unique identifier as its parent.

Have a look at the following implementation:

class Graph:

    def __init__(self, n):
        self.number_of_nodes = n
        self.visited = []
        self.adjacency_list = []
        # identifier[i] is the component id of node i, or -1 if unassigned
        self.identifier = [-1] * self.number_of_nodes
        # merged_list[k] is the string built so far for component k
        self.merged_list = []

    def add_edge(self, edge_start, edge_end):
        # Store each directed edge only once, preserving insertion order.
        start, end = self.encode(edge_start), self.encode(edge_end)
        if end not in self.adjacency_list[start]:
            self.adjacency_list[start].append(end)

    def initialize_graph(self):
        self.visited = [False] * self.number_of_nodes
        self.adjacency_list = [[] for _ in range(self.number_of_nodes)]

    def get_adjacency_list(self):
        return self.adjacency_list

    def encode(self, node):
        # Map 'A'..'Z' to 0..25.
        return ord(node) - ord('A')

    def decode(self, node):
        # Map 0..25 back to 'A'..'Z'.
        return chr(node + ord('A'))

    def dfs(self, start_index):
        start = self.encode(start_index)
        if self.visited[start]:
            return
        self.visited[start] = True

        for node in self.adjacency_list[start]:
            # A child inherits its parent's component identifier.
            if self.identifier[node] == -1:
                self.identifier[node] = self.identifier[start]

            if not self.visited[node]:
                self.merged_list[self.identifier[node]] += self.decode(node)
                self.dfs(self.decode(node))

graph = Graph(26)
graph.initialize_graph()

input_list = ['A B', 'B C', 'X Y', 'C D', 'Y Z', 'D E', 'C G']

for inputs in input_list:
    edge = inputs.split()
    edge_start = edge[0]
    edge_end = edge[1]
    graph.add_edge(edge_start, edge_end)
    
unique_identifier = 0

for inputs in input_list:
    edge = inputs.split()
    edge_start = edge[0]
    
    if(graph.identifier[graph.encode(edge_start)] == -1):
        graph.identifier[graph.encode(edge_start)] = unique_identifier
        graph.merged_list = graph.merged_list + [edge_start]
        unique_identifier = unique_identifier + 1
        graph.dfs(edge_start)

print(graph.merged_list)

Output:

['ABCDEG', 'XYZ']
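The class above assumes the nodes are single uppercase letters A to Z, because encode() subtracts 65 from ord(node). If that assumption does not hold, the same idea can be sketched with a dict as the adjacency list. This is my own compact variant for illustration, not the code above; the names merge_by_dfs, adjacency and component are hypothetical.

```python
def merge_by_dfs(pairs):
    # Build an ordered adjacency list keyed by the node itself.
    adjacency = {}
    for pair in pairs:
        start, end = pair.split()
        children = adjacency.setdefault(start, [])
        if end not in children:
            children.append(end)

    visited = set()
    merged = []

    def dfs(node, component):
        visited.add(node)
        for child in adjacency.get(node, []):
            if child not in visited:
                # The child joins its parent's component, then is explored.
                component.append(child)
                dfs(child, component)

    # Start a new component from every not-yet-visited first word, in input order.
    for pair in pairs:
        start = pair.split()[0]
        if start not in visited:
            component = [start]
            merged.append(component)
            dfs(start, component)
    return [''.join(component) for component in merged]


print(merge_by_dfs(['A B', 'B C', 'X Y', 'C D', 'Y Z', 'D E', 'C G']))
```

Being recursive, this sketch is only suitable for chains shorter than Python's recursion limit.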

3 answers

solution1
1 ACCEPTED 2021-02-08 13:31:12

solution2
1 2021-02-08 14:01:34

solution3
1 2021-02-08 14:26:07
