Let's say I have ['A B', 'B C', 'X Y', 'C D', 'Y Z', 'D E', 'C G'].
If the second word of an element in the list is the same as the first word of any other element in the list, the two should be merged into one item. The order matters as well.
['AB C DE G', 'XY Z'] should be the final product.
Letters will not form a cycle, i.e., ['A B', 'B C', 'C A'] is not possible.
As for why G was added at the end: let's say we are at 'AB C' and we see 'C D', and later on in the list we have 'C G'. We add 'C D' since it comes first, so we have 'AB C D', and then if there is 'D E' we merge that to get 'AB C D E'. Now, at the end of a 'chain', we add 'C G', so we have 'AB C DE G'. If there had been a 'B H', then since B comes first, it would be 'AB C DEH G'.
Since order matters, ['A B', 'C A'] is still ['A B', 'C A'], since B does not connect to C.
Another example: if we have ['A B', 'A C'], then at step 1 we just have AB, and then we see A C and merge that into AB, so we have AB C.
I have tried many things, such as dissecting each element into two lists, then indexing and removing etc. It is way too complex and I feel there should be a more intuitive way to solve this in Python. Alas, I can't seem to figure it out.
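A simple algorithm solving this appears to be:
1. Initialize results as an empty list.
2. For each pair in the input list:
   2.1 For each sublist in results: if it contains the first item of the pair, append the second item to that sublist and continue with the next pair.
   2.2 If no sublist contains the first item of the pair, append the pair to results as a new sublist.
The implementation in Python is straightforward and should be self-explanatory: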
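def merge_pairs(pairs):
    results = []
    for pair in pairs:
        first, second = pair.split()
        # Look for the first existing group that already contains `first`.
        for result in results:
            if first in result:
                result.append(second)
                break
        else:
            # No group contained `first`: start a new group with this pair.
            results.append([first, second])
    return [' '.join(result) for result in results]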
The only extra steps are the conversions between space-separated letters and lists of letters by .split() and ' '.join().
Note the else clause of the for statement, which is only executed if no break was encountered.
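For readers unfamiliar with for/else, here is a minimal standalone sketch of the same construct. The function name find_first_negative is made up for illustration and is not part of the answer above.

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none.

    Hypothetical helper, used only to illustrate for/else.
    """
    for number in numbers:
        if number < 0:
            found = number
            break
    else:
        # Reached only when the loop finished without hitting `break`
        # (this includes the case of an empty input).
        found = None
    return found
```

In merge_pairs the else branch plays the same role: it runs only when no existing group contained the first word.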
Some examples:
>>> merge_pairs(['A B', 'B C', 'X Y', 'C D', 'Y Z', 'D E', 'C G'])
['A B C D E G', 'X Y Z']
>>> merge_pairs(['A B', 'A C'])
['A B C']
>>> merge_pairs(['B C', 'A B'])
['B C', 'A B']
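If the input list is long, scanning every group for each pair can get slow. A possible variant, sketched here under the assumption that it should behave exactly like merge_pairs, keeps a dict from each word to the index of the earliest group containing it. The name merge_pairs_indexed and the helper dict group_index are my own and are not part of the answer above.

```python
def merge_pairs_indexed(pairs):
    """Sketch of merge_pairs using a dict lookup instead of scanning all groups."""
    results = []
    group_index = {}  # word -> index of the earliest group in `results` containing it
    for pair in pairs:
        first, second = pair.split()
        index = group_index.get(first)
        if index is None:
            # `first` has not been seen in any group yet: open a new group.
            index = len(results)
            results.append([first])
            group_index[first] = index
        results[index].append(second)
        # Keep the smallest index so lookups match a front-to-back scan.
        group_index[second] = min(index, group_index.get(second, index))
    return [' '.join(group) for group in results]
```

On the three examples above it should give the same results as merge_pairs; the trade-off is one extra dict in exchange for constant-time lookups.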
A bit messy, but this works (note that the input here is written without spaces, so each item is exactly two characters):
a = ['AB', 'BC', 'XY', 'CD', 'YZ', 'DE', 'CG']
used_item = []
result = []
for f, fword in enumerate(a):
    for s, sword in enumerate(a):
        if f == s:
            continue
        if fword[1] == sword[0]:
            if f in used_item or s in used_item:
                idx = [i for i, w in enumerate(result) if fword[1] in w][0]
                result = [r + sword[1] if i == idx else r for i, r in enumerate(result)]
                used_item.append(f)
                used_item.append(s)
            else:
                result.append(fword + sword[1])
                used_item.append(f)
                used_item.append(s)
print(result)
The output is ['ABCDGE', 'XYZ']; you could sort 'ABCDGE' if necessary.
I appreciate all the above answers and wanted to attempt the question using graph theory.
This question is a classic example of connected components, where we can assign a unique identifier to each node (the nodes are the different characters in the list: A, B, C, ..., Z).
Also, as we need to maintain the order, DFS is the correct choice to proceed with.
Algorithm:
1. Treat the characters A, B, C, ..., Z as 26 different nodes.
2. Perform DFS on each node.
   2.1 If the node has not been assigned a unique identifier:
       2.1.1 Assign a new unique identifier to the node.
       2.1.2 dfs(node)
   2.2 Else:
       do nothing
The function dfs() groups the nodes according to the unique identifier. While performing DFS, the child gets the same unique identifier as its parent.
Have a look at the following implementation:
class Graph:
    def __init__(self, n):
        self.number_of_nodes = n
        self.visited = []
        self.adjacency_list = []
        self.identifier = [-1] * self.number_of_nodes  # -1 means "no group yet"
        self.merged_list = []

    def add_edge(self, edge_start, edge_end):
        if(self.encode(edge_end) not in self.adjacency_list[self.encode(edge_start)]):
            self.adjacency_list[self.encode(edge_start)] = self.adjacency_list[self.encode(edge_start)] + [self.encode(edge_end)]

    def initialize_graph(self):
        self.visited = [False] * self.number_of_nodes
        for i in range(0, self.number_of_nodes):
            self.adjacency_list = self.adjacency_list + [[]]

    def get_adjacency_list(self):
        return self.adjacency_list

    def encode(self, node):
        # Maps 'A'..'Z' to 0..25, so nodes must be single uppercase letters.
        return ord(node) - 65

    def decode(self, node):
        return chr(node + 65)

    def dfs(self, start_index):
        if(self.visited[self.encode(start_index)] == True):
            return
        self.visited[self.encode(start_index)] = True
        for node in self.adjacency_list[self.encode(start_index)]:
            # The child inherits the identifier of its parent.
            if(self.identifier[node] == -1):
                self.identifier[node] = self.identifier[self.encode(start_index)]
            if(self.visited[node] == False):
                self.merged_list[self.identifier[node]] = self.merged_list[self.identifier[node]] + self.decode(node)
                self.dfs(self.decode(node))

graph = Graph(26)
graph.initialize_graph()
input_list = ['A B', 'B C', 'X Y', 'C D', 'Y Z', 'D E', 'C G']
for inputs in input_list:
    edge = inputs.split()
    edge_start = edge[0]
    edge_end = edge[1]
    graph.add_edge(edge_start, edge_end)

unique_identifier = 0
for inputs in input_list:
    edge = inputs.split()
    edge_start = edge[0]
    if(graph.identifier[graph.encode(edge_start)] == -1):
        # Unseen start node: open a new group and explore from it.
        graph.identifier[graph.encode(edge_start)] = unique_identifier
        graph.merged_list = graph.merged_list + [edge_start]
        unique_identifier = unique_identifier + 1
        graph.dfs(edge_start)
print(graph.merged_list)
Output:
['ABCDEG', 'XYZ']
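The same depth-first idea can probably be written more compactly with a dict-based adjacency list, which also removes the restriction to the 26 letters A to Z. The following is only a sketch under that assumption: the name merge_pairs_dfs, the explicit stack (instead of recursion), and the space-joined output are my own choices, not part of the answer above.

```python
def merge_pairs_dfs(pairs):
    """Sketch: group words by following edges depth-first, in input order."""
    children = {}  # word -> successor words, in order of first appearance
    starts = []    # first word of every pair, in input order
    for pair in pairs:
        first, second = pair.split()
        starts.append(first)
        successors = children.setdefault(first, [])
        if second not in successors:
            successors.append(second)

    visited = set()
    merged = []
    for start in starts:
        if start in visited:
            continue  # already part of an earlier group
        group = []
        stack = [start]
        while stack:
            word = stack.pop()
            if word in visited:
                continue
            visited.add(word)
            group.append(word)
            # Push in reverse so the earliest-listed successor is visited first.
            stack.extend(reversed(children.get(word, [])))
        merged.append(' '.join(group))
    return merged
```

For the input list used above this gives ['A B C D E G', 'X Y Z']. Note that, like the Graph class, it never places a word in two groups, so ['B C', 'A B'] comes out as ['B C', 'A'] rather than ['B C', 'A B'].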