I have paired values in a CSV file. Neither value in a pair is necessarily unique. I would like to split this large list into independent complete sets for further analysis.
To illustrate, my "megalist" looks like this:
megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']...]
Most importantly, the output should preserve the list of paired values (i.e., not merge the values into one flat set). Ideally, each set would end up in its own CSV file for individual analysis later. For this megalist, the output would be:
completeset1 = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f']]
completeset2 = [['r', 's'], ['t', 'r']]
...
In graph-theory terms, I'm trying to take a giant graph made of disjoint subgraphs (where each pair is an edge between two connected vertices) and split it into independent, more manageable graphs. Thanks for any input!
Edit 1: This got me to a point where I can move forward. Thanks again!
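import csv
import networkx as nx
# The input is tab-delimited despite the .csv extension; each row is one pair
with open('megalistfile.csv', newline='') as f:
    megalist = list(csv.reader(f, delimiter='\t'))
G = nx.Graph()
G.add_edges_from(megalist)
subgraphs = nx.connected_components(G)  # yields one set of nodes per component
with open('subgraphs.txt', 'w') as output_file:
    for subgraph in subgraphs:
        # list() turns the edge view into a plain list of (u, v) tuples
        output_file.write(str(list(G.edges(subgraph))) + '\n')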
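Since the original goal was one CSV file per complete set, here is a minimal sketch that writes each component's original pairs (not the edges reported by networkx) to its own tab-delimited file. It assumes `components` is an iterable of node sets, such as what `nx.connected_components(G)` yields, and that `pairs` is a list (e.g. `list(csv.reader(...))`), because a bare `csv.reader` is exhausted once `add_edges_from` has consumed it. The function name `write_complete_sets` and the `completeset<n>.csv` naming are hypothetical.

```python
import csv
import os


def write_complete_sets(pairs, components, out_dir='.'):
    """Write the original pairs of each component to its own file.

    components: iterable of sets of values, e.g. nx.connected_components(G).
    Returns the list of file paths written, in component order.
    """
    paths = []
    for n, nodes in enumerate(components, start=1):
        # Both values of a pair share a component, so checking the first is enough
        rows = [pair for pair in pairs if pair[0] in nodes]
        path = os.path.join(out_dir, f'completeset{n}.csv')
        with open(path, 'w', newline='') as f:
            csv.writer(f, delimiter='\t').writerows(rows)
        paths.append(path)
    return paths
```

This rescans `pairs` once per component, which is fine for a moderate number of components; for very many components, grouping the pairs in a single pass would be faster.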
You can use networkx for this. Constructing the graph:
>>> import networkx as nx
>>> megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'],['b', 'f'], ['r', 's'], ['t', 'r']]
>>> G = nx.Graph()
>>> G.add_edges_from(megalist)
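Then get the connected components. Recent networkx versions yield each component as a set, so they are sorted here for a stable display:
>>> subgraphs = [sorted(c) for c in nx.connected_components(G)]
>>> subgraphs
[['a', 'b', 'd', 'f'], ['r', 's', 't']]
>>> [list(G.edges(subgraph)) for subgraph in subgraphs]
[[('a', 'b'), ('a', 'd'), ('b', 'd'), ('b', 'f')], [('r', 's'), ('r', 't')]]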
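Because `G.edges` reports undirected edges, a pair such as `['t', 'r']` can come back as `('r', 't')`, and duplicate pairs collapse into a single edge. If the exact original pairs matter, the same connected-components split can be done without networkx. A minimal sketch, assuming the pairs fit in memory: build an adjacency map, label each component with a breadth-first search, then assign every original pair to its component. The function name `complete_sets` is hypothetical.

```python
from collections import defaultdict, deque


def complete_sets(pairs):
    """Group pairs into connected components, keeping each pair as given."""
    # Undirected adjacency map: value -> values it is paired with
    adj = defaultdict(set)
    for u, v in pairs:
        adj[u].add(v)
        adj[v].add(u)

    component_of = {}  # value -> component index
    count = 0
    for start in adj:  # dicts keep insertion order, so sets come out in input order
        if start in component_of:
            continue
        component_of[start] = count
        queue = deque([start])
        while queue:  # breadth-first search over this component
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in component_of:
                    component_of[nbr] = count
                    queue.append(nbr)
        count += 1

    # Both values of a pair share a component, so the first value decides
    sets = [[] for _ in range(count)]
    for pair in pairs:
        sets[component_of[pair[0]]].append(pair)
    return sets


megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']]
completeset1, completeset2 = complete_sets(megalist)
```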
A very simple algorithm using Counter (http://docs.python.org/library/collections.html#collections.Counter), with no extra libraries needed:
from collections import Counter
megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']]
result = []
for l in megalist:
    cl = Counter(l)
    merged, rest = [], []
    for result_item in result:
        # A group matches if any of its pairs shares a value with l
        if any(cl & Counter(e) for e in result_item):
            # Merge every matching group, so a pair linking two groups joins them
            merged.extend(result_item)
        else:
            rest.append(result_item)
    if l not in merged:  # skip exact duplicate pairs
        merged.append(l)
    result = rest + [merged]
print(result)
[[['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f']], [['r', 's'], ['t', 'r']]]
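The Counter loop rescans every existing group for each new pair, which gets slow on a large megalist. A different technique, swapped in here, is a union-find (disjoint-set) forest: each pair unions its two values, and pairs are then grouped by the root of their first value. This is a sketch; the function name `complete_sets_uf` is hypothetical, and it uses path halving without union by rank.

```python
def complete_sets_uf(pairs):
    """Split pairs into complete sets using a union-find (disjoint-set) forest."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Path halving: point each node at its grandparent while walking up
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in pairs:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru  # union the two trees

    groups = {}  # root -> pairs, in first-seen order
    for pair in pairs:
        groups.setdefault(find(pair[0]), []).append(pair)
    return list(groups.values())


megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']]
print(complete_sets_uf(megalist))
```

Groups appear in the order their first pair appears in the input, and each pair is kept exactly as given.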
You could manually define your sublists using slicing:
completeset1=megalist[0:4]
completeset2=megalist[4:]
However, it sounds like you want to apply some deeper logic, or use additional data, to create these segments automatically according to some condition. It's hard to advise without knowing more about the logic you'd like to apply.
Edit: the comments on the question may be good pointers.