Split a tuple of tuples (or list of lists) of paired values into independent complete sets?

I have paired values in a CSV file. Neither of the paired values is necessarily unique. I would like to split this large list into independent complete sets for further analysis.

To illustrate, my "megalist" is like:

megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']...]

Most importantly, the output should preserve the list of paired values (i.e., not consolidate the values). Ideally, each set would eventually be written to its own CSV file for individual analysis later. For example, this megalist would be split into:

completeset1 = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f']]
completeset2 = [['r', 's'], ['t', 'r']]
...

In a graph theory context, I'm trying to take a giant graph made up of mutually disconnected subgraphs (where each pair of values is an edge between two vertices) and split it into independent graphs that are more manageable. Thanks for any input!
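In graph terms, the independent sets being asked for are the connected components of that graph. As a dependency-free sketch (not taken from any of the answers here; the helper name split_pairs is hypothetical), a breadth-first search over an adjacency map can label each value with its component, and each original pair is then placed in the component of its first value, so the pairs themselves are kept intact. It assumes every pair has exactly two values.

```python
from collections import defaultdict, deque


def split_pairs(pairs):
    """Group pairs into connected components, keeping each pair intact (hypothetical helper)."""
    # Build an undirected adjacency map: value -> set of values it is paired with.
    adjacency = defaultdict(set)
    for u, v in pairs:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Label every value with a component id using breadth-first search.
    component_of = {}
    next_id = 0
    for start in adjacency:
        if start in component_of:
            continue
        component_of[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component_of:
                    component_of[neighbour] = next_id
                    queue.append(neighbour)
        next_id += 1

    # Both values of a pair share a component, so the first value decides the group.
    groups = [[] for _ in range(next_id)]
    for pair in pairs:
        groups[component_of[pair[0]]].append(pair)
    return groups


megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']]
print(split_pairs(megalist))
# [[['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f']], [['r', 's'], ['t', 'r']]]
```

Components are numbered in the order their first value appears in the input, and pairs keep their original order within each group.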

Edit 1: This put me in a place from which I can move forward. Thanks again!

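import csv
import networkx as nx

# Read the tab-separated pairs, skipping blank rows; each row should hold exactly two values.
with open('megalistfile.csv', newline='') as f:
    megalist = [row for row in csv.reader(f, delimiter='\t') if row]

G = nx.Graph()
G.add_edges_from(megalist)

# connected_components yields one set of nodes per independent subgraph.
subgraphs = nx.connected_components(G)

with open('subgraphs.txt', 'w') as output_file:
    for subgraph in subgraphs:
        # G.edges(nodes) lists every edge touching those nodes, i.e. the component's edges.
        output_line = str(list(G.edges(subgraph))) + '\n'
        output_file.write(output_line)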
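Since the question says each set should ideally end up in its own CSV file, here is a minimal sketch of that last step. It assumes tab-separated output to match the input file; write_components, out_dir, prefix and the completesetN.csv naming are hypothetical choices for illustration, not networkx API. It regroups the original rows instead of writing G.edges(...), so a pair such as ['t', 'r'] keeps its orientation and input order.

```python
import csv
import os
from collections import defaultdict

import networkx as nx


def write_components(pairs, out_dir='.', prefix='completeset'):
    """Write the pairs of each connected component to its own tab-separated file.

    `pairs` must be a list of two-item rows (not a one-shot csv.reader),
    because it is iterated twice.
    """
    G = nx.Graph()
    G.add_edges_from(pairs)

    # Map every value to the 1-based index of the component that contains it.
    component_of = {}
    for index, nodes in enumerate(nx.connected_components(G), start=1):
        for node in nodes:
            component_of[node] = index

    # Group the original rows (not G's edges) so orientation and order survive.
    groups = defaultdict(list)
    for pair in pairs:
        groups[component_of[pair[0]]].append(pair)

    paths = []
    for index in sorted(groups):
        path = os.path.join(out_dir, f'{prefix}{index}.csv')  # hypothetical naming scheme
        with open(path, 'w', newline='') as f:
            csv.writer(f, delimiter='\t').writerows(groups[index])
        paths.append(path)
    return paths
```

For the six example pairs in the question, write_components(megalist) would write completeset1.csv and completeset2.csv to the current directory.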

You can use networkx for this. First, construct the graph:

>>> import networkx as nx
>>> megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']]
>>> G = nx.Graph()
>>> G.add_edges_from(megalist)

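Then, to get the list of subgraphs (in current networkx versions connected_components returns a generator of node sets, so each one is sorted into a list here):

>>> subgraphs = [sorted(c) for c in nx.connected_components(G)]
>>> subgraphs
[['a', 'b', 'd', 'f'], ['r', 's', 't']]
>>> [list(G.edges(subgraph)) for subgraph in subgraphs]
[[('a', 'b'), ('a', 'd'), ('b', 'd'), ('b', 'f')], [('r', 's'), ('r', 't')]]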
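If you would rather have each piece as its own graph object than as an edge list, G.subgraph(nodes) gives a read-only view of G restricted to those nodes, and .copy() turns that view into an independent Graph that can be modified or analysed on its own. A short sketch on the question's example data:

```python
import networkx as nx

megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']]
G = nx.Graph()
G.add_edges_from(megalist)

# One independent Graph per connected component; .copy() detaches it from G.
components = [G.subgraph(nodes).copy() for nodes in nx.connected_components(G)]

for H in components:
    print(H.number_of_nodes(), H.number_of_edges())
# 4 4
# 3 2
```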

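A very simple algorithm using Counter (http://docs.python.org/library/collections.html#collections.Counter). Each new pair is merged with every existing group it shares a value with, so a pair that links two groups joins them:

from collections import Counter

megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']]

def overlaps(pair, group):
    # True if the pair shares at least one value with any pair already in the group.
    counts = Counter(pair)
    return any(counts & Counter(other) for other in group)

result = []
for new_pair in megalist:
    # Take out every existing group this pair touches; one pair can link several groups.
    touching = [group for group in result if overlaps(new_pair, group)]
    result = [group for group in result if not overlaps(new_pair, group)]
    # Merge the touched groups in their existing order, then add the pair unless it is a duplicate.
    merged = [pair for group in touching for pair in group]
    if new_pair not in merged:
        merged.append(new_pair)
    result.append(merged)

print(result)

[[['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f']], [['r', 's'], ['t', 'r']]]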
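The Counter approach compares each new pair against every pair already grouped, so its running time grows roughly quadratically with the number of pairs. As an alternative sketch (a swapped-in technique, not part of this answer; the function name split_pairs_union_find is hypothetical), a disjoint-set (union-find) structure merges groups in near-linear time and also handles a pair that links two previously separate groups:

```python
def split_pairs_union_find(pairs):
    """Group pairs that share values, using a disjoint-set (union-find) structure."""
    parent = {}

    def find(x):
        # Walk to the root, halving the path as we go to keep the trees shallow.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in pairs:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u  # merge the two sets

    # Collect pairs by the root of their first value, keeping input order.
    groups = {}
    for pair in pairs:
        groups.setdefault(find(pair[0]), []).append(pair)
    return list(groups.values())


megalist = [['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f'], ['r', 's'], ['t', 'r']]
print(split_pairs_union_find(megalist))
# [[['a', 'b'], ['a', 'd'], ['b', 'd'], ['b', 'f']], [['r', 's'], ['t', 'r']]]
```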

You could manually define your sublists using slicing:

completeset1 = megalist[0:4]
completeset2 = megalist[4:]

However, it really sounds like you'd like to apply some deeper logic, or use additional data, to create these segments automatically according to some condition. It's hard to advise without knowing more about what logic you'd like to apply.

Edit: the comments on the question may be good pointers.
