
Python: create lists from a pandas DataFrame of numbers that are repeated down the DataFrame

I have a pandas DataFrame with two columns. Row 0 contains 1 and 2; since 2 never appears in Column_1, the chain stops there and the first list is [1, 2]. Row 1 contains 4 and 6; 6 appears in row 3 with 9, 9 appears in row 4 with 15, and 15 appears in row 6 with 14, so the second list is [4, 6, 9, 15, 14]. And so on.

df
   Column_1   Column_2
0  1          2
1  4          6
2  5          8
3  6          9
4  9          15
5  11         22
6  15         14

I am looking to create a list of lists like the following from the df above:

list1
[[1,2], [4,6,9,15,14], [5,8], [11,22]] 

IIUC, this is a connected components problem, so check out networkx. Here's a solution:

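import networkx as nx

# Treat each row as an undirected edge between its two values.
G = nx.Graph()
G.add_edges_from(zip(df['Column_1'], df['Column_2']))

# Each connected component is one group of linked values.
list(nx.connected_components(G))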

Output:

[{1, 2}, {4, 6, 9, 14, 15}, {5, 8}, {11, 22}]

(These are sets rather than lists, so this is not a list of lists and the order within each group is not preserved, but I assume that will do.)
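If the exact list-of-lists output with chain order matters, one option is to follow the links directly. The following is only a minimal sketch, not part of the original answer: the helper name `chains_from_pairs` is hypothetical, and it assumes each value appears at most once in Column_1 and that the links contain no cycles. The rows are written out as plain tuples so the snippet runs on its own.

```python
def chains_from_pairs(pairs):
    """Follow each Column_1 -> Column_2 link until the chain ends.

    Assumes every Column_1 value is unique and the links form no cycles.
    """
    pairs = list(pairs)
    next_value = dict(pairs)             # Column_1 value -> Column_2 value
    targets = {b for _, b in pairs}      # every value that appears in Column_2
    chains = []
    for start, _ in pairs:
        if start in targets:
            continue                     # this row continues an earlier chain
        chain = [start]
        while chain[-1] in next_value:   # keep following links while one exists
            chain.append(next_value[chain[-1]])
        chains.append(chain)
    return chains


# Same rows as the df in the question.
pairs = [(1, 2), (4, 6), (5, 8), (6, 9), (9, 15), (11, 22), (15, 14)]
result = chains_from_pairs(pairs)
print(result)  # [[1, 2], [4, 6, 9, 15, 14], [5, 8], [11, 22]]
```

With the DataFrame from the question, the same helper could be called as chains_from_pairs(zip(df['Column_1'], df['Column_2'])). If the data can branch or loop, the networkx approach above is the safer choice.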
