I have a table with two columns, 'parent' and 'child'. It is a download of the SETNODE table from SAP (ERP). I need to create a dataframe in Python that has each level as its own column, relative to its parent and all levels before it.
I'm using Python 3+.
The number of levels in the full relationship is unknown (or always changing), so the maximum level can't be defined in advance. I would like to create a full dataframe that shows ALL parent/child relationships for all levels. Right now there are about 15 levels, but it could go up to 20 or more with other data I work with.
For example, the two columns (example_df):
example_df = pd.DataFrame({'parent':['a','a','b','c','c','f'],'child':['b','c','d','f','g','h']})
The desired output dataframe (solution_example):
solution_example = pd.DataFrame({'child':['h','f','d'],'parent_1':['a','a','a'],'parent_2':['c','c','b'],'parent_3':['f', 'none', 'none']})
This can be solved using the networkx library. First, build a directed graph from the DataFrame, and then find all ancestors of the leaf nodes.
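import networkx as nx
import pandas as pd

# df is the two-column parent/child DataFrame (example_df in the question)
# Leaves are nodes that appear as a child but never as a parent
leaves = set(df.child).difference(df.parent)
g = nx.from_pandas_edgelist(df, 'parent', 'child', create_using=nx.DiGraph())
# nx.ancestors returns an unordered set, so sort each one root-first
# by its position in a topological ordering of the graph
rank = {n: i for i, n in enumerate(nx.topological_sort(g))}
ancestors = {
    n: sorted(nx.ancestors(g, n), key=rank.get) for n in leaves
}
(pd.DataFrame.from_dict(ancestors, orient='index')
   .rename(lambda x: 'parent_{}'.format(x+1), axis=1)
   .rename_axis('child')
   .fillna(''))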
      parent_1 parent_2 parent_3
child
h            a        c        f
g            a        c
d            a        b
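The output above covers only the leaf nodes, while the question's solution_example also lists an intermediate node ('f'). As a rough sketch of one way to get a row for every child, the chain of parents can be walked directly without networkx. The helper name ancestor_rows and its fill argument are made up for this illustration, and the sketch assumes that each child has exactly one parent and that the data contains no cycles.

```python
def ancestor_rows(pairs, fill="none"):
    """Return one row per child, listing its ancestors from the root down.

    Assumption: every child has exactly one parent and there are no cycles
    (a cycle would make the upward walk loop forever).
    """
    parent_of = {child: parent for parent, child in pairs}

    chains = {}
    for child in parent_of:
        chain = []
        node = child
        # Walk upward until reaching a node that has no parent (a root)
        while node in parent_of:
            node = parent_of[node]
            chain.append(node)
        chains[child] = chain[::-1]  # root first, direct parent last

    # The number of parent_N columns is taken from the deepest chain found
    depth = max((len(c) for c in chains.values()), default=0)

    rows = []
    for child, chain in chains.items():
        row = {"child": child}
        for level in range(depth):
            key = "parent_{}".format(level + 1)
            row[key] = chain[level] if level < len(chain) else fill
        rows.append(row)
    return rows


# Same edges as example_df in the question, written as (parent, child) pairs
pairs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "f"), ("c", "g"), ("f", "h")]
rows = ancestor_rows(pairs)
```

With pandas, the pairs could come from zip(example_df['parent'], example_df['child']), and pd.DataFrame(rows) turns the result into a table. Because the column count is derived from the deepest chain, the maximum level never has to be specified up front.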