How to convert pandas DataFrame to dictionary for Newick format
I have the following dataset:
import pandas as pd
df = pd.DataFrame([['root', 'b', 'a', 'leaf1'],
                   ['root', 'b', 'a', 'leaf2'],
                   ['root', 'b', 'leaf3', ''],
                   ['root', 'b', 'leaf4', ''],
                   ['root', 'c', 'leaf5', ''],
                   ['root', 'c', 'leaf6', '']],
                  columns=['col1', 'col2', 'col3', 'col4'])
I could not find a way to convert it directly to Newick format, so I want to convert it to a dictionary of the following form:
node_to_children = {
    'root': {'b': 0, 'c': 0},
    'a': {'leaf1': 0, 'leaf2': 0},
    'b': {'a': 0, 'leaf3': 0, 'leaf4': 0},
    'c': {'leaf5': 0, 'leaf6': 0}
}
I can then convert this node_to_children to Newick format. But how do I convert the pandas DataFrame to the dictionary?
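I am assuming that each row of your DataFrame represents one complete branch of the tree, from root to leaf. Based on that, I came up with the following solution. Each step of the algorithm is commented in the code below, but feel free to ask if anything is unclear.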
node_to_children = {}
# Iterate over the DataFrame row-wise, assuming every row is one complete branch of the tree
for row in df.itertuples():
    # Drop the index at position 0 and any empty cells ("") that hold no child
    row_list = [element for element in row[1:] if element != ""]
    for i in range(len(row_list) - 1):
        if row_list[i] in node_to_children:
            # Parent entry already exists
            if row_list[i + 1] in node_to_children[row_list[i]]:
                # Child entry already exists --> next
                continue
            else:
                # Child entry does not exist yet --> add the connection
                node_to_children[row_list[i]].update({row_list[i + 1]: 0})
        else:
            # Parent not seen yet --> add the branching point
            node_to_children[row_list[i]] = {row_list[i + 1]: 0}
Output:
print(node_to_children)
{'root': {'b': 0, 'c': 0},
'b': {'a': 0, 'leaf3': 0, 'leaf4': 0},
'a': {'leaf1': 0, 'leaf2': 0},
'c': {'leaf5': 0, 'leaf6': 0}}
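Since the question mentions converting node_to_children to Newick afterwards, here is a minimal sketch of that last step. The helper name to_newick is hypothetical and not part of any library. It assumes that any name without its own entry in the dictionary is a leaf, that the tree has a single root called 'root', and it ignores the 0 values rather than treating them as branch lengths.

```python
def to_newick(node_to_children, node):
    """Recursively render `node` and its descendants as a Newick subtree.

    Hypothetical helper: a name with no entry in the mapping is treated as a leaf.
    """
    children = node_to_children.get(node)
    if not children:
        # Leaf node: just its label
        return node
    # Internal node: comma-separated children in parentheses, followed by the label
    inner = ",".join(to_newick(node_to_children, child) for child in children)
    return f"({inner}){node}"


node_to_children = {
    'root': {'b': 0, 'c': 0},
    'b': {'a': 0, 'leaf3': 0, 'leaf4': 0},
    'a': {'leaf1': 0, 'leaf2': 0},
    'c': {'leaf5': 0, 'leaf6': 0},
}

# A Newick string is terminated by a semicolon
newick = to_newick(node_to_children, 'root') + ";"
print(newick)
# (((leaf1,leaf2)a,leaf3,leaf4)b,(leaf5,leaf6)c)root;
```

The child order follows the insertion order of the dictionaries, which Python guarantees from version 3.7 on. If the internal node labels are not wanted, return only the parenthesised part for internal nodes.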