
Tree construction from csv file

I have a CSV file and want to build a tree by reading its contents.

id  | screen_name |    reply_status_id |    tweet
1   |      a      |        null        |     dahgfsjhg
2   |      b      |         1          |     fcjgvujhgjhk
3   |      c      |         2          |     ououoijoskjfpokpo
4   |      d      |         1          |     giuyhewikuhieuhi
5   |      e      |         3          |     hkjhkjlkjljlkjlj

I want to create a tree structure based on each tweet's id and reply_status_id.

Like this:

      a [root]
     / \
    b   d  [children]
   /
  c
 /
e

My code so far:

with open(file_path) as inp:
    csv_reader = csv.reader(inp)
    for row in csv_reader:
        if row[2] =='null':
            if visited == '0':
                root = Node(row[3])
                id_root = row[0]
                #inp.seek(0)
                visited = '1'
        if row[2] ==id_root:
            child = Node(row[3],root)
            child_id = row[0]

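If reply_status_id == null, keep that row's screen_name as the root. Then, for each following row, if its reply_status_id equals some existing id, make it a child of the node with that id. Repeating this process builds the complete tree for the file.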
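The procedure described above can be sketched in a single pass with a dictionary keyed by id. This is only a minimal sketch: it assumes a comma-separated file with a header row and that every parent row appears before its replies. The names `TweetNode` and `build_tree` are hypothetical and not part of any library, and the sample rows are inlined through `io.StringIO` so the snippet runs on its own.

```python
import csv
import io

# Sample data from the question, inlined; assumed to be comma-separated.
SAMPLE = """id,screen_name,reply_status_id,tweet
1,a,null,dahgfsjhg
2,b,1,fcjgvujhgjhk
3,c,2,ououoijoskjfpokpo
4,d,1,giuyhewikuhieuhi
5,e,3,hkjhkjlkjljlkjlj
"""

class TweetNode:
    """Hypothetical minimal node: a name plus a list of child nodes."""
    def __init__(self, name):
        self.name = name
        self.children = []

def build_tree(lines):
    """Return (root, nodes_by_id); assumes parents precede their replies."""
    nodes = {}
    root = None
    for row in csv.DictReader(lines):
        node = TweetNode(row["screen_name"])
        nodes[row["id"]] = node
        parent_id = row["reply_status_id"]
        if parent_id == "null":
            if root is None:  # keep the first 'null' row as the root
                root = node
        else:
            # Look up the parent by id and attach this row as its child.
            nodes[parent_id].children.append(node)
    return root, nodes

root, nodes = build_tree(io.StringIO(SAMPLE))
print(root.name, [child.name for child in root.children])
```

Keeping every node in the `nodes` dictionary means each row is attached with one lookup, so the file is not rescanned for each parent. If replies can appear before their parents, a second pass (or deferred attachment) would be needed.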

You can use the anytree library to create the graph:

import csv
from anytree import Node
from anytree.exporter import DotExporter

def clean(text):
    # Normalise quotes so the tweet text can be used as a node label.
    return text.replace('\\"', '\'').replace('"', '')

def find_subnodes(root_node, root_node_id, nodes, rows):
    # Attach every row whose reply_status_id equals root_node_id, then recurse.
    for row in rows:
        node_id = row[0]
        parent_node_id = row[2]
        if root_node_id == parent_node_id:
            node = Node(clean(row[3]), parent=root_node)
            nodes[node_id] = node
            find_subnodes(node, node_id, nodes, rows)
    return nodes

with open('rumour1.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    lst = list(reader)

# The first data row is assumed to be the root tweet.
r_node = Node(clean(lst[0][3]))
n = {lst[0][0]: r_node}
n = find_subnodes(r_node, lst[0][0], n, lst)
DotExporter(r_node).to_picture('tree.png')  # graphviz required

With that CSV, you get:

[image of the generated tree]
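If graphviz is not installed, the same hierarchy can be checked as indented text. The following is a rough sketch using only the standard library, not anytree's own rendering. The `rows` list mirrors the sample data, and `render` is a hypothetical helper name.

```python
from collections import defaultdict

# (id, screen_name, reply_status_id) triples mirroring the sample data.
rows = [
    ("1", "a", "null"),
    ("2", "b", "1"),
    ("3", "c", "2"),
    ("4", "d", "1"),
    ("5", "e", "3"),
]

names = {row_id: name for row_id, name, _ in rows}
children = defaultdict(list)  # parent id -> list of child ids
for row_id, _, parent_id in rows:
    children[parent_id].append(row_id)

def render(node_id, depth=0):
    """Return the subtree under node_id as a list of indented lines."""
    lines = ["  " * depth + names[node_id]]
    for child_id in children[node_id]:
        lines.extend(render(child_id, depth + 1))
    return lines

root_id = children["null"][0]  # assumes exactly one root row
text = "\n".join(render(root_id))
print(text)
```

Each level of indentation corresponds to one reply step away from the root tweet.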

You can use recursion with a simple class:

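import csv

with open('filename.csv') as f:
    _, *data = csv.reader(f)  # drop the header row

# Convert numeric reply_status_id values to int; leave 'null' as a string.
new_data = [[a, b, int(c) if c.isdigit() else c, *d] for a, b, c, *d in data]

class Tree:
    def __init__(self, _d, _start='null'):
        # Rows whose reply_status_id equals the current level.
        self.head = [i for i in _d if i[2] == _start]
        _next = 1 if _start == 'null' else _start + 1
        # Recurse only if some row replies to the next id.
        self.children = Tree(_d, _next) if any(i[2] == _next for i in _d) else None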

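Now, Tree creates a structure that stores the tweets by the "level" given by reply_status_id (each level holds the rows whose reply_status_id is one greater than that of the previous level):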

d = Tree(new_data)
print(d.head)
print(d.children.head)
print(d.children.children.head)
print(d.children.children.children.head)

Output:

[['1', 'a', 'null', 'dahgfsjhg']]
[['2', 'b', 1, 'fcjgvujhgjhk'], ['4', 'd', 1, 'giuyhewikuhieuhi']]
[['3', 'c', 2, 'ououoijoskjfpokpo']]
[['5', 'e', 3, 'hkjhkjlkjljlkjlj']]
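The levels above match tree depth only because the sample ids happen to line up that way; a reply to id 4, for example, would never be reached once no row replies to id 3. As a hedged alternative, the sketch below groups rows by their actual parent id and walks the tree breadth-first. `levels_by_depth` is a hypothetical helper, the rows are the sample data assumed to be already parsed, and the reply links are assumed to contain no cycles.

```python
from collections import defaultdict

# Sample rows: [id, screen_name, reply_status_id, tweet].
rows = [
    ["1", "a", "null", "dahgfsjhg"],
    ["2", "b", "1", "fcjgvujhgjhk"],
    ["3", "c", "2", "ououoijoskjfpokpo"],
    ["4", "d", "1", "giuyhewikuhieuhi"],
    ["5", "e", "3", "hkjhkjlkjljlkjlj"],
]

def levels_by_depth(rows):
    """Return a list of levels; each level lists the rows at that depth."""
    by_parent = defaultdict(list)  # parent id -> rows replying to it
    for row in rows:
        by_parent[row[2]].append(row)
    levels = []
    current = by_parent["null"]  # root rows have no parent
    while current:
        levels.append(current)
        # The next level holds every reply to any row in the current level.
        current = [child for row in current for child in by_parent[row[0]]]
    return levels

for depth, level in enumerate(levels_by_depth(rows)):
    print(depth, [row[1] for row in level])
```

Because the lookup is keyed by the parent's id rather than by a running counter, ids do not need to be consecutive for every branch to be visited.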
