Create a Pandas dataframe from JSON built from nested lists

I'd like to use the following nested list to:

  • First: create a dictionary
  • Second: from the dictionary, create a Pandas dataframe

structure=[['jumps', [['fox', [['The'], ['quick'], ['brown']]], ['over', [['dog', [['the'], ['lazy']]]]]]]]

This nested list comes from a parsed tree structure with dependencies:

          jumps              
       _____|________         
      |             over     
      |              |        
     fox            dog      
  ____|_____      ___|____    
The quick brown the      lazy

My idea was to transform this nested list into a JSON file and then create a Pandas dataframe that looks like this:

jumps fox
jumps over
fox The
fox quick
fox brown
over dog
dog the
dog lazy

That way the dataframe can be plotted with networkx.

I tried json.dumps and dict, without success so far.

Any insights are appreciated.

This is a tree-like structure, which makes me think that a recursive function should be used here. Here's how I would do that:

import pandas as pd

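def recurse(l, parent=None):
    """Yield a (parent, child) tuple for every edge in the nested list."""
    assert isinstance(l, list)
    for item in l:
        if isinstance(item, str):
            # A string names a node; link it to the enclosing parent, if any.
            if parent is not None:
                yield (parent, item)
            # Lists that follow in this same list are children of this node.
            parent = item
        elif isinstance(item, list):
            yield from recurse(item, parent)
        else:
            raise TypeError(f"Unknown type {type(item)}")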

structure=[['jumps', [['fox', [['The'], ['quick'], ['brown']]], ['over', [['dog', [['the'], ['lazy']]]]]]]]

df = pd.DataFrame(recurse(structure), columns=['from', 'to'])
print(df)

How it works: it goes through each list, remembering the last string it saw as the current parent. For each nested list it finds, it calls itself with that list and that parent. The function is a generator that yields a tuple for each "edge" in your graph, and those tuples can be loaded straight into a pandas dataframe.

Output:

    from     to
0  jumps    fox
1    fox    The
2    fox  quick
3    fox  brown
4  jumps   over
5   over    dog
6    dog    the
7    dog   lazy
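
If the intermediate dictionary from the question is still wanted, here is a minimal sketch of one way to get it. It assumes every node is either ['word'] or ['word', [children]], as in the example, and that sibling words are unique (dictionary keys would otherwise collide). The helper names to_dict and dict_edges are made up for illustration and are not part of pandas or json.

```python
import json

def to_dict(nodes):
    """Turn a list of [name] / [name, children] nodes into a nested dict."""
    tree = {}
    for node in nodes:
        name = node[0]
        # A leaf is ['word']; an inner node is ['word', [child, child, ...]].
        children = node[1] if len(node) > 1 else []
        tree[name] = to_dict(children)
    return tree

def dict_edges(tree, parent=None):
    """Yield (parent, child) pairs from the nested dict, depth first."""
    for name, children in tree.items():
        if parent is not None:
            yield (parent, name)
        yield from dict_edges(children, name)

structure = [['jumps', [['fox', [['The'], ['quick'], ['brown']]], ['over', [['dog', [['the'], ['lazy']]]]]]]]

tree = to_dict(structure)
as_json = json.dumps(tree)      # the nested dict serialises to JSON directly
edges = list(dict_edges(tree))  # (parent, child) pairs, one per edge
```

Passing edges to pd.DataFrame(edges, columns=['from', 'to']) should then give the same frame as above.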

I misread the question and assumed you wanted to plot the graph rather than convert the nested list. @Nick's solution is the best way to go. Consider this answer additional information rather than a solution.

Let's use graphviz and build our own DOT source for the digraph:

from graphviz import Source

l = [('jumps','fox'),
     ('jumps', 'over'),
     ('fox', 'The'),
     ('fox', 'quick'),
     ('fox', 'brown'),
     ('over', 'dog'),
     ('dog', 'the'),
     ('dog', 'lazy')]

dotgraph = 'digraph G {' + ' '.join([i+'->'+j for i,j in l]) + '}'
print(dotgraph)

s = Source(dotgraph, filename="test1.gv", format="png")
s.view()

Output of print(dotgraph), shown here with line breaks added:

digraph G {
    jumps->fox 
    jumps->over 
    fox->The 
    fox->quick 
    fox->brown 
    over->dog 
    dog->the 
    dog->lazy
}


You can play around with graphviz in their visual editor. Also read the documentation for customization options on these graph elements and for more complex graphs.
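
The plain string concatenation used for the DOT source works here because every word is a simple identifier. If node names could contain spaces or punctuation, they would need to be double-quoted in DOT. A minimal sketch of that, using hypothetical helpers quote_id and edges_to_dot (these are not graphviz functions) and a made-up second edge:

```python
def quote_id(name):
    """Wrap a node name in double quotes, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'

def edges_to_dot(edges, graph_name="G"):
    """Build DOT source for a directed graph, one quoted edge per line."""
    lines = [f"digraph {graph_name} {{"]
    for src, dst in edges:
        lines.append(f"    {quote_id(src)} -> {quote_id(dst)};")
    lines.append("}")
    return "\n".join(lines)

# 'lazy dog' is an invented name, included only to show a label with a space.
edges = [('jumps', 'fox'), ('over', 'lazy dog')]
dot = edges_to_dot(edges)
```

The resulting string can be handed to Source in the same way as dotgraph.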
