Create Pandas dataframe from JSON built from a nested list

Question

I'd like to use the following nested list to:

First: create a dictionary
Second: from the dictionary, create a Pandas dataframe

structure=[['jumps', [['fox', [['The'], ['quick'], ['brown']]], ['over', [['dog', [['the'], ['lazy']]]]]]]]

This nested list comes from a parsed tree structure with dependencies:

          jumps              
       _____|________         
      |             over     
      |              |        
     fox            dog      
  ____|_____      ___|____    
The quick brown the      lazy

My idea was to transform this nested list into a JSON file, and then create a Pandas dataframe that looks like this:

jumps fox
jumps over
fox The
fox quick
fox brown
over dog
dog the
dog lazy

The goal is to plot this dataframe with networkx.

So far I have tried json.dumps and dict, without success.

Any insights are appreciated.

Answer 1

This is a tree-like structure, which suggests using a recursive function. Here's how I would do that:

import pandas as pd

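def recurse(l, parent=None):
    """Yield a (parent, child) tuple for every edge in the nested list."""
    assert isinstance(l, list)
    for item in l:
        if isinstance(item, str):
            # A string is a node: link it to its parent, if it has one.
            if parent is not None:
                yield (parent, item)
            # Lists that follow hold this node's children.
            parent = item
        elif isinstance(item, list):
            yield from recurse(item, parent)
        else:
            raise TypeError(f"Unknown type {type(item)}")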

structure=[['jumps', [['fox', [['The'], ['quick'], ['brown']]], ['over', [['dog', [['the'], ['lazy']]]]]]]]

df = pd.DataFrame(recurse(structure), columns=['from', 'to'])

How it works: the function walks through each list, remembering the last string it saw as the current parent. For each nested list it finds, it calls itself with that list and the current parent. The result is a generator that yields one (parent, child) tuple for each "edge" in your graph, and pandas can load those tuples directly into a dataframe.

Output:

    from     to
0  jumps    fox
1    fox    The
2    fox  quick
3    fox  brown
4  jumps   over
5   over    dog
6    dog    the
7    dog   lazy
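
The question also asked for a dictionary as an intermediate step. Here is a minimal sketch of that route, using only the standard library. The helper names `to_dict` and `iter_edges` are made up for illustration, and the sketch assumes every node has the shape [word] or [word, [children]], as in the example. Note that words are used as dictionary keys, so two siblings with the same word would collide.

```python
import json

def to_dict(nodes):
    """Turn [[word, [children]], ...] into a nested dict {word: {child: {...}}}."""
    result = {}
    for node in nodes:
        word = node[0]
        # A leaf such as ['The'] has no child list.
        children = node[1] if len(node) > 1 else []
        result[word] = to_dict(children)
    return result

def iter_edges(tree, parent=None):
    """Yield (parent, child) tuples from the nested dict, depth first."""
    for word, children in tree.items():
        if parent is not None:
            yield (parent, word)
        yield from iter_edges(children, word)

structure = [['jumps', [['fox', [['The'], ['quick'], ['brown']]],
                        ['over', [['dog', [['the'], ['lazy']]]]]]]]

tree = to_dict(structure)
edges = list(iter_edges(tree))

# The nested dict is plain data, so json.dumps can serialize it as-is.
print(json.dumps(tree, indent=2))
```

The `edges` list holds the same (parent, child) tuples that `recurse` yields, so it can be passed to pd.DataFrame(edges, columns=['from', 'to']) in the same way.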

Answer 2

I misread the question and assumed you wanted to plot the graph rather than convert the nested list. @Nick's solution is the best way to go. Consider this answer additional information rather than a solution.

Let's use graphviz and write our own DOT source for the digraph:

from graphviz import Source

l = [('jumps','fox'),
     ('jumps', 'over'),
     ('fox', 'The'),
     ('fox', 'quick'),
     ('fox', 'brown'),
     ('over', 'dog'),
     ('dog', 'the'),
     ('dog', 'lazy')]

dotgraph = 'digraph G {' + ' '.join([i+'->'+j for i,j in l]) + '}'
print(dotgraph)

s = Source(dotgraph, filename="test1.gv", format="png")
s.view()

digraph G {
    jumps->fox 
    jumps->over 
    fox->The 
    fox->quick 
    fox->brown 
    over->dog 
    dog->the 
    dog->lazy
}
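
The plain string join works here because every word is a simple identifier. Node names containing spaces or punctuation would need to be double-quoted in DOT. A minimal sketch of that, using only the standard library: `quote` and `to_dot` are hypothetical helper names, and the sketch only escapes embedded double quotes.

```python
def quote(name):
    """Wrap a node name in double quotes, escaping any embedded quotes."""
    return '"' + name.replace('"', '\\"') + '"'

def to_dot(edges, graph_name="G"):
    """Build DOT source for a directed graph, one edge per line."""
    lines = [f"digraph {graph_name} {{"]
    for src, dst in edges:
        lines.append(f"    {quote(src)} -> {quote(dst)}")
    lines.append("}")
    return "\n".join(lines)

l = [('jumps', 'fox'),
     ('jumps', 'over'),
     ('fox', 'The'),
     ('fox', 'quick'),
     ('fox', 'brown'),
     ('over', 'dog'),
     ('dog', 'the'),
     ('dog', 'lazy')]

dotgraph = to_dot(l)
print(dotgraph)
```

The resulting string can be passed to Source in the same way as the hand-joined one.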

You can experiment with graphviz in its online visual editor. The documentation also covers customization options for these graph elements and for more complex graphs.
