Generate hierarchical data using Python

I need to create a directory and sub-directory structure of size MxN (M levels, N sub-levels). Is there a tree data structure in Python that could help me do this?

Example:

Input:

3 x 2 (3 levels and 2 sub-levels for each of the 3 levels)

Output:

1
 11
  111
  112
 12
  121
  122
------
2
 21
  211
  212
 22
  221
  222
------
3
 31
  311
  312
 32
  321
  322

There are a couple of different data structures that would work for an application like this.

My first intuition is to use nested dictionaries of matrices, as this gives you the multi-level indexing behavior you're looking for and can be implemented in pure Python. Since your proposed data tree is of size MxN (and is thus rectangular), you could also use a pandas.DataFrame, which supports row/column indexing similar to a nested dictionary. But ultimately, I think a numpy.ndarray is a much better fit in terms of scalability.

Nevertheless, I'll provide an example of each.

Using Pure Python

In pure Python, an MxN matrix of integers is typically represented by a list of lists of integers, the type hint for which would be list[list[int]].

Data structured like level/sublevel/matrix, with level/sublevel pairs such as 2/21, could be represented by a structure like dict[int, dict[int, Matrix]], which makes the type hint for the complete data structure dict[int, dict[int, list[list[int]]]].
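As a minimal sketch of the type hints just described, the aliases below spell out the nesting. The names Matrix and Tree are hypothetical, chosen only for illustration, and subscripting the built-in list and dict types assumes Python 3.9 or later. Note that dict takes both a key type and a value type.

```python
# Hypothetical alias names (Matrix, Tree), used only for illustration.
# Subscripting built-in list/dict like this assumes Python 3.9+.
Matrix = list[list[int]]
Tree = dict[int, dict[int, Matrix]]

# A tiny instance with one level (2) and one sublevel (21).
example: Tree = {2: {21: [[2, 1, 1], [2, 1, 2]]}}

print(example[2][21])
```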

The following nested dictionary comprehension generates the proposed structure. Each matrix row holds the digits of one leaf label from your example case (for instance, [1, 1, 2] corresponds to 112):

M = 3
N = 2

data = {
    i : {
        10 * i + j : [
            [i, j, k] for k in range(1, N + 1)
        ] for j in range(1, N + 1)
    } for i in range(1, M + 1)
}

The result can be inspected with pprint.pprint:

>>> from pprint import pprint
>>> pprint(data)
{1: {11: [[1, 1, 1], [1, 1, 2]], 12: [[1, 2, 1], [1, 2, 2]]},
 2: {21: [[2, 1, 1], [2, 1, 2]], 22: [[2, 2, 1], [2, 2, 2]]},
 3: {31: [[3, 1, 1], [3, 1, 2]], 32: [[3, 2, 1], [3, 2, 2]]}}

Any particular matrix can then be retrieved by its level and sublevel indices:

>>> data[2][21]
[[2, 1, 1], [2, 1, 2]]
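To connect the nested dictionary back to the indented listing in the question, here is a rough sketch that walks the structure and rebuilds the labels. The helper name tree_lines is an assumption introduced for this example, and it assumes every level and sublevel number is a single digit.

```python
M, N = 3, 2

data = {
    i: {
        10 * i + j: [[i, j, k] for k in range(1, N + 1)]
        for j in range(1, N + 1)
    }
    for i in range(1, M + 1)
}

def tree_lines(tree):
    """Return one indented label per node, in depth-first order."""
    lines = []
    for level, sublevels in tree.items():
        lines.append(str(level))
        for sublevel, matrix in sublevels.items():
            lines.append(" " + str(sublevel))
            for row in matrix:
                # Join the digits of a row such as [1, 1, 2] into "112".
                lines.append("  " + "".join(str(d) for d in row))
    return lines

print("\n".join(tree_lines(data)))
```

If actual directories are needed, the same walk could assemble a path per leaf and pass it to os.makedirs instead of collecting strings.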

Using a pandas.DataFrame

If you don't mind invoking third-party libraries, you could take this a step further by converting it into a pandas.DataFrame and simplifying the sublevel index:

import pandas as pd

M = 3
N = 2

data = {
    i : {
        j : [
            [i, j, k] for k in range(1, N + 1)
        ] for j in range(1, N + 1)
    } for i in range(1, M + 1)
}

df = pd.DataFrame(data)

The result is the following:

>>> df
                        1                       2                       3
1  [[1, 1, 1], [1, 1, 2]]  [[2, 1, 1], [2, 1, 2]]  [[3, 1, 1], [3, 1, 2]]
2  [[1, 2, 1], [1, 2, 2]]  [[2, 2, 1], [2, 2, 2]]  [[3, 2, 1], [3, 2, 2]]

With the simplified sublevel index, its element matrices are retrieved like so:

>>> df[2][1]  # Equivalent to data[2][21] in the pure Python example.
[[2, 1, 1], [2, 1, 2]] 
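One detail worth spelling out: df[2][1] selects the column (level 2) first and the row (sublevel 1) second. A small sketch, assuming pandas is installed, compares that chained lookup with .loc, which takes the row label first:

```python
import pandas as pd

M, N = 3, 2

data = {
    i: {j: [[i, j, k] for k in range(1, N + 1)] for j in range(1, N + 1)}
    for i in range(1, M + 1)
}
df = pd.DataFrame(data)

# Chained lookup: column 2 (the level) first, then row 1 (the sublevel).
chained = df[2][1]

# .loc takes the row label first and the column label second.
explicit = df.loc[1, 2]

print(chained == explicit)
```

Both lookups return the same list object stored in that cell; the .loc form simply makes the row/column order explicit.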

Using a numpy.ndarray

At this point, you might note that the data structure in question is in fact just an MxN grid of Nx3 matrices. So if you wanted to, you could reduce it to an MxNxNx3 4D array by switching from dictionary comprehensions to list comprehensions and invoking numpy:

import numpy as np

M = 3
N = 2

data = [
    [
        [
            [i, j, k] for k in range(1, N + 1)
        ] for j in range(1, N + 1)
    ] for i in range(1, M + 1)
]

data = np.array(data)

In this example, that results in the following array of shape (3, 2, 2, 3):

>>> data
array([[[[1, 1, 1],
         [1, 1, 2]],

        [[1, 2, 1],
         [1, 2, 2]]],


       [[[2, 1, 1],
         [2, 1, 2]],

        [[2, 2, 1],
         [2, 2, 2]]],


       [[[3, 1, 1],
         [3, 1, 2]],

        [[3, 2, 1],
         [3, 2, 2]]]])

Indexing here is off by one relative to the pandas.DataFrame case, because array indices start from zero:

>>> data[1][0]  # Equivalent to df[2][1] in the pandas example.
array([[2, 1, 1],
       [2, 1, 2]])
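As a quick check on the array's layout, the sketch below (assuming numpy is installed) prints the shape and uses tuple indexing, which reaches the same sub-arrays as chained brackets. The variable names block and leaf are introduced here only for illustration.

```python
import numpy as np

M, N = 3, 2

data = np.array([
    [[[i, j, k] for k in range(1, N + 1)] for j in range(1, N + 1)]
    for i in range(1, M + 1)
])

# Axes are: level, sublevel, row within the matrix, digit within the row.
print(data.shape)

# Tuple indexing selects the same sub-array as data[1][0].
block = data[1, 0]

# One more index picks a single row, i.e. one leaf of the tree.
leaf = data[1, 0, 1]
print(leaf)
```

The last axis has length 3 because each row stores the three digits of a leaf label, so the shape is (M, N, N, 3).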
