
Generate hierarchical data using Python

I need to create a directory and sub-directory structure of size MxN (M levels, N sub-levels). Is there a tree data structure in Python that could help me do this?

Example:

Input:

3 x 2 (3 levels, with 2 sub-levels for each of the 3 levels)

Output:

1
 11
  111
  112
 12
  121
  122
------
2
 21
  211
  212
 22
  221
  222
------
3
 31
  311
  312
 32
  321
  322

There are a couple of different data structures that would work for an application like this.

My first intuition is to use nested dictionaries of matrices, as this would give you the multi-level indexing behavior that you're looking for and can be implemented in pure Python. Since your proposed data tree is of size MxN (and is thus rectangular), you could also use a pandas.DataFrame, which supports row/column indexing similar to a nested dictionary. But ultimately, I think a numpy.ndarray is a much better fit in terms of scalability.

Nevertheless, I'll provide an example of each.

Using Pure Python

In pure Python, an MxN matrix of integers is typically represented by a list of lists of integers, the type hint for which would be list[list[int]].

Data structured as level/sublevel/matrix, with level/sublevel pairs like 2/21, could be represented as a dictionary of dictionaries of matrices, i.e. dict[int, dict[int, Matrix]], which makes the type hint for the complete data structure dict[int, dict[int, list[list[int]]]].
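As a small sketch of how those hints might be written down, the aliases below spell out the key and value types. The names Matrix, Sublevels, Tree and get_matrix are my own and purely illustrative, and the built-in generic syntax assumes Python 3.9 or newer:

```python
# Illustrative aliases; these names are assumptions, not part of any library.
Matrix = list[list[int]]        # a list of rows, each row a list of ints
Sublevels = dict[int, Matrix]   # sublevel index -> matrix
Tree = dict[int, Sublevels]     # level index -> its sublevels

# A hand-written fragment of the structure, annotated with the alias.
example: Tree = {2: {21: [[2, 1, 1], [2, 1, 2]]}}


def get_matrix(tree: Tree, level: int, sublevel: int) -> Matrix:
    """Look up one matrix by its level and sublevel keys."""
    return tree[level][sublevel]


print(get_matrix(example, 2, 21))
```

The aliases only document intent; Python does not enforce them at runtime, so a type checker would be needed to catch mismatches.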

The following nested dictionary comprehension will generate the proposed structure and contains the exact same data as is provided in your example case:

M = 3
N = 2

data = {
    i : {
        10 * i + j : [
            [i, j, k] for k in range(1, N + 1)
        ] for j in range(1, N + 1)
    } for i in range(1, M + 1)
}

The result of which can be seen using pprint.pprint :

>>> from pprint import pprint
>>> pprint(data)
{1: {11: [[1, 1, 1], [1, 1, 2]], 12: [[1, 2, 1], [1, 2, 2]]},
 2: {21: [[2, 1, 1], [2, 1, 2]], 22: [[2, 2, 1], [2, 2, 2]]},
 3: {31: [[3, 1, 1], [3, 1, 2]], 32: [[3, 2, 1], [3, 2, 2]]}}

Any particular matrix can then be retrieved by its level and sublevel indices:

>>> data[2][21]
[[2, 1, 1], [2, 1, 2]]
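Since the question is ultimately about directories, here is a rough sketch of how the nested dictionary could be walked to create them. It assumes each row [i, j, k] should become a leaf directory named by joining its digits (e.g. 211), and it writes into a temporary directory so nothing is left behind; leaf_paths is a hypothetical helper name:

```python
import tempfile
from pathlib import Path

M = 3
N = 2

data = {
    i: {
        10 * i + j: [[i, j, k] for k in range(1, N + 1)]
        for j in range(1, N + 1)
    }
    for i in range(1, M + 1)
}


def leaf_paths(tree):
    """Yield relative paths such as 2/21/211 for every row in the tree."""
    for level, sublevels in tree.items():
        for sublevel, matrix in sublevels.items():
            for row in matrix:
                # Each row is [i, j, k]; joining its digits gives e.g. "211".
                leaf = "".join(str(digit) for digit in row)
                yield Path(str(level)) / str(sublevel) / leaf


with tempfile.TemporaryDirectory() as root:
    for relative in leaf_paths(data):
        # parents=True also creates the level and sublevel directories.
        (Path(root) / relative).mkdir(parents=True, exist_ok=True)
    # Collect what was created, relative to the temporary root.
    created = sorted(
        p.relative_to(root).as_posix()
        for p in Path(root).rglob("*")
        if p.is_dir()
    )

print(len(created))  # 3 levels + 6 sublevels + 12 leaves
print(created[:3])
```

Note that the digit-joined labels are only unambiguous while M and N stay below 10; larger sizes would need a separator between the digits.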

Using a pandas.DataFrame

If you don't mind invoking third-party libraries, you could take this a step further by converting it into a pandas.DataFrame and simplifying the sublevel index:

import pandas as pd

M = 3
N = 2

data = {
    i : {
        j : [
            [i, j, k] for k in range(1, N + 1)
        ] for j in range(1, N + 1)
    } for i in range(1, M + 1)
}

df = pd.DataFrame(data)

The result of which is the following:

>>> df
                        1                       2                       3
1  [[1, 1, 1], [1, 1, 2]]  [[2, 1, 1], [2, 1, 2]]  [[3, 1, 1], [3, 1, 2]]
2  [[1, 2, 1], [1, 2, 2]]  [[2, 2, 1], [2, 2, 2]]  [[3, 2, 1], [3, 2, 2]]

Which, with the simplified sublevel index, gives its element matrices like so:

>>> df[2][1]  # Equivalent to data[2][21] in the pure Python example.
[[2, 1, 1], [2, 1, 2]] 
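Chained indexing like df[2][1] first selects column 2 and then row label 1. As an alternative sketch, the same cell can be read in one step with .loc, where the row (sublevel) label comes first and the column (level) label second:

```python
import pandas as pd

M = 3
N = 2

df = pd.DataFrame({
    i: {j: [[i, j, k] for k in range(1, N + 1)] for j in range(1, N + 1)}
    for i in range(1, M + 1)
})

# Rows are sublevels and columns are levels, so the row label comes first.
matrix = df.loc[1, 2]
print(matrix)
```

Keep in mind that each cell here holds a plain Python list, so pandas treats the columns as generic objects rather than numeric data.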

Using a numpy.ndarray

At this point, you might note that the data structure in question is in fact just an MxN grid of matrices, each of which has N rows of 3 values ([i, j, k]). So if you wanted to, you could reduce this into an MxNxNx3 4D array by switching from dictionary comprehensions to list comprehensions and invoking numpy:

import numpy as np

M = 3
N = 2

data = [
    [
        [
            [i, j, k] for k in range(1, N + 1)
        ] for j in range(1, N + 1)
    ] for i in range(1, M + 1)
]

data = np.array(data)

Which, in this example, results in the following array of shape (3, 2, 2, 3):

>>> data
array([[[[1, 1, 1],
         [1, 1, 2]],

        [[1, 2, 1],
         [1, 2, 2]]],


       [[[2, 1, 1],
         [2, 1, 2]],

        [[2, 2, 1],
         [2, 2, 2]]],


       [[[3, 1, 1],
         [3, 1, 2]],

        [[3, 2, 1],
         [3, 2, 2]]]])

Indexing here is offset by one relative to the pandas.DataFrame case, as array indices start from zero:

>>> data[1][0]  # Equivalent to df[2][1] in the pandas example.
array([[2, 1, 1],
       [2, 1, 2]])
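As a brief sketch of what the array form buys you, the snippet below rebuilds the same array, checks its shape, and uses tuple indexing and slicing. The variable names block and first_sublevels are just illustrative:

```python
import numpy as np

M = 3
N = 2

data = np.array([
    [[[i, j, k] for k in range(1, N + 1)] for j in range(1, N + 1)]
    for i in range(1, M + 1)
])

# Axes are (level, sublevel, row, value), i.e. (M, N, N, 3).
print(data.shape)

# A single tuple index is equivalent to the chained data[1][0].
block = data[1, 0]

# Slicing takes the first sublevel of every level in one step.
first_sublevels = data[:, 0]
print(first_sublevels.shape)
```

Slices like data[:, 0] are the kind of bulk selection that is awkward with nested dictionaries, which is a large part of why I'd lean towards the array for bigger M and N.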
