Generate hierarchical data using Python
I need to create a directory and sub-directory structure of size MxN (M levels, N sub-levels). Is there any tree data structure in Python that could help me do it?
Example:
Input:
3 x 2 (3 levels, and 2 sub-levels for each of the 3 levels)
Output:
1
11
111
112
12
121
122
------
2
21
211
212
22
221
222
----
3
31
311
312
32
321
322
There are a couple of different data structures that would work for an application like this.
My first intuition is to use nested dictionaries of matrices, as this gives you the multi-level indexing behavior you're looking for and can be implemented in pure Python. Since your proposed data tree is of size MxN (and is thus rectangular), you could also use a pandas.DataFrame, which supports row/column indexing similar to a nested dictionary. But ultimately, I think a numpy.ndarray is a much better fit in terms of scalability, since it stores every value in a single typed block of memory instead of as many nested Python objects.
Nevertheless, I'll provide an example of each.
Pure Python
In pure Python, an MxN matrix of integers is typically represented by a list of lists of integers, the type hint for which would be list[list[int]].
Data structured like level/sublevel/matrix, with level/sublevel pairs like 2/21, could be represented by a structure like dict[int, dict[int, Matrix]], which would make the type hint for the complete data structure something like dict[int, dict[int, list[list[int]]]].
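A dict annotation takes both a key type and a value type, so the nested structure is spelled dict[int, dict[int, ...]]. Type hints are not enforced at runtime, though, so as a hedged illustration the sketch below names the two layers (Matrix and Tree are hypothetical alias names chosen here, not library types) and adds a small hypothetical is_tree helper that checks whether an object really has that nested shape.

```python
# Hypothetical alias names chosen for readability; they are not library types.
Matrix = list[list[int]]             # rows of ints, e.g. [[2, 1, 1], [2, 1, 2]]
Tree = dict[int, dict[int, Matrix]]  # level -> sublevel -> matrix


def is_tree(obj: object) -> bool:
    """Return True if obj has the shape dict[int, dict[int, list[list[int]]]].

    Type hints are not checked at runtime, so this walks the object explicitly.
    """
    if not isinstance(obj, dict):
        return False
    for level, sublevels in obj.items():
        if not isinstance(level, int) or not isinstance(sublevels, dict):
            return False
        for sublevel, matrix in sublevels.items():
            if not isinstance(sublevel, int) or not isinstance(matrix, list):
                return False
            for row in matrix:
                if not isinstance(row, list):
                    return False
                if not all(isinstance(x, int) for x in row):
                    return False
    return True


# A hand-written fragment of the structure (level 2 only).
example: Tree = {2: {21: [[2, 1, 1], [2, 1, 2]], 22: [[2, 2, 1], [2, 2, 2]]}}
```

Here is_tree(example) returns True, while is_tree({2: [[2, 1, 1]]}) returns False because the sublevel dictionary layer is missing.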
The following nested dictionary comprehension will generate the proposed structure and contains exactly the same data as your example case:
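M = 3
N = 2
# level -> sublevel -> matrix whose rows are the [i, j, k] label digits
data = {
    i: {
        # Sublevel keys such as 21 are built as 10 * i + j, which only
        # reproduces the concatenated labels while N <= 9.
        10 * i + j: [
            [i, j, k] for k in range(1, N + 1)
        ] for j in range(1, N + 1)
    } for i in range(1, M + 1)
}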
The result can be inspected using pprint.pprint:
>>> from pprint import pprint
>>> pprint(data)
{1: {11: [[1, 1, 1], [1, 1, 2]], 12: [[1, 2, 1], [1, 2, 2]]},
2: {21: [[2, 1, 1], [2, 1, 2]], 22: [[2, 2, 1], [2, 2, 2]]},
3: {31: [[3, 1, 1], [3, 1, 2]], 32: [[3, 2, 1], [3, 2, 2]]}}
Any particular matrix can then be retrieved by its level and sublevel keys:
>>> data[2][21]
[[2, 1, 1], [2, 1, 2]]
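Since the question is ultimately about a directory tree, here is a minimal sketch, assuming the directories should be named after the labels in the question (1, 11, 111, ...), that walks the nested dictionary depth-first and creates the folders with pathlib. The helper names iter_paths and make_dirs are hypothetical, and data is rebuilt with the same comprehension so the block runs on its own.

```python
import tempfile
from pathlib import Path

M = 3
N = 2

# Same nested-dict structure as above: level -> sublevel -> matrix of [i, j, k] rows.
data = {
    i: {
        10 * i + j: [[i, j, k] for k in range(1, N + 1)]
        for j in range(1, N + 1)
    }
    for i in range(1, M + 1)
}


def iter_paths(tree: dict[int, dict[int, list[list[int]]]]) -> list[str]:
    """Flatten the tree into relative paths such as '1/11/111', depth-first."""
    paths = []
    for level, sublevels in tree.items():
        paths.append(str(level))
        for sublevel, matrix in sublevels.items():
            paths.append(f"{level}/{sublevel}")
            for row in matrix:
                # Each row [i, j, k] becomes the leaf name, e.g. [1, 1, 2] -> '112'.
                leaf = "".join(str(x) for x in row)
                paths.append(f"{level}/{sublevel}/{leaf}")
    return paths


def make_dirs(base: Path, tree: dict[int, dict[int, list[list[int]]]]) -> None:
    """Create every path from iter_paths under base; existing dirs are left alone."""
    for rel in iter_paths(tree):
        (base / rel).mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    # Demo: build the tree in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        make_dirs(Path(tmp), data)
        for p in sorted(Path(tmp).rglob("*")):
            print(p.relative_to(tmp))
```

Dictionaries keep insertion order (Python 3.7+), so iter_paths(data) starts with 1, 1/11, 1/11/111, 1/11/112, 1/12, matching the order of the listing in the question.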
pandas.DataFrame
If you don't mind using third-party libraries, you could take this a step further by converting it into a pandas.DataFrame and simplifying the sublevel index:
import pandas as pd

M = 3
N = 2
data = {
    i: {
        j: [
            [i, j, k] for k in range(1, N + 1)
        ] for j in range(1, N + 1)
    } for i in range(1, M + 1)
}
# Outer keys (levels) become columns; inner keys (sublevels) become the row index.
df = pd.DataFrame(data)
The result is the following:
>>> df
1 2 3
1 [[1, 1, 1], [1, 1, 2]] [[2, 1, 1], [2, 1, 2]] [[3, 1, 1], [3, 1, 2]]
2 [[1, 2, 1], [1, 2, 2]] [[2, 2, 1], [2, 2, 2]] [[3, 2, 1], [3, 2, 2]]
With the simplified sublevel index, the element matrices are retrieved like so:
>>> df[2][1] # Equivalent to data[2][21] in the pure Python example.
[[2, 1, 1], [2, 1, 2]]
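Chained indexing like df[2][1] works for reads, but pandas' label-based accessors .loc and .at take the row label and column label in a single call, which avoids the intermediate Series. A sketch assuming the same df as above (rebuilt here so the block runs on its own); note the order is (row = sublevel, column = level).

```python
import pandas as pd

M = 3
N = 2

# Same construction as above: outer keys become columns (levels),
# inner keys become the row index (simplified sublevels).
data = {
    i: {j: [[i, j, k] for k in range(1, N + 1)] for j in range(1, N + 1)}
    for i in range(1, M + 1)
}
df = pd.DataFrame(data)

# Single-call label lookups, equivalent to df[2][1].
matrix = df.loc[1, 2]  # general label-based accessor
same = df.at[1, 2]     # fast accessor for a single scalar cell
```

Both lookups return the stored list [[2, 1, 1], [2, 1, 2]], i.e. the same matrix as data[2][21] in the pure Python version.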
numpy.ndarray
At this point, you might note that the data structure in question is in fact just an MxN grid of Nx3 matrices, each row of which is a three-digit label [i, j, k]. So if you wanted to, you could collapse it into a single 4D array of shape (M, N, N, 3) by switching from dictionary comprehensions to list comprehensions and using numpy:
import numpy as np

M = 3
N = 2
data = [
    [
        [
            [i, j, k] for k in range(1, N + 1)
        ] for j in range(1, N + 1)
    ] for i in range(1, M + 1)
]
# Equal-length nested lists convert to a 4D integer array of shape (M, N, N, 3).
data = np.array(data)
In this example, that results in the following array of shape (3, 2, 2, 3):
>>> data
array([[[[1, 1, 1],
         [1, 1, 2]],

        [[1, 2, 1],
         [1, 2, 2]]],


       [[[2, 1, 1],
         [2, 1, 2]],

        [[2, 2, 1],
         [2, 2, 2]]],


       [[[3, 1, 1],
         [3, 1, 2]],

        [[3, 2, 1],
         [3, 2, 2]]]])
Indexing here is off by one relative to the pandas.DataFrame case, since array indices start from zero:
>>> data[1][0]  # Equivalent to df[2][1] in the pandas example.
array([[2, 1, 1],
       [2, 1, 2]])
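The list comprehensions above still build the array one element at a time in Python. As an alternative sketch (a swapped-in technique, not the construction used above), the same (M, N, N, 3) array can be produced with NumPy broadcasting, which scales better for large M and N because the loops run inside NumPy.

```python
import numpy as np

M = 3
N = 2

# Coordinate vectors shaped so they broadcast against each other:
# i varies along axis 0, j along axis 1, k along axis 2.
i = np.arange(1, M + 1).reshape(M, 1, 1)
j = np.arange(1, N + 1).reshape(1, N, 1)
k = np.arange(1, N + 1).reshape(1, 1, N)

# broadcast_arrays expands all three to shape (M, N, N); stacking them on a
# new last axis gives one [i, j, k] triple per cell, shape (M, N, N, 3).
arr = np.stack(np.broadcast_arrays(i, j, k), axis=-1)
```

With M = 3 and N = 2, arr.shape is (3, 2, 2, 3) and arr[1, 0] holds [[2, 1, 1], [2, 1, 2]], the same matrix as data[1][0] above.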