How to create a tree structure from hierarchical data in Python?

Hi, I am a bit new to Python and a bit confused about how to proceed. I have a large dataset that contains both parent and child information. For example, if we have various items and their components, and those components also have components (children) of their own, how do we create a tree structure from them? Here is an example of the data: [image: example data]

I was wondering how I can get it into a tree structure. So the output would be:

[image: tree structure for car]

It would also return one for airplane, similar to the one for car.

I know that the common attribute for this would be the parent number/child number, but I am a bit confused about how to go about this in Python.

Use a class to encode the structure:

class TreeNode:
    def __init__(self, number, name):
        self.number = number
        self.name = name
        self.children = []
    
    def addChild(self, child):
        self.children.append(child)

One example of how to use it:

car = TreeNode(1111, "car")
engine = TreeNode(3333, "engine")
car.addChild(engine)

Note: The number attribute doesn't have to be an int (e.g. 1111 for car); it can just as well be the integer as a string (i.e. "1111").


To actually get something resembling your desired output, we'll need to serialize the root object into nested dictionaries:

class TreeNode:
    def __init__(self, number, name):
        self.number = number
        self.name = name
        self.children = []
    
    def addChild(self, child):
        self.children.append(child)
    
    def serialize(self):
        s = {}
        for child in self.children:
            s[child.name] = child.serialize()
        return s

Now, we can get something resembling your desired output by using json.dumps:

import json

dummy = TreeNode(None, None)  # think of this as the root/table

car = TreeNode(1111, "car")
dummy.addChild(car)

engine = TreeNode(3333, "engine")
car.addChild(engine)

fan = TreeNode(4312, "fan")
engine.addChild(fan)

print(json.dumps(dummy.serialize(), indent=4))

prints:

{
    "car": {
        "engine": {
            "fan": {}
        }
    }
}
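
The nodes above are wired together by hand. For a large dataset, the same class could be filled in from the rows themselves. The following is only a sketch: the sample data in the question is not visible here, so the row layout (parent number, parent name, child number, child name), the build_tree helper, and every number and name other than car/engine/fan are assumptions made up for illustration.

```python
import json


class TreeNode:
    def __init__(self, number, name):
        self.number = number
        self.name = name
        self.children = []

    def addChild(self, child):
        self.children.append(child)

    def serialize(self):
        # Same result as the loop version above, written as a comprehension.
        return {child.name: child.serialize() for child in self.children}


def build_tree(rows):
    """Hypothetical helper: build a dummy-rooted tree from
    (parent_number, parent_name, child_number, child_name) rows."""
    nodes = {}          # number -> TreeNode, so each item is created once
    has_parent = set()  # numbers that show up as somebody's child

    def get_node(number, name):
        if number not in nodes:
            nodes[number] = TreeNode(number, name)
        return nodes[number]

    for parent_number, parent_name, child_number, child_name in rows:
        parent = get_node(parent_number, parent_name)
        child = get_node(child_number, child_name)
        parent.addChild(child)
        has_parent.add(child_number)

    # Anything that is never a child is a top-level item (car, airplane).
    dummy = TreeNode(None, None)
    for number, node in nodes.items():
        if number not in has_parent:
            dummy.addChild(node)
    return dummy


# Made-up sample rows; only 1111/3333/4312 come from the example above.
rows = [
    (1111, "car", 3333, "engine"),
    (1111, "car", 5555, "wheel"),
    (3333, "engine", 4312, "fan"),
    (2222, "airplane", 6666, "wing"),
]
tree = build_tree(rows)
print(json.dumps(tree.serialize(), indent=4))
```

Looking nodes up by number in a dict means the rows can come in any order, and each row is handled in constant time.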

Problems like this always come down to algorithm and data set. The first thing I notice about your data set is that it is ordered such that no child is ever the parent of a previous parent. In other words, the items are listed in a "top-down" fashion. Is this always true? If it is, the logic of the algorithm becomes much simpler.

Another consideration is data structure. I would use nested dictionaries to hold the main data set here. Each new unique parent will be a "key" of the main dict. Each "value" corresponding to that key will itself be a dict, and the nesting can continue as deep as needed. In this case, there will only be a few levels of nesting.

So, for each line in the data set, check whether the Parent appears as a key in the top-level dict or in any of the nested dicts. If it does not, create a new entry in the top-level dict with Parent as the key and {Child: {}} as the new entry's value. (This will happen for "car" and "airplane".)

If the current Parent does appear as a key in any of the dicts, add Child as a new key in the dict that is stored as that Parent key's value. In that case, Child is the new key, and the value for that key is the empty dict {}.
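
The two cases above could be sketched roughly as follows. This is one possible reading of the logic, not a definitive implementation: it assumes the data has already been reduced to (parent name, child name) pairs, that names are unique, and that the rows are in top-down order. The helper names find_dict and build_nested and the sample pairs are made up for illustration.

```python
def find_dict(tree, key):
    """Search the top-level dict and all nested dicts for `key`.
    Return the dict stored as its value, or None if the key is absent."""
    for name, subtree in tree.items():
        if name == key:
            return subtree
        found = find_dict(subtree, key)
        if found is not None:  # an empty dict is a valid hit, so test for None
            return found
    return None


def build_nested(pairs):
    tree = {}
    for parent, child in pairs:
        subtree = find_dict(tree, parent)
        if subtree is None:
            # Parent not seen anywhere yet: new top-level entry.
            tree[parent] = {child: {}}
        else:
            # Parent already exists: add Child under it with an empty dict.
            subtree.setdefault(child, {})
    return tree


# Made-up pairs in top-down order.
pairs = [
    ("car", "engine"),
    ("car", "wheel"),
    ("engine", "fan"),
    ("airplane", "wing"),
]
nested = build_nested(pairs)
print(nested)
```

The recursive search rescans the whole structure for every row, which is fine for a handful of levels but gets slow on very large data sets.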

The above is the rough logic I would use to write the code. I leave that part to you. There may be third-party libraries that would reduce the effort considerably, but if you're taking a class and this is an assignment, your teacher may not want you to use such external libraries.

Note that the above logic assumes that the data set is organized "top-down." If that is not the case, the logic becomes more complicated: a key that currently sits at a certain level in the hierarchy might have to be bumped down the hierarchy when a new Parent of that Child is processed later in the data set.
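
If the order cannot be relied on, one way to avoid moving keys around is to keep a flat index from each name to its dict and decide only at the end which names are top-level. This swaps in a different technique from the search-and-insert logic described above, and it is a sketch under assumptions: (parent name, child name) pairs, unique names, no cycles. The function name build_nested_any_order and the sample pairs are made up.

```python
def build_nested_any_order(pairs):
    index = {}        # name -> the dict holding that item's children
    children = set()  # names that appear as a child at least once

    for parent, child in pairs:
        parent_dict = index.setdefault(parent, {})
        child_dict = index.setdefault(child, {})
        # Link by reference: later additions to child_dict show up here too.
        parent_dict[child] = child_dict
        children.add(child)

    # Top-level items are the names that were never anyone's child.
    return {name: sub for name, sub in index.items() if name not in children}


# Made-up pairs, deliberately not in top-down order.
shuffled = [
    ("engine", "fan"),
    ("car", "engine"),
    ("airplane", "wing"),
    ("car", "wheel"),
]
result = build_nested_any_order(shuffled)
print(result)
```

Because every nested dict is shared by reference with the index, "engine" ends up under "car" with its "fan" already attached, even though the engine row came first.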
