
Tree structured input in keras/tensorflow

For a school project I am trying to implement the tree convolution described in "Convolutional Neural Networks over Tree Structures for Programming Language Processing" by Lili Mou et al.

Goal

Basically, the outcome should be a neural network. The samples fed to this network are binary trees whose nodes each carry a fixed-length feature vector, e.g. 1xN. The challenging part for me has been the freedom in the tree shape: a sample tree may have any number of nodes arranged in any shape. A left-deep tree, a right-deep tree, or a complete tree are all possible. The only constraint is that they must all be binary trees.

The tree convolution on a sample tree is defined by 3 weight matrices W_p, W_l, W_r. These weights are applied at each node of the tree to generate another tree of the same shape but with different features, e.g. 1xM if the weights are of shape NxM. Each node's feature is multiplied by W_p and its children's features by W_l and W_r, so the corresponding node in the new tree contains information about itself and both of its children.
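To make that concrete, here is a minimal NumPy sketch of the operation at a single node (the variable names x_p, x_l, x_r are mine; a missing child would contribute a zero vector instead):

import numpy as np

# x_p is a node's 1xN feature, x_l and x_r its children's features,
# and W_p, W_l, W_r are the NxM weight matrices.
N, M = 4, 8
rng = np.random.default_rng(0)
W_p, W_l, W_r = (rng.standard_normal((N, M)) for _ in range(3))
x_p, x_l, x_r = (rng.standard_normal((1, N)) for _ in range(3))

# The node in the new tree mixes the node itself with both children.
y = x_p @ W_p + x_l @ W_l + x_r @ W_r
assert y.shape == (1, M)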

Finally there is a dynamic pooling layer over all the tree nodes, which produces a single flattened 1xM vector that can then be fed into a Dense layer, for example. It works by treating each entry of the 1xM vectors as a channel; for each channel, the maximum value over all nodes is taken, yielding one 1xM vector.
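A sketch of that pooling step (again in NumPy, names are mine):

import numpy as np

# `tree` stacks the 1xM features of all nodes of the convolved tree into a
# (num_nodes, M) array; each of the M columns is a channel, and we keep the
# per-channel maximum over all nodes.
num_nodes, M = 7, 8
tree = np.random.default_rng(1).standard_normal((num_nodes, M))

pooled = tree.max(axis=0)      # one value per channel
assert pooled.shape == (M,)    # a single 1xM vector, ready for a Dense layer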

Problem

That was a quick summary of the paper. Now, the problem, as I said in the first paragraph, is the varying shape of these binary trees. First I tried to use Keras, but obviously it needs fixed-size inputs for its layers. Then it occurred to me that I could use the array implementation of binary trees to encode each tree in a fixed-size fashion: a parent at index i has its children at 2*i and 2*i+1. Wherever a child is missing, I pad with N zeros if the features are of length N.
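A small sketch of this encoding (1-based heap indexing; the names are mine):

import numpy as np

# Node i keeps its children at 2*i and 2*i + 1; absent slots stay all-zero.
N = 4
max_index = 7                                       # enough for a depth-3 tree
tree = {1: [1.0] * N, 2: [2.0] * N, 5: [3.0] * N}   # heap index -> feature

encoded = np.zeros((max_index, N))
for i, feat in tree.items():
    encoded[i - 1] = feat                           # row i-1 holds node index i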

This required knowing the maximum index over all trees so that I could create an AxN array, where A is the largest index used in this fixed-size scheme. Sadly, the input trees may be very deep with few nodes: a right-deep chain of 16 nodes already reaches heap index 2^16 - 1 = 65535, so to encode those 16 nodes I have to create a 60000xN (or 6000xN) array, most of which is zero padding just because the tree is not well balanced.

Then I switched to a custom SGD implementation in which I quickly defined Dense, Tree Convolution, and Dynamic Pooling layers. The forward pass was really easy. In the backward pass, however, I only got as far as propagating derivatives from the Dense layer through the pooling into the tree directly before it and updating that tree's weights, but not the weights of the earlier trees. Since Keras/TF handles differentiation in the background, that route was indeed easier.

Now I feel really stuck between the two approaches. Obviously Keras/TF has lots of functionality available for designing such a network. Is there an efficient way of passing this tree-structured data to these libraries, so that for 30 nodes I do not end up creating 60000 slots of which 59970 are zero vectors? Generating 6000 or 60000 slots for some 15 nodes is just crazy, even with the best GPU out there.

Or should I work on deriving the derivative equations on paper and continue the custom SGD implementation?

For reference, this is how it looked in Keras, with the inefficient tree encoding I mentioned above:

import tensorflow as tf
from keras import backend as K
from keras.layers import Layer


class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        # output_dim = (output_tree_size, output_feature_size)
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # One trainable kernel holding W_p, W_l, W_r stacked along axis 0.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(3, input_shape[2], self.output_dim[1]),
                                      initializer='ones',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        _, tree_size, feature_size = K.int_shape(x)

        new_tree = []
        # Internal nodes: combine the node with its two children (0-based
        # indexing, children at 2*i + 1 and 2*i + 2). Note that gather_nd
        # with (0, i) only reads batch element 0, so this effectively
        # assumes a batch size of 1.
        for i in range(tree_size // 2):
            parent = tf.gather_nd(x, (0, i))
            left = tf.gather_nd(x, (0, 2*i + 1))
            right = tf.gather_nd(x, (0, 2*i + 2))
            p_l_r = K.expand_dims(K.stack([parent, left, right]), axis=1)
            # batch_dot applies W_p, W_l, W_r to parent/left/right in one
            # shot; summing over axis 0 gives the new 1xM node feature.
            product = K.sum(K.batch_dot(p_l_r, self.kernel), axis=0)
            new_tree.append(product)
        # Leaf slots: no children, so only W_p (= kernel[0]) is applied.
        for j in range(tree_size // 2, tree_size):
            parent = tf.gather_nd(x, (0, j))
            parent = K.expand_dims(parent, axis=0)
            product = K.dot(parent, self.kernel[0])
            new_tree.append(product)

        new_tree = K.stack(new_tree, axis=1)
        return new_tree

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim[0], self.output_dim[1])
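In case it's useful, this is roughly how I wire it into a model; the shapes below are just placeholders (63 slots correspond to a complete binary tree with 63 nodes):

from keras.layers import Input
from keras.models import Model

tree_size, feature_size, out_features = 63, 16, 32   # placeholder shapes
inputs = Input(shape=(tree_size, feature_size))      # padded tree encoding
conv = MyLayer((tree_size, out_features))(inputs)    # same tree, M channels
model = Model(inputs, conv)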

Tensorflow used to have a decision tree implementation. You can see the data structures (variables) it used here: https://github.com/tensorflow/tensorflow/blob/v0.10.0rc0/tensorflow/contrib/tensor_forest/python/tensor_forest.py#L155

It shows that you can implement a tree by creating a 2D tensor of shape (max_nodes, max_children). The (i, j) entry holds an integer giving the index, within the same tensor, of the jth child of the ith node. So an upside-down-V-shaped binary tree with three nodes would be [[1, 2], [-1, -1], [-1, -1]].

You could easily create a second tensor to hold the features, where the ith row holds the features of the ith node. Then it would be possible to perform the convolution operation you mentioned, although it would require looping. I don't see a way to vectorize it, but that's the cost of using a (somewhat) sparse representation.
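A hedged sketch of that loop in NumPy (all names are mine): features[i] is the feature row of node i, children[i, j] the index of its jth child, with -1 meaning "no child" and contributing a zero vector instead.

import numpy as np

def tree_conv(features, children, W_p, W_l, W_r):
    num_nodes, N = features.shape
    zero = np.zeros(N)
    child = lambda c: features[c] if c >= 0 else zero
    out = np.empty((num_nodes, W_p.shape[1]))
    for i in range(num_nodes):
        out[i] = (features[i] @ W_p
                  + child(children[i, 0]) @ W_l
                  + child(children[i, 1]) @ W_r)
    return out   # same tree shape, M channels per node

# The upside-down-V tree from above, with N=4 features and M=5 channels:
rng = np.random.default_rng(0)
children = np.array([[1, 2], [-1, -1], [-1, -1]])
features = rng.standard_normal((3, 4))
W_p, W_l, W_r = (rng.standard_normal((4, 5)) for _ in range(3))
print(tree_conv(features, children, W_p, W_l, W_r).shape)   # (3, 5)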
