How to run a .py module?

I've got zero experience with Python. I have looked through some tutorial materials, but I find advanced code difficult to understand, so I came here for a more specific answer. My task is to get the code below running on my own computer.

Here is the scenario:

I'm a graduate student studying tensor factorization in relational learning. A paper [1] provides code to run this algorithm, as follows:

import logging, time
from numpy import dot, zeros, kron, array, eye, argmax
from numpy.linalg import qr, pinv, norm, inv 
from scipy.linalg import eigh
from numpy.random import rand

__version__ = "0.1" 
__all__ = ['rescal', 'rescal_with_random_restarts']

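__DEF_INIT = 'nvecs'
__DEF_PROJ = True
__DEF_MAXITER = 500  # default stated in the rescal docstring
__DEF_CONV = 1e-5
__DEF_LMBDA = 0  # default stated in the rescal docstring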

_log = logging.getLogger('RESCAL') 

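def rescal_with_random_restarts(X, rank, restarts=10, **kwargs):
    """
    Restarts RESCAL multiple times from random starting points and
    returns the factorization with the best fit.
    """
    models = []
    fits = []
    for i in range(restarts):
        res = rescal(X, rank, init='random', **kwargs)
        # Assumption: the pasted loop body was truncated; each run is ranked
        # by res[2], the function value f returned by rescal.
        models.append(res)
        fits.append(res[2])
    return models[argmax(fits)]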

def rescal(X, rank, **kwargs):
    """
    Factors a three-way tensor X such that each frontal slice
    X_k = A * R_k * A.T. The frontal slices of a tensor are
    N x N matrices that correspond to the adjacency matrices
    of the relational graph for a particular relation.

    For a full description of the algorithm see:
      Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel,
      "A Three-Way Model for Collective Learning on Multi-Relational Data",
      ICML 2011, Bellevue, WA, USA

    Parameters
    ----------
    X : list
        List of frontal slices X_k of the tensor X. The shape of each X_k is ('N', 'N')
    rank : int
        Rank of the factorization
    lmbda : float, optional
        Regularization parameter for A and R_k factor matrices. 0 by default
    init : string, optional
        Initialization method of the factor matrices. 'nvecs' (default)
        initializes A based on the eigenvectors of X. 'random' initializes
        the factor matrices randomly.
    proj : boolean, optional
        Whether or not to use the QR decomposition when computing R_k.
        True by default
    maxIter : int, optional
        Maximum number of iterations of the ALS algorithm. 500 by default.
    conv : float, optional
        Stop when residual of factorization is less than conv. 1e-5 by default

    Returns
    -------
    A : ndarray
        array of shape ('N', 'rank') corresponding to the factor matrix A
    R : list
        list of 'M' arrays of shape ('rank', 'rank') corresponding to the factor matrices R_k
    f : float
        function value of the factorization
    iter : int
        number of iterations until convergence
    exectimes : ndarray
        execution times to compute the updates in each iteration
    """

    # init options
    ainit = kwargs.pop('init', __DEF_INIT)
    proj = kwargs.pop('proj', __DEF_PROJ)
    maxIter = kwargs.pop('maxIter', __DEF_MAXITER)
    conv = kwargs.pop('conv', __DEF_CONV)
    lmbda = kwargs.pop('lmbda', __DEF_LMBDA)
    if not len(kwargs) == 0:
        raise ValueError( 'Unknown keywords (%s)' % (kwargs.keys()) )

    sz = X[0].shape
    dtype = X[0].dtype 
    n = sz[0]
    k = len(X) 

    _log.debug('[Config] rank: %d | maxIter: %d | conv: %7.1e | lmbda: %7.1e' % (rank, 
        maxIter, conv, lmbda))
    _log.debug('[Config] dtype: %s' % dtype)

    # precompute norms of X 
    normX = [norm(M)**2 for M in X]
    Xflat = [M.flatten() for M in X]
    sumNormX = sum(normX)

    # initialize A
    if ainit == 'random':
        A = array(rand(n, rank), dtype=dtype)
    elif ainit == 'nvecs':
        S = zeros((n, n), dtype=dtype)
        T = zeros((n, n), dtype=dtype)
        for i in range(k):
            T = X[i]
            S = S + T + T.T
        evals, A = eigh(S,eigvals=(n-rank,n-1))
    else :
        raise ValueError('Unknown init option ("%s")' % ainit)

    # initialize R
    if proj:
        Q, A2 = qr(A)
        X2 = __projectSlices(X, Q)
        R = __updateR(X2, A2, lmbda)
    else :
        R = __updateR(X, A, lmbda)

    # compute factorization
    fit = fitchange = fitold = f = 0
    exectimes = []
    ARAt = zeros((n,n), dtype=dtype)
    for iter in range(maxIter):
        tic = time.time()  # time.clock() no longer exists in recent Python 3 releases
        fitold = fit
        A = __updateA(X, A, R, lmbda)
        if proj:
            Q, A2 = qr(A)
            X2 = __projectSlices(X, Q)
            R = __updateR(X2, A2, lmbda)
        else :
            R = __updateR(X, A, lmbda)

        # compute fit value
        f = lmbda*(norm(A)**2)
        for i in range(k):
            ARAt = dot(A, dot(R[i], A.T))
            f += normX[i] + norm(ARAt)**2 - 2*dot(Xflat[i], ARAt.flatten()) + lmbda*(R[i].flatten()**2).sum()
        f *= 0.5

        fit = 1 - f / sumNormX
        fitchange = abs(fitold - fit)

        toc = time.time()
        exectimes.append( toc - tic )
        _log.debug('[%3d] fit: %.5f | delta: %7.1e | secs: %.5f' % (iter, 
            fit, fitchange, exectimes[-1]))
        if iter > 1 and fitchange < conv:
            break  # converged: stop iterating
    return A, R, f, iter+1, array(exectimes)

def __updateA(X, A, R, lmbda):
    n, rank = A.shape
    F = zeros((n, rank), dtype=X[0].dtype)
    E = zeros((rank, rank), dtype=X[0].dtype)

    AtA = dot(A.T,A)
    for i in range(len(X)):
        F += dot(X[i], dot(A, R[i].T)) + dot(X[i].T, dot(A, R[i]))
        E += dot(R[i], dot(AtA, R[i].T)) + dot(R[i].T, dot(AtA, R[i]))
    A = dot(F, inv(lmbda * eye(rank) + E))
    return A

def __updateR(X, A, lmbda):
    r = A.shape[1]
    R = []
    At = A.T    
    if lmbda == 0:
        ainv = dot(pinv(dot(At, A)), At)
        for i in range(len(X)):
            R.append( dot(ainv, dot(X[i], ainv.T)) )
    else :
        AtA = dot(At, A)
        tmp = inv(kron(AtA, AtA) + lmbda * eye(r**2))
        for i in range(len(X)):
            AtXA = dot(At, dot(X[i], A)) 
            R.append( dot(AtXA.flatten(), tmp).reshape(r, r) )
    return R

def __projectSlices(X, Q):
    q = Q.shape[1]
    X2 = []
    for i in range(len(X)):
        X2.append( dot(Q.T, dot(X[i], Q)) )
    return X2

It's tedious to paste such long code, but I see no other way to explain my problem. I'm sorry about this.

I import this module and pass it arguments according to the author's website:

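import pickle, sys
from rescal import rescal

rank = int(sys.argv[1])  # command-line arguments are strings; rank must be an int
with open('us-presidents.pickle', 'rb') as fh:  # pickle.load needs an open file, not a file name
    X = pickle.load(fh)
A, R, f, iter, exectimes = rescal(X, rank, lmbda=1.0)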

The dataset us-presidents.rdf can be found here.

My questions are:

  1. According to the code comments, the tensor X is a list. I don't quite understand this: how do I relate a list to a tensor in Python? Can I take tensor = list in Python?
  2. Should I convert the RDF format to a triple (subject, predicate, object) format first? I'm not sure of the data structure of X. How do I assign values to X by hand?
  3. Then, how do I run it?

I pasted the author's code without his authorization. Is that an act of infringement? If so, I am sorry and will delete it soon.

These questions may be a little boring, but they are important to me. Any help would be greatly appreciated.

[1] Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel, A Three-Way Model for Collective Learning on Multi-Relational Data, in Proceedings of the 28th International Conference on Machine Learning, 2011, Bellevue, WA, USA

To answer Q2: you need to transform the RDF and save it before you can load it from the file 'us-presidents.pickle'. The author of that code probably did that once because the native Python pickle format loads faster. Because the pickle format records the datatype of the data, X may well be an instance of some numpy class. To find out how to convert from RDF to the particular pickle file that rescal expects, you would need either an example pickle file as used by this code or the code that does the pickle.dump.
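The conversion described above could look roughly like the following. This is only a sketch under assumptions: the triples, the entity and relation names, and the file name toy-tensor.pickle are made up for illustration, and the layout (one N x N NumPy array per predicate) is inferred from the rescal docstring, not confirmed by the author's own conversion script. Parsing the actual RDF file is left out.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical toy triples (subject, predicate, object); real data would come
# from parsing the RDF file.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksWith", "carol"),
]

# Give every entity and every predicate an integer index.
entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
predicates = sorted({p for _, p, _ in triples})
ent_idx = {name: i for i, name in enumerate(entities)}
pred_idx = {name: k for k, name in enumerate(predicates)}

# One N x N adjacency matrix (frontal slice) per predicate.
n = len(entities)
X = [np.zeros((n, n)) for _ in predicates]
for s, p, o in triples:
    X[pred_idx[p]][ent_idx[s], ent_idx[o]] = 1.0

# Save once, then load it back through an open binary file handle.
path = os.path.join(tempfile.mkdtemp(), "toy-tensor.pickle")
with open(path, "wb") as fh:
    pickle.dump(X, fh)
with open(path, "rb") as fh:
    X_loaded = pickle.load(fh)

print(len(X_loaded), X_loaded[0].shape)
```

The loaded object is again a list of NumPy arrays, which is the kind of input the docstring of rescal describes.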

So this might answer Q1: the tensor consists of a list of elements. From the code you can see that the X parameter to rescal has a length ( k = len(X) ) and can be indexed ( T = X[i] ). So its elements are used as a list (even if X might be some other datatype that just behaves like one).
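To make the "behaves like a list" point concrete, here is a small sketch, assuming NumPy and made-up toy sizes. It shows that both a plain list of N x N arrays and a single 3-D array support the len() and indexing operations mentioned above. Putting the slice index on the first axis of the 3-D array is a choice made for this example, not something the original code prescribes.

```python
import numpy as np

n, m = 4, 3  # 4 entities, 3 relations (toy sizes)

# Representation 1: a plain Python list of m arrays, each n x n.
X_list = [np.zeros((n, n)) for _ in range(m)]

# Representation 2: a single 3-D array whose first axis runs over the slices.
X_cube = np.zeros((m, n, n))

for X in (X_list, X_cube):
    k = len(X)      # number of frontal slices
    first = X[0]    # one n x n slice
    assert k == m
    assert first.shape == (n, n)

# Converting between the two forms.
as_cube = np.stack(X_list)  # list -> 3-D array of shape (m, n, n)
as_list = list(X_cube)      # 3-D array -> list of m arrays, each n x n
```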

As an aside: if you are not familiar with Python and are only interested in the result of the computation, you might get more help by contacting the author of the software.

  1. According to the code comments, the tensor X is a list. I don't quite understand this: how do I relate a list to a tensor in Python? Can I take tensor = list in Python?

Not necessarily, but the author of the code has decided to represent the tensor data as a list data structure. As the comments indicate, the list X contains:

List of frontal slices X_k of the tensor X. The shape of each X_k is ('N', 'N')

That means the tensor is represented as a list of two-dimensional arrays, one per relation, each of shape (N, N). The elements cannot be plain tuples, because rescal reads X[0].shape and X[0].dtype and calls flatten() on each element.

  1. I'm not sure of the data structure of X. How do I assign values to X by hand?

Now that we know the data structure of X, we can assign values to it using ordinary list assignment. Assuming X already holds at least two slices and array has been imported from numpy, the following assigns a 2 x 2 matrix to the first position in the list X (the first position is at index 0, the second at index 1, et cetera):

X[0] = array([[0, 1], [0, 0]])

Similarly, the following assigns a matrix to the second position:

X[1] = array([[0, 0], [1, 0]])
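Because rescal reads X[0].shape and X[0].dtype, each element of X has to be an array-like N x N matrix. A minimal sketch of building X by hand, assuming NumPy and a made-up graph with three entities and two relations (the indices and facts below are purely illustrative):

```python
import numpy as np

n = 3  # entities are numbered 0, 1, 2 (hypothetical)

X = []                      # start with an empty list
X.append(np.zeros((n, n)))  # slice 0: first relation
X.append(np.zeros((n, n)))  # slice 1: second relation

X[0][0, 1] = 1.0  # relation 0 holds from entity 0 to entity 1
X[1][2, 0] = 1.0  # relation 1 holds from entity 2 to entity 0

# Replacing a whole slice is ordinary list assignment.
X[1] = np.array([[0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0]])
```

Note that X[0] = ... only works once the list already has an element at index 0; on an empty list it raises IndexError, which is why the sketch uses append first.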

