
Python relative vs absolute imports in scientific stack repos

I'm looking at the following packages (numpy, scipy, scikit-learn, scikit-image) for inspiration on how to structure my own software. The following conventions seem to be standard across all of the packages I listed:

  1. In all cases, imports appear at the top of each file
  2. In "module code", all "internal package" imports are done using relative imports
  3. In "module code", all "external package" imports are done using absolute imports
  4. In "test code", only absolute imports are used

Could someone explain why these rules are used, or point me to some references? Their coding convention guides all say to follow this standard, but I can't find an explanation of why. I'd love to know, so any help is much appreciated!

Here are a few code examples to help illustrate...

"Module code example" (sklearn/decomposition/pca.py)

from math import log, sqrt
import numbers

import numpy as np
from scipy import linalg
from scipy.special import gammaln
from scipy.sparse import issparse
from scipy.sparse.linalg import svds

from ..externals import six

from .base import _BasePCA
from ..base import BaseEstimator, TransformerMixin
from ..utils import deprecated
from ..utils import check_random_state, as_float_array
from ..utils import check_array
from ..utils.extmath import fast_logdet, randomized_svd, svd_flip
from ..utils.extmath import stable_cumsum
from ..utils.validation import check_is_fitted


def _assess_dimension_(spectrum, rank, n_samples, n_features):
    """Compute the likelihood of a rank ``rank`` dataset

    The dataset is assumed to be embedded in Gaussian noise of shape
    (n, dimf) having spectrum ``spectrum``.

    Parameters
    ----------
    spectrum : array of shape (n)
        Data spectrum.
    ...

"Test code example" (sklearn/decomposition/tests/test_pca.py)

import numpy as np
import scipy as sp
from itertools import product

from sklearn.utils.testing import assert_almost_equal
from sklearn.utils.testing import assert_array_almost_equal
from sklearn.utils.testing import assert_true
from sklearn.utils.testing import assert_equal
from sklearn.utils.testing import assert_greater
from sklearn.utils.testing import assert_raise_message
from sklearn.utils.testing import assert_raises
from sklearn.utils.testing import assert_raises_regex
from sklearn.utils.testing import assert_no_warnings
from sklearn.utils.testing import assert_warns_message
from sklearn.utils.testing import ignore_warnings
from sklearn.utils.testing import assert_less

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.decomposition import RandomizedPCA
from sklearn.decomposition.pca import _assess_dimension_
from sklearn.decomposition.pca import _infer_dimension_

iris = datasets.load_iris()
solver_list = ['full', 'arpack', 'randomized', 'auto']


def test_pca():
    # PCA on dense arrays
    X = iris.data

    for n_comp in np.arange(X.shape[1]):
        pca = PCA(n_components=n_comp, svd_solver='full')

        X_r = pca.fit(X).transform(X)
        np.testing.assert_equal(X_r.shape[1], n_comp)

I think you can get a hint as to why the style you describe is popular by looking at one of the imports in the code you quoted:

from ..externals import six

This import gets the six module from inside the sklearn.externals package. But six is normally distributed as a standalone package; it has been "vendored" by the sklearn project. Vendoring a module means shipping your own copy of it inside your project, rather than relying on it as an external dependency.

Making it possible for your package to be vendored is one reason you might favor relative imports. If your package is normally named my_package, but some other project has vendored it as some_other_package.externals.my_package, your internal imports will still work, as long as you used relative import syntax. If you had used absolute imports, they would all break (and the vendoring project would need to rewrite each of them).
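To make that concrete, here is a minimal sketch; the names my_package, utils, and helper are hypothetical, not taken from sklearn:

# Hypothetical layout:
#
#   my_package/
#       __init__.py
#       core.py
#       utils.py

# In my_package/core.py, a relative import is resolved against whatever
# package this module actually lives in:
from .utils import helper

# An absolute import hard-codes the top-level package name instead:
from my_package.utils import helper

If another project copies this whole tree to some_other_package/externals/my_package/, the relative form keeps working (.utils now resolves to some_other_package.externals.my_package.utils), while the absolute form only works if a separate my_package is still importable.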

It isn't perfect, though. The other project will still need to edit your imports if it is also vendoring some of your external dependencies (the ones you access with absolute imports). Vendoring also has some severe downsides when it comes to fixing bugs. If the vendored version of a package has a security flaw, then so does every project that bundles it. If the outer package instead accessed an external version of its dependency, it could benefit from a fix to that package without needing to be changed itself.
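For example, suppose your module imports six absolutely. A project that vendors both your package and six has to patch that line to point at its own bundled copy, mirroring the sklearn import quoted above (a hypothetical illustration, assuming your module ends up two levels deep inside the host package):

# Your module, as you wrote it -- requires an installed six:
import six

# The vendoring project's rewrite, pointing at its bundled copy:
from ..externals import six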

It's worth noting that some Python developers are not entirely happy with how the current system works, as some common vendoring problems are hard to resolve. A recent thread on the Python-dev mailing list discussed how imperfect the current support is.

Vendoring can be awkward, and it is often undone by downstream packagers, like Linux distros, who need to be able to reliably fix security vulnerabilities. But without it, it's hard for a package to rely upon version-specific behavior in its dependencies. There's no versioning of imports, so if your code breaks when some_package gets upgraded from version 1.9 to 2.0, there's not much you can do about it (except maybe jump through a whole bunch of hoops to try to support both versions of your dependency at once).
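Those hoops usually look something like this (a hedged sketch; some_package and the module names below are made up):

# Support some_package 1.x and 2.x at once by probing for the new
# location first and falling back to the old one:
try:
    from some_package.new_home import do_thing   # 2.0 moved it here
except ImportError:
    from some_package.old_home import do_thing   # 1.x location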
