
python pip priority order with index-url and extra-index-url

I searched a bit but could not find a clear answer.
The goal is to have two pip indexes: a private index that takes first priority, and the standard PyPI. The priority is there to prevent a security risk of code injection: without it, someone could publish a malicious package under the same name on the public index (dependency confusion) and have it installed instead.

Say I have a library named lib, and I configure index-url = http://my_private_pypi_repo and extra-index-url = https://pypi.org/simple

If I pip install lib and lib exists in both indexes, which index gets priority? Which one will it be installed from?

Also, if I pip install lib==0.0.2 but lib only exists in my private index at version 0.0.1, is pip going to look at PyPI as well?

And what is a good way to ensure that certain libraries are fetched only from the private index if they exist there, and are never looked up on PyPI?

The short answer is: there is no prioritization and you probably should avoid using --extra-index-url entirely.


This is asked and answered here: https://github.com/pypa/pip/issues/5045#issuecomment-369521345

Question:

I have this in my pip.conf:

    [global]
    index-url = https://myregistry-xyz.com
    extra-index-url = https://pypi.python.org/pypi

Let's assume packageX exists in both registries and I run pip install packageX.

I expect pip to install packageX from https://myregistry-xyz.com, but pip will use https://pypi.python.org/pypi instead.

If I switch the values for index-url and extra-index-url, I get the same result: PyPI is always prioritized.

Answer:

Packages are expected to be unique up to name and version, so two wheels with the same package name and version are treated as indistinguishable by pip. This is a deliberate feature of the package metadata, and not likely to change.


I would also recommend reading this discussion: https://discuss.python.org/t/dependency-notation-including-the-index-url/5659

Quite a lot of things are addressed in that discussion, some of them clearly out of scope for this question, but all of it is informative anyway.

The key takeaway for you is:

Pip does not really prioritize one index over the other in theory. In practice, because of a coincidence in the way things are implemented in code, it might be that one is always checked first, but it is not a behavior you should rely on.
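
If you want to see for yourself which index a particular install resolved from, pip's own log output is the quickest diagnostic. A minimal sketch, assuming a hypothetical package named lib, and noting that the exact log wording varies between pip versions:

    # Print the indexes pip consults and the URL each file is fetched
    # from; the "Looking in indexes" / "Downloading" lines are purely
    # informational and their wording may differ across pip versions.
    pip install -v lib 2>&1 | grep -i -E "looking in indexes|downloading"

This only tells you what happened on one run; it is a way to observe the implementation coincidence mentioned above, not something to rely on for security.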

And what is a good way to ensure that certain libraries are fetched only from the private index if they exist there, and are never looked up on PyPI?

You should set up and curate your own package index (devpi, pydist, JFrog Artifactory, Sonatype Nexus, etc.) and use it exclusively, meaning: never use --extra-index-url. This is the only way to have exact control over what gets downloaded. This custom repository might function mostly as a proxy for the public PyPI, except for the couple of dependencies that are private.
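
To make that concrete, here is a minimal sketch of the client-side configuration, assuming a curated proxy index (devpi-style) reachable at the hypothetical URL https://pypi.internal.example/simple:

    # ~/.config/pip/pip.conf (Linux) or %APPDATA%\pip\pip.ini (Windows)
    [global]
    # Hypothetical private proxy index; replace with your own.
    index-url = https://pypi.internal.example/simple
    # Deliberately no extra-index-url: every request goes through the
    # curated proxy, which either serves a private package or passes
    # the request through to the public PyPI.

With this in place, the decision about what may come from PyPI is made once, on the server side, instead of on every developer's machine.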



The title of this question feels a bit like an instance of the XY problem[1]. If you elaborate more on what you want to achieve and what your constraints are, we may be able to give you a better answer.

That said, sinoroc's suggestion to curate your own package index and use only that is a good one. A few other ideas also come to mind:

  • Update: It turns out pip may run distributions other than those in the constraints file, so this method should probably be considered insecure. Additionally, hash checking is somewhat broken on recent releases of pip.

    Using a constraints file with hashes. This file can be generated using pip-tools, e.g. pip-compile --generate-hashes, assuming you have documented your dependencies in a file named requirements.in. You can then install packages like pip install -c requirements.txt some_package. (A sketch of this workflow follows the list.)

    • Pro: What may be installed is documented alongside your code in your VCS.
    • Con: Controlling what is downloaded the first time is either tricky or laborious.
    • Con: Hash checking can be slow.
    • Con: You run into issues more frequently than when not using hashes. Some can be worked around, others cannot; it is for instance not possible to combine constraints like -e file:// with hashes.
  • Use an alternative packaging tool like pipenv. It works similarly to the previous suggestion.

    • Pro: Easy to use
    • Con: Harder to integrate into your workflow if it does not fit naturally.
  • Curate packages locally. Packages and dependencies can be downloaded like pip download --dest some_dir some_package and installed like pip install --no-index --find-links some_dir. (See the second sketch after this list.)

    • Pro: What may be installed can be documented alongside your code, if you track the artifacts in VCS, e.g. with git lfs.
    • Con: Either all packages are downloaded or none are.
  • Use a hermetic build system. I know Bazel advertises this as a feature; not sure about others like Pants and Buck.

    • Pro: May be the ultimate solution if you want control over your builds.
    • Con: Does not integrate well with the open-source Python ecosystem, as far as I know.
    • Con: A lot of overhead.
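
Here is a minimal sketch of the constraints-file idea, assuming a hand-written requirements.in (hypothetical name) that lists your direct dependencies. Note that it installs with -r plus --require-hashes rather than the -c form mentioned above, since hash checking through constraints files is exactly the part the update note warns about:

    # pip-compile comes from the pip-tools package.
    pip install pip-tools

    # Pin every dependency, direct and transitive, with exact versions
    # and hashes, derived from requirements.in.
    pip-compile --generate-hashes --output-file requirements.txt requirements.in

    # Install strictly from the pinned file; with --require-hashes pip
    # rejects any distribution whose hash does not match a pinned one.
    pip install --require-hashes -r requirements.txt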
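
And a sketch of the local-curation idea, assuming a hypothetical package some_package and a directory vendor/ that you track in VCS (e.g. with git lfs):

    # Download some_package and all of its dependencies into vendor/.
    pip download --dest vendor/ some_package

    # Install strictly from the local directory: --no-index disables all
    # remote indexes, and --find-links points pip at the saved artifacts.
    pip install --no-index --find-links vendor/ some_package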

[1]: https://en.wikipedia.org/wiki/XY_problem
