Managing Python Packages
Package management is not glamorous, but it is load-bearing infrastructure for every Python project. A single version mismatch (scikit-learn 1.3 expected but 1.2 installed) can change numerical results, raise cryptic errors, or silently produce wrong answers. Scientists who skip proper package management pay the cost later in debugging sessions, unreproducible results, and the dreaded "it worked on my machine." Investing 15 minutes in proper setup at the start of a project prevents hours of pain throughout its lifetime.
Step 1: Understand pip and conda
pip is Python's default package installer and ships with most Python installations (some Linux distributions package it separately). Running pip install numpy downloads numpy from PyPI (the Python Package Index, hosting over 500,000 packages) and installs it. pip handles Python packages only and relies on the operating system for non-Python dependencies (C compilers, system libraries, BLAS implementations) unless a binary wheel bundles them. For pure Python packages and packages with pre-built binary wheels (which most scientific packages now provide), pip works well on all platforms.
conda is a cross-platform package and environment manager that handles both Python packages and non-Python dependencies (C libraries, compilers, CUDA toolkits, R, Java). Running conda install numpy downloads numpy from conda channels (defaults, conda-forge) along with optimized BLAS libraries, C runtime dependencies, and everything else needed for the package to work. conda solves the "but it needs libfoo installed system-wide" problem that pip cannot address. conda-forge is the community-maintained channel with the broadest package selection.
When to use which: use conda when installing packages with complex compiled dependencies (RDKit, GDAL, OpenCV, PyTorch with specific CUDA versions, any package that requires system libraries). Use pip for everything else, especially packages only available on PyPI (many smaller, specialized packages are not on conda channels). You can use pip inside a conda environment (conda activate env, then pip install package), but do all conda installs first because conda does not track pip-installed packages and may overwrite them.
Miniforge (github.com/conda-forge/miniforge) provides a lightweight conda installation with conda-forge as the default channel and mamba included. It is the recommended starting point for scientists who want conda's capabilities without Anaconda's bulk. mamba install package is a drop-in replacement for conda install. Compared with conda's original (classic) solver, mamba can resolve dependencies 10 to 100 times faster, a significant quality-of-life improvement for complex environments. conda 23.10 and later use the same libmamba solver by default, so the gap is much smaller on a current installation.
Step 2: Create and Manage Virtual Environments
A virtual environment is an isolated Python installation where packages are installed independently of other environments. Without environments, installing a package for Project A can upgrade or downgrade a dependency that Project B needs at a different version, breaking Project B. Virtual environments eliminate this by giving each project its own package directory. The overhead is small: a venv environment reuses the base Python interpreter and standard library and adds only the project-specific packages, while a conda environment carries its own interpreter, which conda hard-links from a shared package cache where the file system allows it.
With venv (Python's built-in tool): python -m venv .venv creates an environment in the .venv directory. source .venv/bin/activate (macOS/Linux) or .venv\Scripts\activate (Windows) activates it. Your prompt changes to show the active environment. pip install numpy installs into the environment only. deactivate returns to the system Python. The .venv directory is local to the project and should be added to .gitignore (it is recreatable from the requirements file and should not be committed).
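The same environment can also be created from Python itself, which is handy in setup scripts. The sketch below uses the standard-library venv module; the helper name create_environment is hypothetical, and with_pip=False is used in the demonstration only to keep it fast.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(project_dir: Path, with_pip: bool = True) -> Path:
    """Create project_dir/.venv, the same idea as `python -m venv .venv`."""
    env_dir = project_dir / ".venv"
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


if __name__ == "__main__":
    # Work in a throwaway directory so no real project is touched.
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_environment(Path(tmp), with_pip=False)
        # pyvenv.cfg is the file that marks a directory as a virtual environment.
        print((env_dir / "pyvenv.cfg").is_file())
```

Activation still happens in the shell (source .venv/bin/activate or .venv\Scripts\activate); the script only builds the directory.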
With conda: conda create -n myproject python=3.12 creates a named environment. conda activate myproject activates it. conda install numpy installs into the environment. conda deactivate returns to base. conda env list shows all environments. conda env remove -n myproject deletes an environment. By default, conda stores environments in a central location (the envs/ directory of the conda installation, for example ~/miniforge3/envs/, or ~/.conda/envs/ when the installation directory is not writable) rather than in the project directory, which keeps project directories clean but means environments are not automatically associated with specific project directories.
Best practice: create a new environment for every project. Name it after the project. Activate it before working on the project. Install only the packages that project needs. Record the environment (next step). When you return to the project months later, activate the environment and everything works with the same versions you originally used. When you finish a project, the environment can be deleted without affecting anything else.
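Forgetting to activate the environment is the most common way this habit breaks down. A script can guard against it with a check like the one sketched here. The function name active_environment is hypothetical. The venv test relies on sys.prefix differing from sys.base_prefix, and the conda test assumes that conda activate has set the CONDA_DEFAULT_ENV variable, which is a heuristic rather than a guarantee.

```python
import os
import sys
from typing import Mapping, Optional


def active_environment(
    prefix: str = sys.prefix,
    base_prefix: str = sys.base_prefix,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return a short description of the active environment, or None."""
    env = os.environ if environ is None else environ
    # venv: the running interpreter's prefix differs from the base installation.
    if prefix != base_prefix:
        return f"venv at {prefix}"
    # conda: `conda activate` exports the environment name in CONDA_DEFAULT_ENV.
    conda_env = env.get("CONDA_DEFAULT_ENV")
    if conda_env and conda_env != "base":
        return f"conda environment {conda_env}"
    return None


if __name__ == "__main__":
    found = active_environment()
    print(found or "No project environment is active; activate one before installing.")
```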
Step 3: Specify Dependencies
pip freeze > requirements.txt records every installed package and its exact version: numpy==1.26.4, pandas==2.2.1, scipy==1.12.0. Another user runs pip install -r requirements.txt to install the exact same versions. This is the simplest and most widely understood dependency specification format. Include requirements.txt in your git repository. Update it whenever you add or change packages: activate the environment, pip install new_package, then pip freeze > requirements.txt.
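To see what a freeze amounts to, the sketch below builds a similar name==version listing with the standard-library importlib.metadata module. It is an approximation for illustration, not a replacement for pip freeze: pip also handles editable and URL installs and omits some packaging tools by default. The function name freeze_lines is hypothetical.

```python
from importlib import metadata
from typing import List


def freeze_lines() -> List[str]:
    """Return 'name==version' for each installed distribution, sorted by name."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata is damaged
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze_lines()))
```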
For conda environments: conda env export > environment.yml records the full environment specification including Python version, conda packages, pip packages, and channel sources. The file is human-readable YAML. Another user runs conda env create -f environment.yml to recreate the environment. For cross-platform compatibility, use conda env export --from-history > environment.yml, which records only the packages you explicitly installed (not their automatically installed dependencies), allowing conda to resolve platform-specific dependencies on the target system. Note that --from-history keeps version pins only where you typed them at install time and does not list pip-installed packages, so add those to the file by hand.
Pin versions for reproducibility. requirements.txt with numpy==1.26.4 guarantees the exact version. numpy>=1.26 allows any version 1.26 or later, which may produce different results or break in the future. numpy (no version) installs whatever is latest, which changes over time. For research code, pin everything. For library code that others will depend on, use minimum version bounds (numpy>=1.22) to allow flexibility. The distinction matters: research code needs exact reproducibility, library code needs compatibility.
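Pins only help if the running environment actually matches them. A quick check along the following lines can run at the top of an analysis script. This is a minimal sketch: check_pins is a hypothetical helper that understands only plain name==version lines and ignores environment markers, hashes, and every other specifier.

```python
from importlib import metadata
from typing import Callable, Iterable, List, Tuple


def check_pins(
    requirement_lines: Iterable[str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[Tuple[str, str, str]]:
    """Return (name, pinned, installed) for every exact pin that is not met."""
    problems = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # this sketch only understands exact pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = "not installed"
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems


if __name__ == "__main__":
    # Pins copied from the example above; the result depends on your environment.
    for name, pinned, installed in check_pins(["numpy==1.26.4", "pandas==2.2.1"]):
        print(f"{name}: pinned {pinned}, found {installed}")
```

An empty result means every pinned package is installed at exactly the recorded version.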
pip-tools (pip install pip-tools) provides a more sophisticated workflow. Write your direct dependencies (the packages you explicitly use) in requirements.in: numpy, pandas, scipy, matplotlib, scikit-learn. Run pip-compile requirements.in to generate requirements.txt with pinned versions of both your direct dependencies and all their transitive dependencies. This separates what you need (requirements.in) from the exact resolution (requirements.txt), making it easy to update: edit requirements.in, re-run pip-compile, and get a fresh resolution with updated transitive dependencies.
Step 4: Resolve Dependency Conflicts
Dependency conflicts occur when two packages require different versions of a shared dependency. Package A requires numpy>=1.24 while Package B requires numpy<1.24. pip reports this as: "ERROR: Cannot install A and B because these package versions have conflicting dependencies." The solution is usually to update one or both packages to versions that agree on the shared dependency: pip install --upgrade A B. If no compatible combination exists, you may need to use an older version of one package, find an alternative package, or use separate environments for the conflicting tools.
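The numpy>=1.24 versus numpy<1.24 case can be made concrete with a toy check. The sketch below is not how pip resolves dependencies; it only handles plain dotted numeric versions and six comparison operators, and all function names are hypothetical. It shows why the resolver gives up: no candidate version satisfies both constraints at once.

```python
import operator
import re
from typing import Iterable, List, Tuple

_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _parse(version: str, width: int = 4) -> Tuple[int, ...]:
    """Turn '1.24' into (1, 24, 0, 0) so '1.24' and '1.24.0' compare equal."""
    parts = [int(piece) for piece in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version: str, constraint: str) -> bool:
    """Check one version against one constraint such as '>=1.24'."""
    match = re.fullmatch(r"\s*(>=|<=|==|!=|>|<)\s*([0-9.]+)\s*", constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    op, bound = match.groups()
    return _OPERATORS[op](_parse(version), _parse(bound))


def compatible_versions(candidates: Iterable[str], constraints: Iterable[str]) -> List[str]:
    """Keep the candidate versions that satisfy every constraint."""
    constraints = list(constraints)
    return [v for v in candidates if all(satisfies(v, c) for c in constraints)]


if __name__ == "__main__":
    available = ["1.22.4", "1.23.5", "1.24.0", "1.26.4"]  # illustrative candidates
    print(compatible_versions(available, [">=1.24", "<1.24"]))  # prints []
    print(compatible_versions(available, [">=1.24", "<2.0"]))  # prints ['1.24.0', '1.26.4']
```

Upgrading A or B helps only when a newer release changes one of the constraints so that the surviving list is no longer empty.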
pip check verifies that all installed packages have compatible dependencies. Run it after installing packages: pip check. If it reports conflicts, address them before running your code. pipdeptree (pip install pipdeptree) shows the dependency tree and warns about conflicting requirements. pipdeptree --reverse --packages numpy lists the packages that depend on numpy, helping you trace the source of conflicts. (The --warn silence option hides the conflict warnings, so leave it off while debugging.) pip install package --dry-run (pip 22.2 or later) shows what would be installed without actually installing, useful for checking whether a new package would create conflicts.
Common scientific computing conflicts have standard solutions. TensorFlow and PyTorch may require specific NumPy version ranges: install them first, then install other packages. Packages with CUDA dependencies must be compatible with your NVIDIA driver version: check the package's installation guide for the correct command. Some packages (GDAL, and cartopy or h5py when no pre-built wheel exists for your platform) require system libraries that pip cannot install: use conda for these, or install the system libraries separately (apt install libgdal-dev on Ubuntu). When in doubt, create a fresh environment and install packages one at a time, testing after each to isolate which installation causes problems.
The nuclear option for unresolvable conflicts: delete the environment and start fresh. conda env remove -n broken_env or rm -rf .venv, then recreate from your requirements file. This is faster than trying to manually resolve a tangled dependency graph, and it ensures a clean state without leftover partially-installed packages. Keep your requirements.in or environment.yml current so that recreating is painless.
Step 5: Share and Publish Packages
If you have written reusable code that other researchers could benefit from, packaging it for pip installation makes it accessible to the entire Python community. Create a pyproject.toml file (the modern standard, replacing setup.py) that specifies your package name, version, description, author, license, Python version requirements, and dependencies. The minimum structure: a pyproject.toml, a src/ directory with your package code, and a README.md. Build with python -m build (install the tool first with pip install build), which writes a source distribution (.tar.gz) and a wheel (.whl) to the dist/ directory.
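The minimum structure described above can be scaffolded with a few lines of Python. This is a sketch under stated assumptions: it picks setuptools as the build backend (one of several valid choices), and the package name, description, version, and dependency list are placeholders, not recommendations.

```python
import tempfile
from pathlib import Path

# Assumed template: setuptools backend, placeholder metadata.
PYPROJECT_TEMPLATE = """\
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "{name}"
version = "0.1.0"
description = "Analysis helpers (placeholder description)"
readme = "README.md"
requires-python = ">=3.9"
dependencies = ["numpy>=1.22"]
"""


def scaffold_package(root: Path, name: str) -> Path:
    """Write pyproject.toml, README.md, and src/<module>/__init__.py under root."""
    module = name.replace("-", "_")  # import names cannot contain hyphens
    package_dir = root / "src" / module
    package_dir.mkdir(parents=True, exist_ok=True)
    (package_dir / "__init__.py").write_text('"""Placeholder package."""\n', encoding="utf-8")
    (root / "README.md").write_text(f"# {name}\n", encoding="utf-8")
    (root / "pyproject.toml").write_text(PYPROJECT_TEMPLATE.format(name=name), encoding="utf-8")
    return package_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scaffold_package(Path(tmp), "example-analysis")
        print(sorted(str(p.relative_to(tmp)) for p in Path(tmp).rglob("*") if p.is_file()))
```

The src/ layout keeps the importable code out of the repository root, so tests run against the installed package instead of whatever happens to be in the working directory.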
Before publishing, choose a unique package name (check pypi.org first), use semantic versioning (major.minor.patch), and write a clear README that explains what the package does, how to install it, and how to use it with a minimal example. Uploads go through twine (pip install twine). First, test on TestPyPI (test.pypi.org) with twine upload --repository testpypi dist/* to verify that the package installs correctly. Then publish to the real PyPI (the Python Package Index) with twine upload dist/*. After upload, anyone can install your package with pip install your-package-name.
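Semantic versioning follows a simple rule: bump the major number for incompatible changes, the minor number for backward-compatible features, and the patch number for bug fixes, resetting the lower numbers to zero. The helper below is a hypothetical illustration of that rule for plain major.minor.patch strings; it does not handle pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    print(bump_version("1.4.2", "patch"))  # prints 1.4.3
    print(bump_version("1.4.2", "minor"))  # prints 1.5.0
    print(bump_version("1.4.2", "major"))  # prints 2.0.0
```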
For conda distribution, create a conda recipe (meta.yaml) that specifies how to build the package for conda. Publish to conda-forge by submitting a recipe to the conda-forge staged-recipes repository on GitHub. The conda-forge community reviews the recipe and, once accepted, automatically builds packages for Linux, macOS, and Windows. conda-forge distribution is especially valuable for packages with compiled dependencies because conda handles cross-platform binary distribution better than pip.
Even if you do not publish to PyPI, making your research code pip-installable from a git repository is valuable. pip install git+https://github.com/user/repo.git installs directly from a GitHub repository. Include a pyproject.toml in your research repository so that collaborators can install your analysis functions as a package. This is better than copying .py files between projects because pip handles dependencies, version tracking, and import paths automatically.
Create a virtual environment for every project, pin all package versions in a requirements file, and commit that file to version control. This three-step habit prevents the vast majority of installation problems and ensures your computational environment is reproducible.