How to Set Up Python for Science

Updated May 2026
Setting up Python for scientific computing means installing a Python distribution, creating an isolated virtual environment, and adding the core scientific packages (NumPy, pandas, matplotlib, SciPy, Jupyter) that form the foundation of computational research. The entire process takes about 15 minutes on any operating system, and you have two main paths: a conda-based distribution such as Anaconda, which bundles everything in a single installer, or standard Python with pip, which gives you a leaner setup with more control.

The biggest mistake new users make is jumping straight into coding without understanding environments and package management. This leads to version conflicts, broken installations, and the dreaded "it works on my machine" problem that undermines reproducibility. Taking 15 minutes to set things up properly saves hours of debugging later and makes your work reproducible from day one.

Step 1: Choose Your Distribution

You have two main options for getting Python: a conda-based distribution such as Anaconda, or standard Python from python.org. Anaconda is a pre-packaged distribution, free for individual use (larger organizations should check Anaconda's licensing terms), that includes Python, over 250 scientific packages (NumPy, pandas, SciPy, matplotlib, scikit-learn, Jupyter, and many more), and the conda package manager, all in a single installer of roughly 1 GB. The advantage is simplicity: install once, and you have everything a scientist needs. The disadvantages are size and dependency resolution: conda has historically been slow to resolve dependencies, although releases since 23.10 use the much faster libmamba solver by default.

Miniforge is a lighter alternative that installs conda and Python without the 250+ bundled packages, letting you add only what you need; it is preconfigured to use the community-maintained conda-forge channel. This is the best choice for users who want conda's environment management without the bloat. Standard Python from python.org paired with pip gives you the leanest installation, but you install every package yourself and occasionally handle system-level dependencies: most scientific packages ship prebuilt wheels, but a package without a wheel for your platform must be compiled from source, which on Windows requires additional build tools.

For most scientists starting out, Miniforge or Anaconda is the easier path. For experienced programmers who are comfortable with the command line, standard Python with pip and venv works perfectly and keeps your system cleaner. On Linux, your system package manager (apt, dnf, pacman) provides Python and many scientific packages as system packages, but creating virtual environments for project-specific work is still essential to avoid interfering with system Python.
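Some Linux distributions, including recent Debian and Ubuntu releases, enforce that separation for you: following PEP 668, they place an EXTERNALLY-MANAGED marker file in the system Python's standard-library directory, and pip then refuses to install into that interpreter outside a virtual environment. The sketch below checks for the marker using only the standard library. The helper names are hypothetical, and the check follows the rule as PEP 668 describes it rather than reproducing pip's internal logic.

```python
import sys
import sysconfig
from pathlib import Path


def has_externally_managed_marker() -> bool:
    """True if the base interpreter's stdlib directory contains PEP 668's marker file."""
    return Path(sysconfig.get_path("stdlib"), "EXTERNALLY-MANAGED").is_file()


def pip_would_refuse() -> bool:
    """The marker only applies outside virtual environments (sys.prefix == sys.base_prefix)."""
    return has_externally_managed_marker() and sys.prefix == sys.base_prefix


if __name__ == "__main__":
    if pip_would_refuse():
        print("System Python is externally managed: create a virtual environment first.")
    else:
        print("No PEP 668 restriction applies to this interpreter.")
```

Run it once with the system python3 and once inside an activated environment: inside the environment the restriction no longer applies, which is exactly the isolation this guide recommends.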

Step 2: Install Python

For the Anaconda path, download the installer from anaconda.com and make sure it is a Python 3 installer (Python 2 reached end of life in 2020 and should never be used for new work). On Windows, run the .exe installer. It offers to add Anaconda to your PATH so conda works from any terminal, but recommends against this to avoid conflicts with other Python installations; leaving the box unchecked and using the Anaconda Prompt from the Start menu is the safer default. On macOS, run the .pkg installer. On Linux, run the .sh script in a terminal. Miniforge, available from the conda-forge/miniforge project on GitHub, provides a Windows .exe installer and .sh scripts for macOS and Linux. After installation, open a new terminal (or the Anaconda or Miniforge Prompt on Windows) and run "conda --version" to verify the installation.

For the standard Python path, download the installer from python.org. On Windows, run the installer and check "Add python.exe to PATH" at the bottom of the first screen (older installers label it "Add Python to PATH"). This is critical: without it, you will need to type the full path to Python every time. On macOS, the installer places Python at /Library/Frameworks/Python.framework. On Linux, Python is usually pre-installed, but you may need to install the python3-pip and python3-venv packages separately ("sudo apt install python3-pip python3-venv" on Ubuntu/Debian). Verify with "python --version" or "python3 --version" in a terminal; on Windows, "py --version" also works through the Python launcher.
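If the version commands print something unexpected, it usually means a different installation comes first on your PATH. The short script below is a sketch of one way to see which executables your shell would pick up; locate_commands is a hypothetical helper built on the standard library's shutil.which, and the list of command names is an assumption you can edit.

```python
import platform
import shutil
import sys


def locate_commands(names=("conda", "python", "python3", "pip")):
    """Map each command name to the executable found on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # The interpreter running this script may differ from the first `python` on PATH.
    print(f"Running Python {platform.python_version()} from {sys.executable}")
    for name, path in locate_commands().items():
        print(f"{name:>8}: {path or 'not found on PATH'}")
```

A "not found on PATH" entry for conda or python after a fresh install usually means the terminal was opened before installation finished, or the PATH option was left unchecked.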

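Regardless of which path you chose, make sure you have Python 3.10 or later. Python 3.10 introduced structural pattern matching (the match statement) and much clearer error messages. Python 3.11 made CPython about 25% faster on average (10-60% depending on the workload), and Python 3.12 added a per-interpreter GIL at the C-API level. Scientific libraries support Python versions on a rolling window (many follow the SPEC 0 recommendation of roughly three years after each release), so recent releases of NumPy and SciPy no longer support Python 3.9. An outdated interpreter therefore locks you out of current library versions and their bug fixes, and creates unnecessary compatibility headaches.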
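To fail early on an old interpreter, a project script can compare sys.version_info against the floor. This is a minimal sketch; meets_minimum and MINIMUM are hypothetical names, and the (3, 10) value is the minimum recommended above.

```python
import sys

MINIMUM = (3, 10)  # the floor recommended in this guide


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM) -> bool:
    """Compare the (major, minor) part of a version tuple against a minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if meets_minimum():
        print(f"Python {current} is new enough.")
    else:
        print(f"Python {current} is too old; install 3.10 or later.")
```

Comparing tuples works element by element, so (3, 9) is correctly treated as older than (3, 10), which a plain string comparison of "3.9" and "3.10" would get wrong.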

Step 3: Create a Virtual Environment

A virtual environment is an isolated Python installation that keeps one project's packages separate from another's. Without virtual environments, installing packages for Project A can break Project B by upgrading a shared dependency to an incompatible version. Environments themselves are cheap: a fresh venv reuses the base interpreter and its standard library, so nearly all of its disk usage comes from the packages you install, and conda reduces duplication by hard-linking files from a shared package cache where possible. Creating one takes seconds. Every new research project should start by creating a virtual environment.
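You can ask Python itself whether it is running inside an environment. The sketch below relies on two documented behaviors: venv sets sys.prefix to the environment directory while sys.base_prefix still points at the base installation, and conda environments contain a conda-meta directory. The function name and the returned labels are hypothetical, and the result is a best-effort guess rather than a definitive test.

```python
import sys
from pathlib import Path


def environment_kind() -> str:
    """Best-effort guess at the kind of environment the running interpreter belongs to."""
    if sys.prefix != sys.base_prefix:
        # venv (and virtualenv) point sys.prefix at the environment directory
        return "venv"
    if Path(sys.prefix, "conda-meta").is_dir():
        # conda environments, including base, contain a conda-meta directory
        return "conda"
    return "system or other"


if __name__ == "__main__":
    print(f"{environment_kind()} environment at {sys.prefix}")
```

Note that "conda" is also reported for the base environment, so check the printed path to confirm you are in a project environment rather than base.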

With conda: "conda create --name myproject python=3.12" creates an environment named "myproject" with Python 3.12. "conda activate myproject" activates it, and your terminal prompt changes to show the active environment name. "conda deactivate" returns to the previously active environment (usually base). With standard Python: "python -m venv myproject-env" creates an environment in a directory called myproject-env. On macOS/Linux, "source myproject-env/bin/activate" activates it. On Windows, run "myproject-env\Scripts\activate" in Command Prompt or "myproject-env\Scripts\Activate.ps1" in PowerShell, and "deactivate" leaves a venv environment on any platform. The activated environment uses its own pip and installs packages into its own directory tree.
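The same venv environments can be created from Python code through the standard library's venv module, which is handy in setup scripts. This is a sketch under stated assumptions: create_env is a hypothetical helper, and the interpreter paths follow venv's standard layout (bin/ on macOS and Linux, Scripts\ on Windows).

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True) -> Path:
    """Create a virtual environment and return the path to its Python interpreter."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # Windows places executables in Scripts\, macOS and Linux in bin/
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    # Demo: a throwaway environment without pip, deleted when the block exits
    with tempfile.TemporaryDirectory() as tmp:
        python_path = create_env(Path(tmp) / "demo-env", with_pip=False)
        print("Created interpreter:", python_path, python_path.exists())
```

For a real project you would call create_env("myproject-env") from the project folder and keep the default with_pip=True, so the environment gets its own pip just as the command-line version does.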

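Record your environment for reproducibility. With conda: "conda env export > environment.yml" saves a complete specification, including exact versions and platform-specific build strings, and another user runs "conda env create -f environment.yml" to recreate it. Because of those build strings, the full export is reliable mainly on the same operating system; "conda env export --from-history" records only the packages you explicitly requested and travels better across platforms. With pip: "pip freeze > requirements.txt" saves every installed package with a pinned version (though not the Python version itself), and another user runs "pip install -r requirements.txt" in a fresh environment to recreate it. Commit these files to your git repository alongside your code. This simple habit is one of the most effective things you can do for computational reproducibility.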
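To see what a pinned requirements file contains, the sketch below builds similar "Name==version" lines from the standard library's importlib.metadata. It is an illustrative approximation, not a replacement for pip freeze: pinned_requirements is a hypothetical helper, it lists every distribution it can see, and it does not reproduce pip's special handling of editable and VCS installs or the packaging tools pip skips by default.

```python
from importlib import metadata


def pinned_requirements() -> list:
    """Return sorted 'Name==version' lines for the distributions this interpreter can see."""
    pins = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            continue  # skip broken installs that lack a Name field
        # keep the first occurrence, i.e. the copy earliest on sys.path
        pins.setdefault(name.lower(), f"{name}=={dist.version}")
    return [pins[key] for key in sorted(pins)]


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

Run inside an activated environment, the output resembles the lines pip freeze writes to requirements.txt, which makes it easy to spot an unexpected package before you commit the file.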

Step 4: Install Core Scientific Packages

With your virtual environment activated, install the essential scientific packages. With conda: "conda install numpy pandas scipy matplotlib jupyterlab scikit-learn seaborn" installs the core stack in one command. With pip: "pip install numpy pandas scipy matplotlib jupyterlab scikit-learn seaborn" installs the same packages. Both commands pull in all necessary dependencies automatically. Expect the stack to take a few hundred megabytes with pip, and more with conda, because conda installs optimized compiled libraries as separate packages: on Anaconda's default channel this includes Intel's MKL (Math Kernel Library) for accelerated linear algebra, while conda-forge uses OpenBLAS by default.
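After installing, you can confirm from inside the environment which of the core packages are present and at what versions. This sketch uses importlib.metadata.version, which looks packages up by distribution name (scikit-learn, not its import name sklearn); installed_versions and CORE_STACK are hypothetical names, and the package list mirrors the install commands above.

```python
from importlib import metadata

# Distribution names as used in the pip and conda install commands
CORE_STACK = ("numpy", "pandas", "scipy", "matplotlib", "jupyterlab", "scikit-learn", "seaborn")


def installed_versions(names=CORE_STACK) -> dict:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:>14}: {version or 'MISSING'}")
```

A MISSING entry usually means the package went into a different environment, which is the most common symptom of installing before activating.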

The core packages and what they provide: NumPy for array operations and linear algebra, pandas for data manipulation and analysis, SciPy for scientific algorithms (optimization, integration, statistics, signal processing), matplotlib for plotting, Jupyter or JupyterLab for interactive notebooks, scikit-learn for machine learning, and Seaborn for statistical visualization. These seven packages and their dependencies cover a large share of everyday data analysis, modeling, and plotting across scientific domains.

Install domain-specific packages as needed. For bioinformatics: "pip install biopython". For chemistry: "conda install -c conda-forge rdkit". For geospatial work: "conda install -c conda-forge geopandas". For deep learning: "pip install torch" or "pip install tensorflow". For NLP: "pip install spacy" followed by "python -m spacy download en_core_web_sm" for the small English language model. For astronomy: "pip install astropy". Each field has its own essential packages, but the workflow is the same: install inside your activated environment. In a conda environment, prefer conda for any package it provides and use pip only for the rest, because conda does not track what pip installs and later conda operations can break those packages.

Step 5: Configure Your Development Environment

Choose an editor or IDE for writing Python code. VS Code (free, by Microsoft) is a popular choice among scientists because Microsoft's Python and Jupyter extensions provide excellent support (syntax highlighting, code completion, debugging, an integrated terminal, and Jupyter notebook rendering) while the editor stays lightweight and extensible. PyCharm (free and paid editions) is a more full-featured IDE with advanced refactoring, database tools, and a scientific mode with built-in plotting; some of these features are limited to the paid edition. Spyder, included with Anaconda, provides a MATLAB-like interface with a variable explorer, integrated console, and debugging tools familiar to scientists transitioning from MATLAB.

Configure your chosen editor to use your virtual environment's Python interpreter. In VS Code, press Ctrl+Shift+P (Cmd+Shift+P on macOS), run "Python: Select Interpreter", and choose the Python executable inside your environment (for a venv, myproject-env/bin/python on macOS/Linux or myproject-env\Scripts\python.exe on Windows). This ensures that code completion, linting, and the integrated terminal all use the correct Python with your installed packages. Without this step, the editor may fall back to the system Python, which does not have your scientific packages installed.
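A quick way to confirm the editor picked the right interpreter is to run a tiny script from it and check where Python thinks it lives. In this sketch, uses_env is a hypothetical helper that compares sys.prefix (the root of the active environment) with a directory you name; "myproject-env" is the example directory from Step 3 and should be replaced with your own.

```python
import sys
from pathlib import Path


def uses_env(env_dir) -> bool:
    """True if the running interpreter belongs to the environment rooted at env_dir."""
    return Path(sys.prefix).resolve() == Path(env_dir).resolve()


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Environment root:", sys.prefix)
    # "myproject-env" is the example directory name; relative paths resolve from the working directory
    print("Using myproject-env:", uses_env("myproject-env"))
```

The comparison uses sys.prefix rather than the resolved executable path, because a venv's python is often a symlink back to the base installation.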

Verify your setup by running a test script that exercises each core package. Create a new Python file (or Jupyter notebook) and import each package: import numpy as np, import pandas as pd, import matplotlib.pyplot as plt, from scipy import stats, import sklearn (scikit-learn's import name). If all imports succeed without errors, your installation is working. Then run a quick computation: generate random data with NumPy, load it into a pandas DataFrame, compute basic statistics with SciPy, and plot it with matplotlib. If the plot appears, your environment is fully functional and ready for research.
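The check described above can be collected into one script. This is a minimal sketch, not an official test suite: missing_modules and smoke_test are hypothetical helper names, the synthetic data (a true slope of 2 plus noise) exists only for the test, and the plot goes to a PNG through matplotlib's Agg backend so the script also works on machines without a display. For an on-screen window, remove the matplotlib.use line and replace the savefig call with plt.show().

```python
import importlib

MODULES = ("numpy", "pandas", "scipy.stats", "matplotlib", "sklearn")


def missing_modules(names=MODULES) -> dict:
    """Try to import each module; return {name: error message} for any that fail."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures[name] = str(exc)
    return failures


def smoke_test(png_path="smoke_test.png") -> float:
    """Exercise NumPy, pandas, SciPy, and matplotlib together; return the fitted slope."""
    import matplotlib
    matplotlib.use("Agg")  # draw to a file so no display window is needed
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from scipy import stats

    rng = np.random.default_rng(seed=0)
    df = pd.DataFrame({"x": rng.normal(size=200)})
    df["y"] = 2.0 * df["x"] + rng.normal(scale=0.5, size=200)  # true slope is 2

    fit = stats.linregress(df["x"], df["y"])  # SciPy least-squares line
    xs = np.sort(df["x"].to_numpy())

    fig, ax = plt.subplots()
    ax.scatter(df["x"], df["y"], s=8, label="data")
    ax.plot(xs, fit.intercept + fit.slope * xs, color="red", label="fit")
    ax.legend()
    fig.savefig(png_path)
    plt.close(fig)
    return fit.slope


if __name__ == "__main__":
    failures = missing_modules()
    if failures:
        for name, err in failures.items():
            print(f"FAILED {name}: {err}")
    else:
        print(f"All imports OK; fitted slope = {smoke_test():.2f} (expected near 2)")
```

Reporting failed imports before running the computation means a broken environment produces a readable list of missing packages instead of a single traceback.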

Key Takeaway

The most important step is creating a virtual environment before installing anything. This single habit prevents the majority of Python installation problems and makes your research reproducible from the start.