Python Best Practices for Scientists
Scientific code has different priorities than commercial software. It needs to be correct above all else, because wrong results published as science cause real harm. It needs to be readable, because peer reviewers, collaborators, and your future self must understand what it does. It needs to be reproducible, because science that cannot be independently verified is not science. It does not need to be elegant, scalable, or enterprise-grade. These priorities mean that clarity beats cleverness, explicit beats implicit, and tested beats untested, every time.
Step 1: Organize Your Project
A consistent directory structure makes projects navigable. The minimal research project layout: data/ for raw data (never modified by code), notebooks/ for Jupyter notebooks, scripts/ for analysis scripts, src/ or lib/ for reusable functions and modules, results/ for generated outputs (figures, tables, processed data), tests/ for test code, and the root level for README.md, requirements.txt, and configuration files. This structure separates concerns: data from code, reusable code from one-off scripts, and outputs from inputs.
Name files descriptively. 01_load_and_clean.py, 02_exploratory_analysis.py, 03_statistical_tests.py, 04_generate_figures.py tells a reader exactly what each script does and in what order to run them. analysis.py, utils.py, and helpers.py tell a reader nothing. For data files: experiment_2026-05-19_treatment_group.csv beats data.csv. For figures: figure3_survival_curves.pdf beats plot.pdf. Descriptive names cost nothing to type and save minutes of confusion later.
Separate configuration from code. Hard-coded values like file paths, parameter values, and thresholds should live in a configuration file (config.yml, config.json) or at the top of the script in clearly labeled constants: DATA_PATH = Path('data/raw/experiment.csv'), ALPHA = 0.05, MIN_SAMPLE_SIZE = 30. This makes it easy to change parameters without reading through the analysis logic, and it makes the parameter choices explicit and reviewable rather than buried in function calls.
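A minimal sketch of the constants-at-the-top pattern, reusing the example values above. The helper is_significant and its arguments are hypothetical, introduced only to show how named constants keep thresholds out of the analysis logic.

```python
from pathlib import Path

# Configuration: every tunable value lives here, not inside the analysis logic.
DATA_PATH = Path("data/raw/experiment.csv")  # example path from the text
ALPHA = 0.05           # significance threshold
MIN_SAMPLE_SIZE = 30   # smallest sample we are willing to test


def is_significant(p_value: float, n_samples: int,
                   alpha: float = ALPHA,
                   min_n: int = MIN_SAMPLE_SIZE) -> bool:
    """Hypothetical helper: report significance only for large-enough samples."""
    return n_samples >= min_n and p_value < alpha
```

Because the constants are also the default arguments, a reviewer can see the parameter choices in one place, and a test can still override them explicitly.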
Step 2: Write Clean Functions
Functions should do one thing and have a name that says what they do. compute_effect_size(group1, group2) is clear. process_data(df) is vague. analyze(x) is useless. If you cannot name a function without using "and" (load_and_clean_and_analyze), it does too many things and should be split. Short functions (under 20 lines) are easier to understand, test, debug, and reuse than long functions. If a function has many parameters, some of them probably belong in a configuration object or should be split into separate functions.
Type hints document what a function expects and returns. def compute_mean_difference(group1: np.ndarray, group2: np.ndarray) -> float: tells the reader that it takes two NumPy arrays and returns a float. Type hints are not enforced at runtime (Python is dynamically typed), but they serve as documentation, enable IDE autocompletion, and can be checked with mypy for consistency. For scientific code, hints like pd.DataFrame, np.ndarray, Path, and Optional[float] (shorthand for Union[float, None], also written float | None in Python 3.10 and later) cover most cases.
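As a rough illustration, the signature above might be filled in as follows. The function body, the sign convention (second group minus first), and the helper safe_mean are assumptions made for this sketch.

```python
from typing import Optional

import numpy as np


def compute_mean_difference(group1: np.ndarray, group2: np.ndarray) -> float:
    """Difference of group means; the sign convention here is an assumption."""
    return float(np.mean(group2) - np.mean(group1))


def safe_mean(values: np.ndarray) -> Optional[float]:
    """Hypothetical helper: mean of the values, or None for an empty array."""
    if values.size == 0:
        return None
    return float(np.mean(values))
```

The Optional[float] return type tells the caller that None is a possible result and must be handled, which a bare float annotation would hide.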
Return values rather than modifying inputs. def clean_data(df: pd.DataFrame) -> pd.DataFrame: return df.dropna().reset_index(drop=True) returns a new DataFrame rather than modifying the input in place. This makes the data flow explicit: clean_df = clean_data(raw_df) shows exactly where the cleaned data comes from. Functions that modify their inputs silently ("side effects") create bugs when the caller does not expect the modification: data was changed somewhere, but the code does not show where.
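The following sketch shows that the caller's DataFrame is left untouched. It assumes pandas is installed; the column name and values are made up for illustration.

```python
import pandas as pd


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the caller's DataFrame is not modified."""
    return df.dropna().reset_index(drop=True)


# Illustrative data, not from a real experiment.
raw_df = pd.DataFrame({"temperature": [21.5, None, 19.0]})
clean_df = clean_data(raw_df)

print(len(raw_df), len(clean_df))  # raw_df still has its missing row
```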
Avoid global variables. Every value a function needs should be passed as a parameter. Global variables create hidden dependencies between functions, make testing difficult (the function's behavior depends on state set elsewhere), and cause bugs when two parts of the code modify the same global variable. Constants (ALL_CAPS names that are set once and never modified) are acceptable as module-level variables: SPEED_OF_LIGHT = 299792458, BOLTZMANN_CONSTANT = 1.380649e-23.
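A small contrast between the two styles may help. Both functions and the name threshold are hypothetical.

```python
from typing import Sequence

# Hidden dependency: the result changes if any other code reassigns `threshold`.
threshold = 0.5


def count_above_global(values: Sequence[float]) -> int:
    return sum(v > threshold for v in values)


# Explicit dependency: everything the function needs arrives as an argument.
def count_above(values: Sequence[float], threshold: float) -> int:
    return sum(v > threshold for v in values)
```

The second version can be tested with any threshold in a single line, without first arranging module state.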
Step 3: Handle Errors Gracefully
Assertions verify internal assumptions. assert len(data) > 0, 'Data array must not be empty'. assert np.all(np.isfinite(data)), 'Data contains NaN or Inf values'. assert df['temperature'].between(-100, 100).all(), 'Temperature values out of physical range'. Assertions document what the code assumes and catch violations early, before they propagate into wrong results. Place assertions at function boundaries (checking inputs) and after critical computations (checking that outputs make sense). Assertions can be disabled in optimized mode (python -O), so do not use them for user input validation.
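One way to place assertions at a function boundary and after a critical computation is sketched below. The function standardize is a hypothetical example, not part of any particular library.

```python
import numpy as np


def standardize(data: np.ndarray) -> np.ndarray:
    """Hypothetical example: rescale data to zero mean and unit variance."""
    # Input checks at the function boundary.
    assert data.size > 0, "Data array must not be empty"
    assert np.all(np.isfinite(data)), "Data contains NaN or Inf values"

    std = data.std()
    assert std > 0, "Data has zero variance; cannot standardize"
    z = (data - data.mean()) / std

    # Output check after the critical computation.
    assert np.isclose(z.mean(), 0.0, atol=1e-9), "Standardized mean should be ~0"
    return z
```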
Exceptions handle recoverable errors. Wrap the risky call in a try block (try: result = compute_statistic(data)) and, in the matching except ValueError as e: branch, log a warning with logging.warning(f'Skipping sample: {e}') and set result = np.nan. Use specific exception types (ValueError, FileNotFoundError, KeyError) rather than bare except clauses, which catch everything including keyboard interrupts and system exits. For data processing pipelines, catch exceptions per-item and continue with the rest rather than crashing the entire batch on the first error. Collect errors and report them at the end.
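The per-item pattern could look roughly like this. Here compute_statistic is a hypothetical stand-in that raises ValueError on an empty sample, and process_batch is likewise illustrative.

```python
import logging
import math
from typing import List, Sequence, Tuple


def compute_statistic(sample: Sequence[float]) -> float:
    """Hypothetical statistic: the mean, rejecting empty samples."""
    if len(sample) == 0:
        raise ValueError("empty sample")
    return sum(sample) / len(sample)


def process_batch(
    samples: Sequence[Sequence[float]],
) -> Tuple[List[float], List[Tuple[int, str]]]:
    """Process every sample; record failures instead of stopping at the first."""
    results: List[float] = []
    errors: List[Tuple[int, str]] = []
    for index, sample in enumerate(samples):
        try:
            results.append(compute_statistic(sample))
        except ValueError as e:  # specific type, never a bare except
            logging.warning("Skipping sample %d: %s", index, e)
            results.append(math.nan)
            errors.append((index, str(e)))
    return results, errors
```

Returning the error list alongside the results lets the caller print a summary at the end and decide whether the failure rate is acceptable.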
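Logging provides diagnostic information without cluttering the output. Configure it once at program start: import logging, then logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s'). After that, call logging.info(f'Loaded {len(df)} records from {path}'), logging.warning(f'Column {col} has {null_count} missing values'), or logging.error(f'Failed to process {file}: {e}'). By default, messages go to standard error and carry no timestamp. Pass a filename argument to basicConfig (for example filename='analysis.log') to persist them in a log file for post-hoc debugging, and include %(asctime)s in the format string to get timestamps that help diagnose timing-related issues. Messages can also be filtered by severity level. Replace print() statements with logging calls in any code that runs unattended or processes multiple files.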
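A minimal sketch of file logging with timestamps, using only the standard library. The logger name, the log file location, and the logged values are placeholders chosen for this example.

```python
import logging
import tempfile
from pathlib import Path


def make_logger(log_path: Path) -> logging.Logger:
    """Create a logger that writes timestamped messages to log_path."""
    logger = logging.getLogger("analysis")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.FileHandler(log_path, mode="w")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Placeholder location; a real project would log next to its results.
log_path = Path(tempfile.gettempdir()) / "analysis_example.log"
logger = make_logger(log_path)
logger.info("Loaded %d records from %s", 120, "data/raw/experiment.csv")
logger.warning("Column %s has %d missing values", "temperature", 3)
logger.debug("Not written: DEBUG is below the INFO threshold")
```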
Step 4: Test Your Code
Unit tests verify that individual functions produce correct results for known inputs. Create a tests/ directory with test files: test_statistics.py, test_data_cleaning.py. Use pytest (pip install pytest). def test_mean_difference(): assert compute_mean_difference(np.array([1, 2, 3]), np.array([4, 5, 6])) == 3.0. def test_mean_difference_equal_groups(): assert compute_mean_difference(np.array([5, 5, 5]), np.array([5, 5, 5])) == 0.0. The expected value 3.0 assumes the function returns the second group's mean minus the first's. Exact == comparison is safe here only because these small integer-valued results are exactly representable in floating point; in general, compare with a tolerance. Run tests with pytest from the project root. Tests should cover normal cases, edge cases (empty arrays, single elements, all identical values), and error cases (invalid inputs should raise appropriate exceptions).
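Put together, a test file might look like the sketch below. The body of compute_mean_difference is an assumed definition, included so the file runs on its own; in a real project it would be imported from src/. The comparisons use math.isclose rather than ==. Error cases would typically be checked with pytest.raises, which is omitted here to keep the sketch free of third-party test dependencies.

```python
import math

import numpy as np


def compute_mean_difference(group1: np.ndarray, group2: np.ndarray) -> float:
    # Assumed definition, chosen to match the expected values in the text.
    return float(np.mean(group2) - np.mean(group1))


def test_mean_difference():
    result = compute_mean_difference(np.array([1, 2, 3]), np.array([4, 5, 6]))
    assert math.isclose(result, 3.0)


def test_mean_difference_equal_groups():
    result = compute_mean_difference(np.array([5, 5, 5]), np.array([5, 5, 5]))
    assert math.isclose(result, 0.0, abs_tol=1e-12)


def test_mean_difference_single_element():
    # Edge case: one observation per group.
    result = compute_mean_difference(np.array([2.0]), np.array([7.0]))
    assert math.isclose(result, 5.0)
```

pytest collects any function whose name starts with test_ from files named test_*.py, so no registration code is needed.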
Numerical tests require tolerance for floating-point comparison. Do not use == to compare computed floating-point results: 0.1 + 0.2 != 0.3 in IEEE 754 double-precision arithmetic. Use np.testing.assert_allclose(result, expected, rtol=1e-7) for relative tolerance. The check passes when |result - expected| <= atol + rtol * |expected|, with defaults rtol=1e-7 and atol=0, so a purely absolute tolerance (needed when expected is zero or close to it) requires setting both: np.testing.assert_allclose(result, expected, rtol=0, atol=1e-10). For statistical functions that produce random results, set the random seed before the test and compare against pre-computed expected values.
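The sketch below illustrates both points: tolerance-based comparison and seeding. The function bootstrap_means is a hypothetical randomized statistic; rather than hard-coding pre-computed values, the test checks that a fixed seed reproduces the same output, which is the property a stored expected value relies on.

```python
import numpy as np

# Exact equality fails even for simple decimal fractions.
assert 0.1 + 0.2 != 0.3
np.testing.assert_allclose(0.1 + 0.2, 0.3, rtol=1e-7)


def bootstrap_means(data: np.ndarray, n_resamples: int, seed: int) -> np.ndarray:
    """Hypothetical randomized statistic: means of bootstrap resamples."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(data), size=(n_resamples, len(data)))
    return data[idx].mean(axis=1)


def test_bootstrap_is_reproducible():
    data = np.array([1.0, 2.0, 3.0, 4.0])
    first = bootstrap_means(data, n_resamples=100, seed=0)
    second = bootstrap_means(data, n_resamples=100, seed=0)
    # Same seed, same generator, same draws: outputs must match exactly.
    np.testing.assert_array_equal(first, second)
    # Resampled means can never leave the range of the data.
    assert first.min() >= 1.0 and first.max() <= 4.0
```

Passing the seed as an argument, instead of seeding a global generator, keeps the randomness explicit and local to the call.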
Regression tests verify that analysis outputs do not change unexpectedly when code is modified. Run the analysis, save the results (np.save('tests/expected_output.npy', result)), and write a test that compares current output to saved output: def test_analysis_regression(): result = run_analysis(); expected = np.load('tests/expected_output.npy'); np.testing.assert_allclose(result, expected). When you intentionally change the analysis, update the expected output and document why the results changed. Regression tests catch accidental changes that silently alter results.
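A self-contained sketch of the save-then-compare cycle follows. Here run_analysis is a hypothetical stand-in for the real analysis, and a temporary directory replaces tests/ so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import numpy as np


def run_analysis() -> np.ndarray:
    """Hypothetical stand-in for the real analysis."""
    return np.cumsum(np.arange(5, dtype=float))


def save_baseline(path: Path) -> None:
    """Run once, after the current output has been reviewed and accepted."""
    np.save(path, run_analysis())


def check_against_baseline(path: Path) -> None:
    """Raise AssertionError if the current output drifts from the baseline."""
    expected = np.load(path)
    np.testing.assert_allclose(run_analysis(), expected, rtol=1e-7)


# Demonstration in a temporary directory; in a project the baseline would be
# a version-controlled file such as tests/expected_output.npy.
with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "expected_output.npy"
    save_baseline(baseline)
    check_against_baseline(baseline)
```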
Sanity checks within analysis code verify that intermediate results make physical or statistical sense. After loading data: assert df['age'].between(0, 120).all(). After computing probabilities: assert np.all((probs >= 0) & (probs <= 1)). After fitting a model: assert model.rsquared >= 0. These inline checks catch data quality issues and computational errors at the point they occur rather than letting them propagate into downstream results that are harder to diagnose. Think of sanity checks as executable documentation of what the data and results should look like.
Step 5: Optimize When Necessary
Profile before optimizing. Premature optimization wastes time on code that is not the bottleneck. Use %timeit in Jupyter to measure execution time. Use cProfile to identify which functions consume the most time. Only optimize the function that actually dominates execution. A function that is called once and runs in 10 ms is not worth optimizing even if you can make it 100 times faster, because saving 9.9 ms is irrelevant if the script takes 5 minutes total. The same function called 10,000 times accounts for 100 seconds of that total, which is why a profiler's cumulative time per function matters more than the time of a single call.
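Outside Jupyter, cProfile can be driven from a script. The sketch below profiles a deliberately slow, hypothetical function and prints the entries with the highest cumulative time.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    """Hypothetical hot spot: a pure-Python accumulation loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
result = profiler.runcall(slow_sum_of_squares, 100_000)

# Collect the report in a string so it can be printed or written to a log.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)  # show at most five entries
report = stream.getvalue()
print(report)
```

For a whole script, running python -m cProfile -s cumulative followed by the script name produces the same kind of table without any code changes.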
Vectorize before parallelizing. Replacing a Python for-loop with a NumPy vectorized operation typically provides a 10x to 100x speedup on a single core. Parallelizing a slow Python loop across 8 cores provides at most an 8x speedup. Vectorize first, then parallelize if still needed. Common vectorization patterns: replace for i in range(len(a)): result[i] = a[i] * b[i] with result = a * b. Replace conditional loops with np.where(condition, value_if_true, value_if_false). Replace accumulation loops with np.cumsum, np.cumprod, or np.add.reduceat.
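The three patterns above, written out on small made-up arrays:

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0, -4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Element-wise loop and its vectorized replacement.
looped = np.empty_like(a)
for i in range(len(a)):
    looped[i] = a[i] * b[i]
vectorized = a * b

# Conditional loop replaced by np.where: keep positive values, zero the rest.
non_negative = np.where(a > 0, a, 0.0)

# Accumulation loop replaced by np.cumsum: running total of b.
running_total = np.cumsum(b)
```

On arrays this small the speed difference is not measurable; the benefit appears as array length grows, which is worth confirming with %timeit on realistic data.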
Choose the right data structure. pandas DataFrames are convenient but slow for element-wise numerical computation in Python loops; extract the underlying NumPy array (df['column'].to_numpy(), which the pandas documentation recommends over the older .values attribute) for intensive operations. Dictionaries and sets provide average-case O(1) lookup vs. O(n) for list search: if you are checking membership repeatedly, convert the list to a set. For sparse data (mostly zeros), scipy.sparse matrices can use orders of magnitude less memory and computation than dense arrays.
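The membership point can be checked directly with the standard library. The IDs and queries below are made up, and the timings will vary by machine, so no specific numbers are claimed.

```python
import timeit

known_ids = list(range(100_000))
known_id_set = set(known_ids)  # one-time conversion cost
queries = [5, 99_999, 250_000]

# Both containers give the same answers...
found_in_list = [q in known_ids for q in queries]
found_in_set = [q in known_id_set for q in queries]

# ...but the list scans its elements while the set hashes the query.
list_time = timeit.timeit(lambda: 99_999 in known_ids, number=100)
set_time = timeit.timeit(lambda: 99_999 in known_id_set, number=100)
print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```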
Document performance-critical decisions. If you chose a specific algorithm or data structure for performance reasons, add a brief comment explaining why. If you used an approximation instead of an exact computation for speed, document the trade-off and the error bound. If you parallelized a loop, document the expected speedup and any assumptions about independence. These notes prevent future readers (including yourself) from "simplifying" the code back to the slow version without understanding why the fast version was needed.
Good scientific code prioritizes correctness and readability above all else. Use descriptive names, write small functions, add assertions for assumptions, write tests for critical computations, and use version control. These habits catch errors before they become published mistakes.