How to Use SciPy

Updated May 2026

SciPy (Scientific Python) is the algorithmic engine of the scientific Python ecosystem, providing optimized implementations of numerical methods for optimization, integration, interpolation, signal processing, statistics, linear algebra, and sparse matrix operations. Where NumPy handles basic array arithmetic and pandas handles data manipulation, SciPy handles the computational heavy lifting: fitting models to data, solving differential equations, running statistical tests, and processing measured signals using algorithms developed and refined by the numerical methods community over decades.

SciPy is organized into subpackages, each covering a domain of numerical computation. Import only the subpackages you need: from scipy import optimize, stats, integrate, signal, interpolate. Each subpackage contains functions and classes for its domain, and the documentation for each function is typically excellent, including mathematical background, algorithm descriptions, parameter explanations, and worked examples. The functions wrap proven Fortran and C libraries (LAPACK, FFTPACK, QUADPACK, MINPACK) behind a Pythonic interface, giving you industrial-strength numerical methods without needing to understand the underlying implementations.

Step 1: Fit Models to Data with Optimization

curve_fit is the most-used function in all of SciPy for experimental scientists. It fits any Python function to data using nonlinear least squares. Define your model as a function whose first argument is x and remaining arguments are the parameters to fit: def exponential(x, a, b, c): return a * np.exp(-b * x) + c. Then call popt, pcov = optimize.curve_fit(exponential, x_data, y_data) to get the best-fit parameters (popt) and their covariance matrix (pcov). The standard errors of the parameters are np.sqrt(np.diag(pcov)). Plot the fit against the data to visually verify the quality.

Initial parameter guesses dramatically affect fitting success. curve_fit uses iterative optimization that can converge to local minima or fail entirely if the starting point is too far from the solution. Provide reasonable initial guesses with the p0 parameter: optimize.curve_fit(model, x, y, p0=[1.0, 0.5, 0.0]). Estimate initial values from the data: for an exponential decay, the amplitude is roughly y[0] - y[-1], the decay rate is roughly 1 / (time to half-maximum), and the baseline is roughly y[-1]. Good initial guesses turn a fragile fitting process into a reliable one.

minimize handles general optimization problems beyond curve fitting. optimize.minimize(cost_function, x0, method='Nelder-Mead') finds the parameter values that minimize any scalar function. The method parameter selects the algorithm: 'Nelder-Mead' for derivative-free optimization (robust for noisy functions), 'BFGS' for smooth functions (uses gradient information for faster convergence), 'L-BFGS-B' for large-scale bounded problems, 'trust-constr' for constrained optimization. Bounds and constraints restrict the parameter space: bounds=[(0, None), (0, 10)] requires the first parameter to be positive and the second between 0 and 10.

Root finding solves equations of the form f(x) = 0. optimize.root_scalar(f, bracket=[a, b]) finds a root in the interval [a, b] where f(a) and f(b) have opposite signs. optimize.root(f, x0) solves systems of nonlinear equations where f returns a vector. optimize.fsolve(f, x0) is a simpler interface for the same task. These functions are essential for finding equilibrium points, intersection points, and critical values where a function meets a specific criterion.

Step 2: Integrate Functions and Solve ODEs

Numerical integration computes definite integrals when analytical solutions are unavailable. integrate.quad(f, a, b) computes the integral of f from a to b, returning (result, error_estimate). quad handles difficult integrands: infinite limits (use np.inf), singularities at endpoints, and oscillatory functions. integrate.dblquad(f, a, b, gfun, hfun) computes double integrals where the inner limits can be functions of the outer variable. For multi-dimensional integrals over simple regions, integrate.nquad(f, ranges) generalizes to arbitrary dimensions.

Solving ordinary differential equations (ODEs) simulates dynamical systems. integrate.solve_ivp(f, t_span, y0) solves dy/dt = f(t, y) from time t_span[0] to t_span[1] with initial condition y0. The function f takes (t, y) and returns the derivative dy/dt. For systems of equations, y and the return value are arrays. solve_ivp automatically selects step sizes for accuracy and chooses between stiff and non-stiff solvers based on the problem. The t_eval parameter specifies output times: sol = solve_ivp(f, [0, 10], y0, t_eval=np.linspace(0, 10, 100)) returns the solution at 100 evenly spaced times.

The choice of ODE solver matters for performance and accuracy. method='RK45' (default) is a 4th/5th order Runge-Kutta method, efficient for smooth, non-stiff problems like orbital mechanics and population dynamics. method='BDF' (Backward Differentiation Formula) handles stiff problems where some components change much faster than others, common in chemical kinetics, electrical circuits, and multiscale systems. method='Radau' is an implicit Runge-Kutta method that handles both stiff and non-stiff problems well. If RK45 is extremely slow or produces warnings about too many steps, switching to BDF or Radau usually solves the problem immediately.

Events in ODE solutions detect when specific conditions are met during integration. Define an event function that returns zero when the condition occurs: def hit_ground(t, y): return y[0] (detects when position y[0] crosses zero). Set hit_ground.terminal = True to stop integration at the event. Pass events=[hit_ground] to solve_ivp. The solution object contains t_events and y_events arrays with the times and states at which events occurred. This mechanism handles bouncing balls, threshold crossings, switch points, and any other discontinuity-driven logic in dynamical systems.

Step 3: Run Statistical Tests

scipy.stats provides over 100 probability distributions and dozens of statistical tests. Each distribution object (stats.norm, stats.t, stats.chi2, stats.poisson, stats.binom, and many more) supports common operations: .pdf(x) for probability density, .cdf(x) for cumulative distribution, .ppf(q) for quantiles (inverse CDF), .rvs(size=n) for random sampling, .fit(data) for maximum likelihood parameter estimation, and .interval(confidence) for confidence intervals. stats.norm.ppf(0.975) returns 1.96, the 97.5th percentile of the standard normal distribution used in 95% confidence intervals.

The t-test compares means between groups. stats.ttest_ind(group1, group2) tests whether two independent groups have different means, returning (t_statistic, p_value). stats.ttest_rel(before, after) tests paired observations (same subjects measured twice). stats.ttest_1samp(data, expected_mean) tests whether a sample mean differs from a known value. All t-test functions assume normal distributions; for non-normal data, use stats.mannwhitneyu(group1, group2) (the non-parametric equivalent of the independent t-test) or stats.wilcoxon(before, after) (non-parametric paired test).

ANOVA and correlation handle multi-group comparisons and relationships. stats.f_oneway(group1, group2, group3) tests whether any group means differ (one-way ANOVA). stats.pearsonr(x, y) returns (correlation_coefficient, p_value) for linear relationships. stats.spearmanr(x, y) returns Spearman rank correlation for monotonic relationships. stats.kendalltau(x, y) returns Kendall's tau for ordinal data. stats.chi2_contingency(table) tests independence in contingency tables. stats.ks_2samp(sample1, sample2) tests whether two samples come from the same distribution.

Distribution fitting determines which probability distribution best describes your data. stats.norm.fit(data) returns the maximum likelihood estimates of mean and standard deviation. stats.expon.fit(data) fits an exponential distribution. stats.gamma.fit(data) fits a gamma distribution. After fitting, use stats.kstest(data, 'norm', args=fit_params) to test how well the fitted distribution matches the data. For comparing multiple candidate distributions, fit each one and compare their AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) values, computed from the log-likelihood returned by the fitted distribution's .logpdf(data).sum() method.

Step 4: Process Signals

Digital filtering removes noise and isolates frequency bands of interest. Design a filter with signal.butter(order, cutoff, btype, fs) for Butterworth filters: signal.butter(4, 50, 'low', fs=1000) creates a 4th-order low-pass filter with 50 Hz cutoff for data sampled at 1000 Hz. Apply it with signal.sosfilt(sos, data) using second-order sections for numerical stability. btype='high' for high-pass, 'band' for bandpass (pass a two-element cutoff), and 'bandstop' for notch filtering. For zero-phase filtering (no time delay), use signal.sosfiltfilt(sos, data), which applies the filter forward and backward.

Spectral analysis reveals the frequency content of signals. signal.welch(data, fs=1000, nperseg=256) computes the power spectral density using Welch's method, returning (frequencies, power). signal.spectrogram(data, fs=1000) computes a time-frequency representation, returning (frequencies, times, power), useful for signals whose frequency content changes over time (speech, music, seismic events). signal.stft() computes the Short-Time Fourier Transform for more control over window parameters. For basic FFT operations, use numpy.fft.fft(data) and numpy.fft.fftfreq(n, d=1/fs) for the frequency axis.

Peak detection finds significant features in signals. signal.find_peaks(data) returns the indices of local maxima. Parameters control what counts as a peak: height=0.5 requires peaks above a threshold, distance=100 requires at least 100 samples between peaks, prominence=0.2 requires peaks to stand out from surrounding data by at least 0.2, and width=10 requires peaks to be at least 10 samples wide. The function returns both peak indices and a dictionary of peak properties (heights, prominences, widths) for further filtering and analysis.

Convolution and correlation measure signal similarity. signal.correlate(a, b, mode='full') computes the cross-correlation, showing how signal b slides along signal a. The peak of the cross-correlation indicates the time lag of best alignment, used for time delay estimation, template matching, and detecting repeated patterns. signal.convolve(signal, kernel) applies a convolution, used for smoothing (with a rectangular or Gaussian kernel), edge detection (with a derivative kernel), and implementing FIR filters. signal.deconvolve() reverses convolution to recover original signals.

Step 5: Interpolate and Transform Data

Interpolation estimates values between known data points. interpolate.interp1d(x, y, kind='linear') creates a function that interpolates between known (x, y) pairs. kind='cubic' uses cubic splines for smoother results. kind='nearest' uses nearest-neighbor interpolation for categorical or step-function data. The returned function can be called with any x value within the original range: f_interp = interpolate.interp1d(x, y, kind='cubic'), then y_new = f_interp(x_new). For data with gaps or irregular sampling, interpolation to a regular grid enables spectral analysis, resampling, and comparison with other datasets.

interpolate.UnivariateSpline(x, y, s=smoothing) fits a smoothing spline that balances closeness to data against smoothness. Setting s=0 forces the spline through every data point (interpolation). Increasing s allows the spline to deviate from points to achieve a smoother curve (regression spline). The optimal s depends on the noise level: for data with known measurement uncertainty sigma, s = len(x) * sigma**2 is a good starting point. The .derivative() and .integral() methods of the spline object compute derivatives and integrals analytically from the fitted spline coefficients.

Two-dimensional interpolation handles gridded spatial data. interpolate.RegularGridInterpolator((x, y), values) interpolates on regular grids. interpolate.griddata(points, values, grid_points, method='cubic') interpolates scattered data (irregular point locations) onto a regular grid, essential for mapping and spatial analysis where measurements are taken at arbitrary locations. interpolate.RBFInterpolator(points, values) uses radial basis functions for high-quality scattered data interpolation in any number of dimensions.

Fourier transforms decompose signals into frequency components. numpy.fft.fft(data) computes the discrete Fourier transform, returning complex coefficients whose magnitudes give amplitudes at each frequency. numpy.fft.ifft(coefficients) computes the inverse transform, reconstructing the signal from its frequency representation. For real-valued signals, numpy.fft.rfft(data) is more efficient because it exploits the conjugate symmetry of real transforms. scipy.fft (the newer module) provides the same functions with additional features like multithreading support and the discrete cosine transform (DCT) used in image compression.

Key Takeaway

SciPy's curve_fit for model fitting, solve_ivp for differential equations, and scipy.stats for statistical testing are the three functions that cover the majority of computational tasks in experimental science.

Step 1: Fit Models to Data with Optimization

Step 2: Integrate Functions and Solve ODEs

Step 3: Run Statistical Tests

Step 4: Process Signals

Step 5: Interpolate and Transform Data

Related Articles

How to Use NumPy

Statistics in Python

Signal Processing in Python

Numerical Methods