How to Use NumPy
NumPy's core innovation is the ndarray: a multidimensional array that stores homogeneous numerical data in contiguous memory and supports element-wise operations without Python loops. A for-loop adding two Python lists of one million numbers takes about 200 milliseconds. The equivalent NumPy operation (a + b) takes about 2 milliseconds, a 100x speedup, because the addition runs in pre-compiled C code that exploits CPU vectorization instructions. This performance difference defines how you write scientific Python: you think in arrays and operations on arrays, not in elements and loops over elements.
Step 1: Create Arrays
Import NumPy with the standard alias: import numpy as np. Create arrays from Python lists with np.array([1, 2, 3]) for 1D or np.array([[1, 2], [3, 4]]) for 2D. NumPy infers the data type from the input, using int64 for integers and float64 for floats. Force a specific type with the dtype parameter: np.array([1, 2, 3], dtype=np.float32) creates a 32-bit float array, halving memory usage compared to the default float64.
Generator functions create arrays without building Python lists first. np.zeros((1000, 1000)) creates a 1000x1000 matrix of zeros. np.ones((3, 4)) creates a 3x4 matrix of ones. np.full((5, 5), 3.14) fills a 5x5 matrix with 3.14. np.eye(4) creates a 4x4 identity matrix. np.arange(0, 10, 0.5) creates an array from 0 to 9.5 in steps of 0.5 (like Python's range but for floats). np.linspace(0, 1, 100) creates 100 evenly spaced points between 0 and 1 inclusive, the most common way to create x-axis values for plotting continuous functions.
Random array generation uses np.random for simulations and testing. np.random.rand(3, 4) creates a 3x4 array of uniform random numbers between 0 and 1. np.random.randn(1000) creates 1000 samples from a standard normal distribution. np.random.randint(1, 100, size=50) creates 50 random integers between 1 and 99. For reproducible results, set the random seed: rng = np.random.default_rng(42) creates a generator with seed 42, and rng.normal(0, 1, 1000) generates 1000 normal samples that will be identical every time the code runs. Always set seeds in scientific work to ensure reproducibility.
Load data from files with np.loadtxt('data.csv', delimiter=',') for simple numeric files or np.genfromtxt('data.csv', delimiter=',', names=True, dtype=None) for files with headers and mixed types. np.load('data.npy') loads arrays saved in NumPy's binary format, which preserves exact array structure and dtype and is much faster than text I/O for large arrays. np.save('data.npy', array) saves an array, and np.savez('data.npz', x=array1, y=array2) saves multiple arrays in a single compressed file.
Step 2: Use Vectorized Operations
Vectorized operations apply to every element of an array simultaneously. All arithmetic operators work element-wise: a + b, a - b, a * b, a / b, a ** 2. These are not matrix operations; they operate on corresponding elements. Two arrays of shape (1000,) produce a result of shape (1000,) where each element is the operation applied to the corresponding pair. Comparison operators (a > 5, a == b, a != 0) return boolean arrays of the same shape, where each element is True or False.
Universal functions (ufuncs) apply mathematical functions element-wise. np.sin(a), np.cos(a), np.exp(a), np.log(a), np.sqrt(a), np.abs(a) all operate on every element and return an array of the same shape. np.sum(a) computes the total sum. np.mean(a), np.std(a), np.var(a) compute statistics. np.min(a), np.max(a) find extremes. np.argmin(a), np.argmax(a) return the index of the min/max element. All of these accept an axis parameter: np.mean(a, axis=0) computes column means of a 2D array, np.mean(a, axis=1) computes row means.
Avoid Python loops when working with NumPy arrays. Every time you write "for i in range(len(array))" with a NumPy array, there is almost certainly a vectorized alternative that is 10 to 100 times faster. Instead of looping to count elements greater than 5, use np.sum(a > 5). Instead of looping to replace negative values with zero, use np.maximum(a, 0) or a[a < 0] = 0. Instead of looping to compute pairwise distances, use np.linalg.norm(a[:, None] - b[None, :], axis=2). Thinking in whole-array operations rather than element-by-element loops is the fundamental skill of NumPy programming.
Aggregate operations reduce arrays along specified axes. np.cumsum(a) computes the cumulative sum. np.diff(a) computes consecutive differences. np.where(condition, x, y) creates a new array that takes values from x where condition is True and y where it is False. np.unique(a) returns sorted unique values. np.sort(a) returns a sorted copy. np.concatenate([a, b]) joins arrays along an existing axis. np.stack([a, b]) joins arrays along a new axis. These building blocks combine to express complex data transformations concisely.
Step 3: Index and Slice Arrays
NumPy extends Python's slicing syntax to multiple dimensions. For a 2D array, a[2, 3] selects row 2, column 3. a[1:4, :] selects rows 1 through 3, all columns. a[:, 0] selects all rows, column 0 (a column vector). a[::2, ::2] selects every other row and every other column (downsampling by 2). Negative indices count from the end: a[-1, :] is the last row, a[:, -2:] is the last two columns. Slicing returns a view (not a copy) of the original array, meaning modifications to the slice affect the original. Use a[1:4, :].copy() to get an independent copy.
Boolean indexing uses a boolean array to select elements. a[a > 0] returns all positive elements as a 1D array. a[np.isnan(a)] = 0 replaces all NaN values with zero. Boolean arrays combine with logical operators: a[(a > 0) & (a < 10)] selects elements between 0 and 10 (note the parentheses, which are required because & has higher precedence than > and <). This pattern replaces conditional loops: instead of iterating to find and modify elements meeting a condition, you express the condition as a boolean array and apply it directly.
Fancy indexing uses integer arrays to select specific elements. a[[0, 3, 7]] selects elements at indices 0, 3, and 7. For 2D arrays, a[[0, 2, 4], [1, 3, 5]] selects elements at positions (0,1), (2,3), and (4,5). np.ix_([0, 2], [1, 3]) creates an open mesh for selecting a rectangular sub-array. Fancy indexing always returns a copy, not a view, which differs from basic slicing.
Reshaping changes array dimensions without copying data. a.reshape(10, 5) converts a 50-element 1D array to 10 rows and 5 columns. a.reshape(-1, 1) converts any 1D array to a column vector (-1 tells NumPy to infer that dimension). a.T transposes a 2D array (swaps rows and columns). a.flatten() converts any array to 1D. np.expand_dims(a, axis=0) adds a new dimension, useful for preparing data for operations that require specific dimensionality. a.squeeze() removes dimensions of size 1.
Step 4: Apply Broadcasting
Broadcasting is NumPy's mechanism for performing operations on arrays of different shapes without explicitly replicating data. When you add a scalar to an array (a + 5), the scalar is "broadcast" to match the array's shape. When you subtract a row vector from a 2D array (data - data.mean(axis=0)), the row vector is broadcast across all rows, centering each column. Broadcasting eliminates the need to create expanded copies of data, saving both memory and computation time.
Broadcasting rules determine when shapes are compatible. NumPy compares dimensions from right to left. Two dimensions are compatible if they are equal or if one of them is 1. A (5, 3) array and a (3,) array are compatible because the trailing dimensions match (both 3). A (5, 1) array and a (1, 3) array are compatible (both have a 1 in different positions), producing a (5, 3) result. A (5, 3) array and a (4,) array are incompatible because 3 and 4 are not equal and neither is 1. When broadcasting fails, NumPy raises a ValueError with a clear message about incompatible shapes.
Practical broadcasting examples show the pattern's power. To normalize each column of a dataset to zero mean and unit variance: (data - data.mean(axis=0)) / data.std(axis=0) applies without loops because both mean and std return row vectors that broadcast across all rows. To compute a pairwise distance matrix between two sets of points: np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)) uses broadcasting to expand a into (N, 1, D) and b into (1, M, D), compute all NxM difference vectors, square, sum, and take the square root. This single expression replaces a double nested loop.
To compute an outer product of two vectors (every combination of multiplications), reshape one to a column vector and one to a row vector: a[:, None] * b[None, :] produces an (N, M) result. To apply a function to a grid of x and y values (common for plotting 3D surfaces or heatmaps): create x = np.linspace(-5, 5, 100)[:, None] and y = np.linspace(-5, 5, 100)[None, :], then z = np.sin(x) * np.cos(y) computes the function on the entire 100x100 grid without any loops. This broadcasting pattern is the standard way to evaluate 2D functions in NumPy.
Step 5: Perform Linear Algebra
Matrix multiplication uses the @ operator (Python 3.5+) or np.dot(). For 2D arrays, A @ B computes the matrix product. For 1D arrays, a @ b computes the dot product (scalar). The @ operator is preferred over np.dot() for readability. Solving the linear system Ax = b (find x given A and b) uses np.linalg.solve(A, b), which is faster and more numerically stable than computing the inverse: never compute A_inv = np.linalg.inv(A) followed by x = A_inv @ b when you can use solve() directly.
Matrix decompositions underpin much of scientific computing. np.linalg.eig(A) returns eigenvalues and eigenvectors. np.linalg.svd(A) computes the singular value decomposition, used in dimensionality reduction, data compression, and pseudoinverse computation. np.linalg.cholesky(A) computes the Cholesky decomposition of a positive-definite matrix, used in sampling multivariate normal distributions and solving structured linear systems. np.linalg.qr(A) computes the QR decomposition, used in least squares fitting.
Least squares fitting with np.linalg.lstsq(A, b, rcond=None) solves overdetermined systems (more equations than unknowns), returning the solution that minimizes the sum of squared residuals. This is the computational core of linear regression: construct a matrix A where each row contains the predictor values for one observation, set b to the response values, and lstsq returns the best-fit coefficients, residuals, rank, and singular values. For polynomial fitting, np.polyfit(x, y, degree) provides a simpler interface that constructs the Vandermonde matrix internally.
Sparse matrices, where most elements are zero, use scipy.sparse rather than NumPy dense arrays. A 10000x10000 matrix with only 50000 nonzero elements uses 800 MB as a dense NumPy array but only about 1.2 MB in sparse format. Sparse matrix types include CSR (compressed sparse row) for efficient row slicing and matrix-vector products, CSC (compressed sparse column) for efficient column slicing, and COO (coordinate) for constructing sparse matrices element by element. scipy.sparse.linalg provides sparse eigensolvers and linear system solvers that exploit the sparsity structure for orders-of-magnitude speedups over dense equivalents.
The core skill of NumPy programming is replacing Python loops with vectorized array operations. Every time you write a loop over array elements, ask whether a NumPy function or broadcasting expression can replace it.