Signal Processing in Python

Updated May 2026
Signal processing in Python uses scipy.signal for filter design and application, numpy.fft for Fourier transforms, and scipy.signal for spectral analysis, peak detection, and time-frequency representations. These tools handle the measured data that comes from sensors, instruments, and recording equipment across every experimental science: EEG and ECG recordings in neuroscience and cardiology, seismic waveforms in geophysics, acoustic signals in environmental monitoring, voltage traces in electronics, and any other quantity sampled over time.

Signal processing transforms raw measured data into interpretable information. A raw EEG recording contains brain signals mixed with muscle artifacts, line noise, and electrode drift. A raw seismic trace contains earthquake signals buried in ocean noise, traffic vibrations, and instrument response. Signal processing separates the signal of interest from everything else through filtering (removing unwanted frequency components), spectral analysis (identifying which frequencies are present), feature detection (finding peaks, onsets, and events), and time-frequency analysis (tracking how frequency content evolves). Python's scipy.signal module provides all of these capabilities through functions that wrap decades of DSP algorithm development.

Step 1: Load and Prepare Signal Data

Signal data comes from instruments as time series: a sequence of measurements taken at regular intervals. The sampling rate (fs, in Hz or samples per second) determines the time resolution and the maximum frequency that can be represented (the Nyquist frequency, fs/2). Load data from CSV: data = pd.read_csv('recording.csv'). From binary formats: data = np.fromfile('data.bin', dtype=np.float32). From WAV audio: from scipy.io import wavfile, fs, data = wavfile.read('recording.wav'). From domain-specific formats, use specialized libraries: MNE-Python for EEG/MEG, ObsPy for seismology, soundfile for general audio.

Create a time axis from the sampling rate: t = np.arange(len(data)) / fs. This gives time in seconds for each sample. For data with timestamps, compute the effective sampling rate: fs = 1 / np.median(np.diff(timestamps)). Irregular sampling requires resampling to a regular grid before most signal processing operations: from scipy.interpolate import interp1d, f = interp1d(timestamps, values), regular_times = np.arange(timestamps[0], timestamps[-1], 1/desired_fs), regular_values = f(regular_times).

Initial inspection reveals signal characteristics that guide processing choices. Plot the raw signal: plt.plot(t, data). Plot the amplitude spectrum: freqs = np.fft.rfftfreq(len(data), 1/fs), spectrum = np.abs(np.fft.rfft(data)), plt.semilogy(freqs, spectrum). The time-domain plot shows the overall shape, amplitude range, obvious artifacts, and baseline drift. The frequency spectrum shows the dominant frequencies, noise floor, and any narrowband interference (like 50 or 60 Hz power line noise appearing as a sharp spike). These plots determine which filters and processing steps are needed.

Step 2: Apply Digital Filters

Butterworth filters provide maximally flat passband response (no ripple in the frequency range you keep). from scipy.signal import butter, sosfilt. Design a low-pass filter: sos = butter(4, 50, btype='low', fs=1000, output='sos') creates a 4th-order filter with 50 Hz cutoff for data sampled at 1000 Hz. Apply it: filtered = sosfilt(sos, data). The output='sos' format (second-order sections) is numerically stable and should always be preferred over the older transfer function format (output='ba') which suffers from numerical instability for higher-order filters.

Filter types serve different purposes. Low-pass filters remove high-frequency noise while preserving slow trends and low-frequency signals: sos = butter(4, cutoff_hz, 'low', fs=fs, output='sos'). High-pass filters remove baseline drift and slow artifacts while preserving faster signals: sos = butter(4, cutoff_hz, 'high', fs=fs, output='sos'). Bandpass filters isolate a specific frequency range: sos = butter(4, [low_hz, high_hz], 'band', fs=fs, output='sos'). Notch (bandstop) filters remove a narrow band, typically power line interference: from scipy.signal import iirnotch, b, a = iirnotch(60, 30, fs) removes 60 Hz noise with a quality factor of 30.

Zero-phase filtering eliminates the time delay that standard filtering introduces. sosfiltfilt(sos, data) applies the filter forward and backward, canceling the phase shift and producing output that is perfectly aligned in time with the input. Use sosfiltfilt for any analysis where timing matters (event detection, latency measurements, peak timing). Use sosfilt for real-time or causal processing where you cannot look ahead in the data. The difference is subtle but critical: sosfiltfilt shifts no features in time, while sosfilt delays all features by an amount that depends on their frequency.

Filter order controls the sharpness of the transition between passband and stopband. A 2nd-order Butterworth rolls off at 12 dB/octave, a 4th-order at 24 dB/octave, an 8th-order at 48 dB/octave. Higher orders provide sharper cutoffs but introduce more ringing (overshoot near sharp transitions in the signal) and require more computation. For most scientific applications, 4th-order Butterworth filters (24 dB/octave rolloff) provide a good balance between selectivity and artifact avoidance. If you need a very sharp cutoff without ringing, consider a Chebyshev Type II or elliptic filter, which sacrifice passband flatness for steeper rolloff.

Step 3: Analyze Frequency Content

The Fast Fourier Transform (FFT) decomposes a signal into its frequency components. spectrum = np.fft.rfft(data) computes the one-sided spectrum for real-valued signals. freqs = np.fft.rfftfreq(len(data), d=1/fs) gives the corresponding frequencies in Hz. magnitude = np.abs(spectrum) / len(data) gives the amplitude at each frequency. phase = np.angle(spectrum) gives the phase. Plot the magnitude spectrum: plt.plot(freqs, magnitude) or plt.semilogy(freqs, magnitude) for signals with a wide dynamic range. The FFT assumes the signal is periodic over the analysis window, so apply a window function (np.hanning, np.hamming, np.blackman) before FFT to reduce spectral leakage from non-periodic signals.

Power Spectral Density (PSD) estimation provides a smoother, statistically more reliable estimate of frequency content than a single FFT. from scipy.signal import welch. freqs, psd = welch(data, fs=fs, nperseg=1024). Welch's method divides the signal into overlapping segments, computes the FFT of each, and averages the magnitude spectra. nperseg controls the trade-off: larger segments give finer frequency resolution but noisier estimates, smaller segments give smoother estimates but coarser frequency resolution. Plot PSD in dB: plt.semilogy(freqs, psd) or 10 * np.log10(psd) for decibel scale.

Cross-spectral analysis examines the relationship between two signals at each frequency. from scipy.signal import coherence. freqs, coh = coherence(signal1, signal2, fs=fs, nperseg=1024). Coherence ranges from 0 (no linear relationship at that frequency) to 1 (perfect linear relationship). High coherence at a specific frequency means the two signals share a common source or are causally related at that frequency. Cross-spectral phase gives the time delay between the two signals at each frequency. These tools are essential in neuroscience (coherence between brain regions), geophysics (coherence between seismic stations), and engineering (input-output analysis of systems).

Step 4: Detect Features and Events

Peak detection identifies local maxima in signals. from scipy.signal import find_peaks. peaks, properties = find_peaks(data, height=threshold, distance=min_samples_between_peaks, prominence=min_prominence, width=min_width). peaks is an array of indices where peaks occur. properties contains the height, prominence, and width of each peak. Prominence is the most useful parameter: it measures how much a peak stands out from its surrounding baseline, rejecting small fluctuations on top of larger features while detecting genuine peaks even if they are not the tallest points in the signal.

Envelope detection extracts the slowly varying amplitude of an oscillatory signal. The analytic signal method: from scipy.signal import hilbert, analytic = hilbert(data), envelope = np.abs(analytic). The envelope follows the peaks of the oscillation without the rapid oscillations themselves. This is essential for amplitude modulation analysis, detecting the onset and offset of oscillatory bursts (like sleep spindles in EEG or tremor episodes in accelerometer data), and measuring how amplitude varies over time.

Event detection finds the times when something happens in the signal: the onset of a stimulus response, the arrival of a seismic wave, the start of a heartbeat. Threshold crossing: events = np.where(np.diff(np.sign(data - threshold)))[0] finds where the signal crosses the threshold. For more robust detection, combine filtering (to reduce false triggers from noise), threshold crossing, and minimum interval requirements (to prevent multiple detections of the same event). Template matching using cross-correlation (np.correlate(data, template, mode='same')) finds occurrences of a known waveform shape within a longer recording.

Signal statistics characterize the overall properties of a signal or signal segment. RMS (root mean square) amplitude: rms = np.sqrt(np.mean(data**2)). Crest factor (peak-to-RMS ratio): crest = np.max(np.abs(data)) / rms. Zero-crossing rate: zcr = np.sum(np.diff(np.sign(data)) != 0) / len(data). Kurtosis (peakedness): from scipy.stats import kurtosis, k = kurtosis(data). These statistics computed over sliding windows (rolling statistics) reveal how signal characteristics change over time, useful for classifying signal segments (speech vs silence, seizure vs normal EEG, earthquake vs background noise).

Step 5: Apply Time-Frequency Analysis

Spectrograms show how the frequency content of a signal changes over time. from scipy.signal import spectrogram. freqs, times, Sxx = spectrogram(data, fs=fs, nperseg=256, noverlap=128). plt.pcolormesh(times, freqs, 10*np.log10(Sxx), cmap='viridis', shading='gouraud'). The spectrogram is a 2D image where the x-axis is time, the y-axis is frequency, and the color represents power. nperseg controls the time-frequency resolution trade-off: longer segments give better frequency resolution but poorer time resolution, shorter segments give better time resolution but poorer frequency resolution. This fundamental uncertainty (analogous to the Heisenberg uncertainty principle) cannot be avoided, only managed.

The Short-Time Fourier Transform (STFT) provides more control than the spectrogram function. from scipy.signal import stft. freqs, times, Zxx = stft(data, fs=fs, nperseg=256, noverlap=192). Zxx is complex-valued, giving both magnitude and phase at each time-frequency point. The inverse STFT (istft) reconstructs the signal from the time-frequency representation, enabling time-frequency filtering: modify Zxx (zero out unwanted time-frequency regions), then reconstruct with istft to get a signal with those components removed. This is more flexible than standard filtering because you can remove interference that is localized in both time and frequency.

Wavelet analysis provides multi-resolution time-frequency decomposition with better time resolution at high frequencies and better frequency resolution at low frequencies. from scipy.signal import cwt, morlet2. widths = np.arange(1, 128). cwt_matrix = cwt(data, morlet2, widths). The continuous wavelet transform (CWT) convolves the signal with scaled and shifted versions of a mother wavelet. For discrete wavelet transforms (DWT) and wavelet denoising, use PyWavelets: import pywt, coeffs = pywt.wavedec(data, 'db4', level=5). Wavelet denoising thresholds the detail coefficients and reconstructs: threshold the small coefficients to zero, keep the large ones, then pywt.waverec(modified_coeffs, 'db4').

Choosing between spectrograms and wavelets depends on the signal characteristics. Spectrograms use fixed time-frequency resolution at all frequencies, appropriate when the signals of interest span a narrow frequency range. Wavelets use variable resolution (fine time resolution at high frequencies, fine frequency resolution at low frequencies), matching how many natural signals behave: rapid transients at high frequencies, slow modulations at low frequencies. For speech, music, and biomedical signals, wavelets often provide a more informative decomposition. For narrowband signals (communications, radar), spectrograms are usually preferred.

Key Takeaway

Always use sosfiltfilt for zero-phase filtering when timing accuracy matters, always window your data before computing FFTs, and always use Welch's method rather than a single FFT for power spectral density estimation.