Radial Basis Function (RBF) Kernel

Introduction

In machine learning, the Radial Basis Function (RBF) kernel is a popular kernel function used in support vector machines and other kernelized learning algorithms. It measures the similarity between two samples as a function of the distance between them, and corresponds to an inner product in an infinite-dimensional feature space.

Definition

The RBF kernel between two samples \(\mathbf{x}\) and \(\mathbf{x'}\) is defined as:

\[K(\mathbf{x}, \mathbf{x'}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x'}\|^2}{2\sigma^2}\right)\]

where:

  • \(\|\mathbf{x} - \mathbf{x'}\|^2\) is the squared Euclidean distance.
  • \(\sigma > 0\) is a free parameter that controls the kernel’s spread or smoothness.

Alternatively, using a parameter \(\gamma = \frac{1}{2\sigma^2}\):

\[K(\mathbf{x}, \mathbf{x'}) = \exp(-\gamma \|\mathbf{x} - \mathbf{x'}\|^2)\]

The RBF kernel takes values in the interval \((0, 1]\): it equals 1 exactly when \(\mathbf{x} = \mathbf{x'}\), and approaches 0 as the distance between the vectors grows.
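
As a concrete illustration, here is a minimal NumPy sketch (the function names are illustrative, not from any particular library) that evaluates both parameterizations and confirms they agree:

```python
import numpy as np

def rbf_kernel(x, x_prime, sigma=1.0):
    """RBF kernel in the sigma parameterization."""
    sq_dist = np.sum((x - x_prime) ** 2)  # squared Euclidean distance
    return np.exp(-sq_dist / (2 * sigma ** 2))

def rbf_kernel_gamma(x, x_prime, gamma=0.5):
    """Equivalent gamma parameterization, with gamma = 1 / (2 * sigma**2)."""
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

x = np.array([1.0, 2.0])
x_prime = np.array([2.0, 3.0])

sigma = 1.0
print(rbf_kernel(x, x_prime, sigma))                       # ~0.3679
print(rbf_kernel_gamma(x, x_prime, 1 / (2 * sigma ** 2)))  # same value
```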

Expansion Using Multinomial Theorem

Because the kernel's value decreases with distance, it has a ready interpretation as a similarity measure. The feature space of the RBF kernel has an infinite number of dimensions; for \(\sigma = 1\), the expansion begins by rewriting the squared distance as \(\|\mathbf{x} - \mathbf{x'}\|^2 = \|\mathbf{x}\|^2 - 2\mathbf{x}^T \mathbf{x'} + \|\mathbf{x'}\|^2\), so that:

\[\exp\left(-\frac{1}{2} \|\mathbf{x} - \mathbf{x'}\|^2\right) = \exp\left(\mathbf{x}^T \mathbf{x'} - \frac{1}{2}\|\mathbf{x}\|^2 - \frac{1}{2}\|\mathbf{x'}\|^2\right)\]

Applying the power series of the exponential to the factor \(\exp(\mathbf{x}^T \mathbf{x'})\) expands this further:

\[= \sum_{j=0}^{\infty} \frac{(\mathbf{x}^T \mathbf{x'})^j}{j!} \exp\left(-\frac{1}{2}\|\mathbf{x}\|^2\right) \exp\left(-\frac{1}{2}\|\mathbf{x'}\|^2\right)\]

This expansion lets us interpret the RBF kernel in terms of an infinite-dimensional feature space: applying the multinomial theorem to each power \((\mathbf{x}^T \mathbf{x'})^j\) yields monomial features of every degree, so each term corresponds to higher-order interactions between the components of \(\mathbf{x}\) and \(\mathbf{x'}\).
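
As a quick numerical check (a sketch with \(\sigma = 1\) and arbitrary test vectors), truncating the series recovers the exact kernel value:

```python
import numpy as np
from math import factorial

def rbf_exact(x, x_prime):
    """Exact RBF kernel with sigma = 1."""
    return np.exp(-0.5 * np.sum((x - x_prime) ** 2))

def rbf_series(x, x_prime, terms=20):
    """Truncated expansion: sum_j (x.x')^j / j! times the two Gaussian prefactors."""
    dot = np.dot(x, x_prime)
    prefactor = np.exp(-0.5 * np.dot(x, x)) * np.exp(-0.5 * np.dot(x_prime, x_prime))
    return prefactor * sum(dot ** j / factorial(j) for j in range(terms))

x = np.array([0.5, -1.0])
x_prime = np.array([1.0, 0.25])

print(rbf_exact(x, x_prime))   # 0.4040...
print(rbf_series(x, x_prime))  # matches to machine precision
```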

Approximations

For large datasets or high-dimensional input spaces, approximations of the RBF kernel can reduce the computational load: forming the full \(n \times n\) Gram matrix requires \(O(n^2)\) kernel evaluations. Common approximation techniques include Fourier random features and the Nyström method.

Fourier Random Features

One way to construct such an approximation is to sample from the Fourier transform of the kernel:

\[\varphi(\mathbf{x}) = \frac{1}{\sqrt{D}} \left[\cos\langle \mathbf{w}_1, \mathbf{x}\rangle, \sin\langle \mathbf{w}_1, \mathbf{x}\rangle, \ldots, \cos\langle \mathbf{w}_D, \mathbf{x}\rangle, \sin\langle \mathbf{w}_D, \mathbf{x}\rangle\right]^T\]

where \(\mathbf{w}_1, \ldots, \mathbf{w}_D\) are independent samples from the normal distribution \(N(0, \sigma^{-2} I)\).

This yields a finite-dimensional feature map whose inner products approximate the kernel, \(\varphi(\mathbf{x})^T \varphi(\mathbf{x'}) \approx K(\mathbf{x}, \mathbf{x'})\), with accuracy improving as \(D\) grows; kernel methods can then operate on \(\varphi\) directly, reducing the computational cost on large datasets.
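
A minimal sketch of this construction (the helper name random_fourier_features is illustrative) draws the \(\mathbf{w}_i\), builds \(\varphi\), and compares \(\varphi(\mathbf{x})^T \varphi(\mathbf{x'})\) against the exact kernel:

```python
import numpy as np

def random_fourier_features(X, D=500, sigma=1.0, rng=None):
    """Map the rows of X (shape (n, d)) into the 2D-dimensional feature space above."""
    if rng is None:
        rng = np.random.default_rng(0)
    W = rng.normal(scale=1.0 / sigma, size=(D, X.shape[1]))  # rows w_i ~ N(0, sigma^{-2} I)
    proj = X @ W.T                                           # inner products <w_i, x>
    # Stacking all cosines then all sines only reorders the coordinates of phi,
    # which leaves inner products unchanged.
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

Phi = random_fourier_features(X, D=2000, rng=rng)
K_approx = Phi @ Phi.T                               # approximates the Gram matrix

diff = X[:, None, :] - X[None, :, :]
K_exact = np.exp(-np.sum(diff ** 2, axis=-1) / 2)    # exact kernel, sigma = 1

print(np.max(np.abs(K_approx - K_exact)))            # small; shrinks as D grows
```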

Nyström Method

Another approach uses the Nyström method to approximate the eigendecomposition of the Gram matrix \(K\) using only a random sample of the training set (the landmark points), yielding a low-rank approximation of the full matrix.
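
The following sketch (names are illustrative) forms the standard rank-\(m\) Nyström approximation \(K \approx C W^{+} C^T\), where \(C\) is the kernel block between all points and \(m\) randomly sampled landmarks, and \(W\) is the kernel block among the landmarks:

```python
import numpy as np

def rbf_gram(X, Y, sigma=1.0):
    """Exact RBF Gram matrix between the rows of X and the rows of Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2 * X @ Y.T)
    return np.exp(-sq / (2 * sigma**2))

def nystrom(X, m=50, sigma=1.0, rng=None):
    """Rank-m Nystrom approximation of the full Gram matrix."""
    if rng is None:
        rng = np.random.default_rng(1)
    idx = rng.choice(len(X), size=m, replace=False)  # random landmark sample
    C = rbf_gram(X, X[idx], sigma)                   # (n, m) cross block
    W = rbf_gram(X[idx], X[idx], sigma)              # (m, m) landmark block
    return C @ np.linalg.pinv(W) @ C.T               # K approx C W^+ C^T

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))

K_exact = rbf_gram(X, X)
K_nys = nystrom(X, m=50, rng=rng)
print(np.linalg.norm(K_exact - K_nys) / np.linalg.norm(K_exact))  # relative error
```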
