| Measure | Category(ies) | Codename | Integer | Notes |
|---|---|---|---|---|
| L1 distance | geometric metric | L1 | 0 | sum(|x - |y) |
| L2 distance | geometric metric | L2 | 1 | sqrt(sum((x - y)^2)) |
| Total variation distance | geometric metric | TVD | 9 | Useful information theoretic measure; 1/2 L1 distance in probability space |
| squared L2 distance | geometric rho-metric | SQRL2 | 2 | sum((x - y)^2) |
| Jensen-Shannon Metric | metric | JSM | 3 | Metric, but no closed-form solution for centroids. Perhaps best-suited for linkage/density-based clustering; sqrt(JSM) |
| Jensen-Shannon Divergence | pseudo-metric | JSD | 4 | Symmetric but no data-indepedent triangle inequality; convex combination of Bregman divergences |
| Itakura-Saito Divergence | Bregman divergence | IS | 13 | Corresponds to the exponential distribution; sum((x / y) - log(x / y) - 1) |
| Kullback-Leibler Divergence | Bregman divergence | MKL | 5 | Corresponds to the multinomial distribution with fixed n, or the categorical distribution; sum(x log(x / y)) |
| Reversed Itakura-Saito Divergence | Bregman divergence | REVERSE_ITAKURA_SAITO |
14 | Corresponds to the multinomial distribution with fixed n, or the categorical distribution; sum(y / x) - log(y / x) - 1 |
| Symmetric Itakura-Saito Divergence | Bregman divergence | SIS | 23 | Corresponds to the multinomial distribution with fixed n, or the categorical distribution |
| Hellinger Distance | metric | HELLINGER | 6 | Information-theoretic measure with useful upper/lower bounds for TVD and JSD; sqrt(sum((sqrt(x) - sqrt(y))^2)) |
| Bhattacharyya Metric | metric | BHATTACHARYYA_METRIC |
7 | Non-metric, but useful measure of dissimilarity. Related to the metric.; sqrt(1 - sum(sqrt(x) * sqrt(y))) |
| Bhattacharyya Distance | dissimilarity measure | BHATTACHARYYA_DISTANCE |
8 | Non-metric, but useful measure of dissimilarity. Related to the metric; -log(sum(sqrt(x) * sqrt(y))) |
| LLR | dissimilarity measure | LLR |
10 | non-metric, but can be processed as a data-dependent rho-metric. As the prior rises, the |
| UWLLR | dissimilarity measure | UWLLR |
12 | non-metric, but can be processed as a data-dependent rho-metric. As the prior rises, the |
Of the above, there are strict metrics: L1, L2, JSM, Hellinger, Bhattacharyya, and TVD
Some are constant-factor
Some are unbounded in theory but practically well-behaved
SIS, IS, KL, JSD, JSM, UWLLR, LLR
Jensen-Shannon, Itakura-Saito, and KL divergence are all Bregman divergences (and convex combinations thereof) Itakura-Saito corresponds to the exponential distribution, while Jensen-Shannon corresponds to the multinomial. Jensen-Shannon is the symmetrized KL divergence; the corresponding symmetrization for Itakura-Saito is available as SIS.
Squared L2 distance itself is also a Bregman divergence; it corresponds to the spherical Gaussian.
Another question is whether or not a measure can be optimized over via EM.
| Measure | Distance | Centroid method |
|---|---|---|
| L1 | L1 distance | component-wise median |
| L2 | L2 distance | geometric median via Weiszfeld's algorithm |
| SQRL2 | squared L2 distance; a Bregman divergence | Weighted mean of points |
| Itakura-Saito | IS distance, a Bregman divergence | Weighted mean of points |
| Reverse Itakura-Saito | RIS distance, a Bregman divergence | Weighted mean of points |
| Symmetrized Itakura-Saito | SIS distance, a convex combination of Bregman divergences | Weighted mean of points |
| Reverse Symmetrized Itakura-Saito | RSIS distance, a convex combination of Bregman divergences | Weighted mean of points |
| KL Divergence | KL divergence, a Bregman divergence | Weighted mean of points |
| Reverse KL | RMKL distance, a Bregman divergence | Weighted mean of points |
| JSD | JSD, a convex combination of KL divergences | Weighted mean of points |
| UWLLR | UWLLR, a symmetric, linear combination of KL divergences | Weighted mean of points |
| LLR | LLR, linear combination of KL divergences | Weighted mean of points |
| Total variation distance | 1/2 L1 distance in probability space | component-wise median |
| SRLRT | square root of the log-likelihood ratio test; may be a metric | unknown, using weighted mean |
| SRULRT | square root of the unweighted log-likelihood ratio test (UWLLR); may be a metric | unknown, using weighted mean |
Measures for which we would like such an algorithm but don't have include:
JSM, the square root of the JSD Bhattacharyya distance Bhattacharyya metric Hellinger distance
L1 nearest-neighbors can be found with Cauchy sampling and p-stable distributions. L2 uses Gaussian sampling + p-stable distributions.
Hellinger, JSD, LLR, and UWLLR nearest neighbors can be found using Hellinger/JSD LSH function, which is itself a reduction to the L2 nearest neighbors for the square roots of the probability vectors. Because of the relations between TVD, JSD, and Hellinger, techniques that work for one often work for another.
http://eduardovalle.com/wordpress/wp-content/uploads/2014/10/silva14sisapLargeScaleMetricLSH.pdf provides techniques for non-metrics; additionally, their use of kmeans++ for nearest neighbor lookup comparisons suggests that there may be a way to use importance sampling via D2 sampling as a hack around a proper LSH table.
Measures not supporting EM (JSM, Bhattacharyya metric, or Hellinger) can still be clustered using density-based or linkage-based methods.
Maintaining an efficient, accurate nearest-neighbor structure enables cheap large-scale clustering for most of these.