Geometric discrepancy: An illustrated guide

Main results: EpsilonNetTheorem, ShatterFunctionLemma, PackingLemma, ShortEdgeLemma

Matoušek, Jiří

1

Geometric discrepancy: An illustrated guide dependency graph Nodes are catalog entries in this manuscript; solid edges are statement dependencies, dashed edges are proof dependencies.
statement dependency proof dependency

All Entries

2

Preliminaries

ShatterFunction Definition

Let \((X, \mathcal{S})\) be a set system, where \(\mathcal{S} \subseteq 2^X\). The primal shatter function \(\pi_{\mathcal{S}} \colon \mathbb{N} \to \mathbb{N}\) is defined by \[\pi_{\mathcal{S}}(m) = \max_{Y \subseteq X,\; |Y| = m} |\mathcal{S}|_Y|,\] where \(\mathcal{S}|_Y = \{ S \cap Y : S \in \mathcal{S} \}\) is the trace of \(\mathcal{S}\) on \(Y\).

Catalog/VC/ShatterFunction/Contract.lean
1import Mathlib.Combinatorics.SetFamily.Shatter
2
3namespace Catalog.VC.ShatterFunction
4
5variable {α : Type*} [DecidableEq α]
6
7/-- The trace (restriction) of a set family `𝒜` on a subset `Y`:
8 the family `{ S ∩ Y | S ∈ 𝒜 }` of distinct intersections. -/
9def trace (𝒜 : Finset (Finset α)) (Y : Finset α) : Finset (Finset α) :=
10 𝒜.image (· ∩ Y)
11
12/-- The primal shatter function `π_𝒜(m)`: the maximum number of distinct
13 traces of `𝒜` on any `m`-element subset of the ground set. (Def 5.1) -/
14def shatterFun [Fintype α] (𝒜 : Finset (Finset α)) (m : ℕ) : ℕ :=
15 (Finset.univ.filter (fun Y : Finset α => Y.card = m)).sup
16 (fun Y => (trace 𝒜 Y).card)
17
18end Catalog.VC.ShatterFunction
19
VCDimension Definition

Let \((X, \mathcal{S})\) be a set system. A subset \(A \subseteq X\) is shattered by \(\mathcal{S}\) if \(\mathcal{S}|_A = 2^A\), that is, every subset of \(A\) is of the form \(S \cap A\) for some \(S \in \mathcal{S}\).

The VC-dimension of \(\mathcal{S}\), denoted \(\dim(\mathcal{S})\), is the supremum of the sizes of all finite shattered subsets of \(X\). Equivalently, \[\dim(\mathcal{S}) = \sup\{ m : \pi_{\mathcal{S}}(m) = 2^m \}.\]

Catalog/VC/VCDimension/Contract.lean
1import Mathlib.Combinatorics.SetFamily.Shatter
2
3namespace Catalog.VC.VCDimension
4
5/-- A finite set `s` is shattered by family `𝒜` if every subset of `s`
6 is realized as `s ∩ S` for some `S ∈ 𝒜`.
7 Wraps Mathlib's `Finset.Shatters`. (Def 5.8) -/
8abbrev Shatters {α : Type*} [DecidableEq α]
9 (𝒜 : Finset (Finset α)) (s : Finset α) : Prop :=
10 𝒜.Shatters s
11
12/-- The VC-dimension of `𝒜`: the maximum cardinality of a set shattered
13 by `𝒜`. Equivalently, `sup { m : π_𝒜(m) = 2^m }`.
14 Wraps Mathlib's `Finset.vcDim`. (Def 5.8) -/
15abbrev vcDim {α : Type*} [DecidableEq α]
16 (𝒜 : Finset (Finset α)) : ℕ :=
17 𝒜.vcDim
18
19end Catalog.VC.VCDimension
20

EpsilonNetTheorem

EpsilonNet Definition

Let \((X, \mathcal{S})\) be a set system and \(\mu\) a probability measure on \(X\). A subset \(N \subseteq X\) is an \(\varepsilon\)-net for \((X, \mathcal{S})\) with respect to \(\mu\) if \(N \cap S \ne \emptyset\) for every \(S \in \mathcal{S}\) with \(\mu(S) \ge \varepsilon\).

When \(X\) is finite and \(\mu\) is the counting measure (\(\mu(A) = |A|/|X|\)), this says that \(N\) intersects every set \(S \in \mathcal{S}\) with \(|S| \ge \varepsilon |X|\).

Catalog/VC/EpsilonNet/Contract.lean
1import Mathlib.Combinatorics.SetFamily.Shatter
2import Mathlib.Analysis.SpecialFunctions.Log.Basic
3
4namespace Catalog.VC.EpsilonNet
5
6variable {α : Type*} [DecidableEq α] [Fintype α]
7
8/-- `N` is an `ε`-net for set family `𝒜` under the uniform distribution
9 on the finite ground set `α`. Every `S ∈ 𝒜` whose density
10 `|S| / |α| ≥ ε` satisfies `N ∩ S ≠ ∅`. -/
11def IsEpsilonNet (𝒜 : Finset (Finset α)) (ε : ℝ) (N : Finset α) : Prop :=
12 ∀ S ∈ 𝒜, ε * (Fintype.card α : ℝ) ≤ (S.card : ℝ) → (N ∩ S).Nonempty
13
14end Catalog.VC.EpsilonNet
15
EpsilonNetTheorem Theorem

For every \(d \ge 1\) there exists a constant \(C(d)\) such that the following holds. If \(X\) is a nonempty set with a probability measure \(\mu\), \(\mathcal{S}\) is a system of \(\mu\)-measurable subsets of \(X\) with \(\dim(\mathcal{S}) \le d\) (Definition VCDimension), and \(r \ge 2\), then there exists a \(\frac{1}{r}\)-net (Definition EpsilonNet) for \((X, \mathcal{S})\) with respect to \(\mu\) of size at most \(C(d)\, r \ln r\).

Catalog/VC/EpsilonNetTheorem/Contract.lean
1import Catalog.VC.VCDimension.Contract
2import Catalog.VC.EpsilonNet.Contract
3import Mathlib.Analysis.SpecialFunctions.Log.Basic
4
5namespace Catalog.VC.EpsilonNetTheorem
6
7/-- For every VC-dimension bound `d`, there exists a constant `C` such that
8 any set family with VC-dimension at most `d` admits a `(1/r)`-net of size
9 at most `C · r · ln r`, for all `r ≥ 2`. (Theorem 5.12) -/
10axiom epsilonNetTheorem (d : ℕ) : ∃ C : ℝ, 0 < C ∧
11 ∀ (α : Type*) [DecidableEq α] [Fintype α] [Nonempty α] (𝒜 : Finset (Finset α)),
12 𝒜.vcDim ≤ d →
13 ∀ (r : ℝ), 2 ≤ r →
14 ∃ N : Finset α, Catalog.VC.EpsilonNet.IsEpsilonNet 𝒜 (1 / r) N ∧
15 (N.card : ℝ) ≤ C * r * Real.log r
16
17end Catalog.VC.EpsilonNetTheorem
18

ShatterFunctionLemma

ShatterFunctionLemma Lemma

For any set system \(\mathcal{S}\) of VC-dimension at most \(d\) (Definition VCDimension), the primal shatter function (Definition ShatterFunction) satisfies \[\pi_{\mathcal{S}}(m) \le \Phi_d(m) \quad \text{for all } m,\] where \[\Phi_d(m) = \binom{m}{0} + \binom{m}{1} + \cdots + \binom{m}{d}.\]

Catalog/VC/ShatterFunctionLemma/Contract.lean
1import Catalog.VC.ShatterFunction.Contract
2import Catalog.VC.VCDimension.Contract
3
4namespace Catalog.VC.ShatterFunctionLemma
5
6/-- The binomial sum `Φ_d(m) = ∑_{k=0}^{d} C(m, k)`. Upper bound for the
7 shatter function of a family with VC-dimension at most `d`. -/
8def binomialSum (d m : ℕ) : ℕ :=
9 (Finset.range (d + 1)).sum (fun k => m.choose k)
10
11/-- The shatter function of a family with VC-dimension at most `d` is bounded
12 by the binomial sum: `π_𝒜(m) ≤ Φ_d(m)` for all `m`. (Lemma 5.9) -/
13axiom shatterFunctionLemma {α : Type*} [DecidableEq α] [Fintype α]
14 (𝒜 : Finset (Finset α)) {d : ℕ} (hd : 𝒜.vcDim ≤ d) (m : ℕ) :
15 Catalog.VC.ShatterFunction.shatterFun 𝒜 m ≤ binomialSum d m
16
17end Catalog.VC.ShatterFunctionLemma
18

PackingLemma

PackingLemma Lemma

Let \(d > 1\) and \(C\) be constants, and let \((X, \mathcal{S})\) be a set system on an \(n\)-point set whose primal shatter function (Definition ShatterFunction) satisfies \(\pi_{\mathcal{S}}(m) \le C m^d\) for all \(m = 1, 2, \ldots, n\). Let \(\delta\) be an integer, \(1 \le \delta \le n\), and let \(\mathcal{P} \subseteq \mathcal{S}\) be \(\delta\)-separated: any two distinct sets of \(\mathcal{P}\) satisfy \(|S_1 \mathbin{\triangle} S_2| > \delta\). Then \[|\mathcal{P}| = O\!\left(\left(\frac{n}{\delta}\right)^{\!d}\right).\]

Catalog/VC/PackingLemma/Contract.lean
1import Catalog.VC.ShatterFunction.Contract
2import Mathlib.Analysis.SpecialFunctions.Pow.Real
3
4namespace Catalog.VC.PackingLemma
5
6variable {α : Type*} [DecidableEq α]
7
8/-- A subfamily `𝒫` is `δ`-separated if any two distinct members have
9 symmetric difference of size strictly greater than `δ`. -/
10def IsSeparated (P : Finset (Finset α)) (δ : ℕ) : Prop :=
11 ∀ S ∈ P, ∀ S' ∈ P, S ≠ S' → δ < (S \ S' ∪ (S' \ S)).card
12
13/-- Packing lemma: if the shatter function satisfies `π_𝒜(m) ≤ C · m^d`
14 and `𝒫 ⊆ 𝒜` is `δ`-separated, then `|𝒫| ≤ C' · (n/δ)^d`.
15 The constant `C'` depends only on `C` and `d`. (Lemma 5.14) -/
16axiom packingLemma (d : ℕ) (C : ℝ) (hC : 0 < C) : ∃ C' : ℝ, 0 < C' ∧
17 ∀ (α : Type*) [DecidableEq α] [Fintype α] (F P : Finset (Finset α)),
18 P ⊆ F →
19 (∀ m : ℕ, (Catalog.VC.ShatterFunction.shatterFun F m : ℝ) ≤ C * (m : ℝ) ^ d) →
20 ∀ (δ : ℕ), 0 < δ → δ ≤ Fintype.card α →
21 IsSeparated P δ →
22 (P.card : ℝ) ≤ C' * ((Fintype.card α : ℝ) / δ) ^ d
23
24end Catalog.VC.PackingLemma
25

ShortEdgeLemma

DualShatterFunction Definition

Let \((X, \mathcal{S})\) be a set system. The dual shatter function \(\pi_{\mathcal{S}}^* \colon \mathbb{N} \to \mathbb{N}\) is defined by \[\pi_{\mathcal{S}}^*(m) = \max_{\mathcal{Y} \subseteq \mathcal{S},\; |\mathcal{Y}| = m} \bigl|\bigl\{ \{S \in \mathcal{Y} : x \in S\} : x \in X \bigr\}\bigr|.\] Equivalently, \(\pi_{\mathcal{S}}^*(m)\) is the maximum number of nonempty cells in the Venn diagram of \(m\) sets chosen from \(\mathcal{S}\).

Catalog/VC/DualShatterFunction/Contract.lean
1import Mathlib.Combinatorics.SetFamily.Shatter
2
3namespace Catalog.VC.DualShatterFunction
4
5variable {α : Type*} [DecidableEq α]
6
7/-- The atom map of a subfamily `𝒴` at point `x`: the set of members
8 of `𝒴` that contain `x`. Two points `x, y` have the same atom map
9 iff they belong to exactly the same sets of `𝒴`. -/
10def atomMap (𝒴 : Finset (Finset α)) (x : α) : Finset (Finset α) :=
11 𝒴.filter (x ∈ ·)
12
13/-- The dual shatter function `π*_𝒜(m)`: the maximum number of distinct
14 atom types induced by any `m`-element subfamily of `𝒜`.
15 Equivalently, the maximum number of nonempty cells in the Venn diagram
16 of `m` sets chosen from `𝒜`. (Def 5.2) -/
17def dualShatterFun [Fintype α] (𝒜 : Finset (Finset α)) (m : ℕ) : ℕ :=
18 (𝒜.powerset.filter (fun 𝒴 => 𝒴.card = m)).sup
19 (fun 𝒴 => (Finset.univ.image (atomMap 𝒴)).card)
20
21end Catalog.VC.DualShatterFunction
22
ShortEdgeLemma Lemma

Let \((X, \mathcal{S})\) be a set system on an \(n\)-point set \(X\) with dual shatter function (Definition DualShatterFunction) \(\pi_{\mathcal{S}}^*(m) \le C m^d\) for all \(m\), where \(C\) and \(d > 1\) are constants. A set \(S \in \mathcal{S}\) crosses an edge \(\{u,v\}\) of a graph on \(X\) if \(|S \cap \{u,v\}| = 1\).

Then for any multiset \(Q\) of sets from \(\mathcal{S}\), there exist distinct points \(x, y \in X\) such that the number of sets of \(Q\) crossing \(\{x, y\}\) is at most \[C_2 \, \frac{|Q|}{n^{1/d}},\] where \(C_2\) depends only on \(C\) and \(d\).

Catalog/VC/ShortEdgeLemma/Contract.lean
1import Catalog.VC.DualShatterFunction.Contract
2import Mathlib.Analysis.SpecialFunctions.Pow.Real
3
4namespace Catalog.VC.ShortEdgeLemma
5
6universe u
7
8variable {α : Type*} [DecidableEq α]
9
10/-- The crossing number of an edge `{x, y}` by a subfamily `Q`: the number
11 of sets in `Q` that contain exactly one of `x, y`. -/
12abbrev crossingNumber (Q : Finset (Finset α)) (x y : α) : ℕ :=
13 (Q.filter (fun S => (x ∈ S ∧ y ∉ S) ∨ (x ∉ S ∧ y ∈ S))).card
14
15/-- Short edge lemma: if the dual shatter function satisfies
16 `π*_𝒜(m) ≤ C · m^d` with `d > 1`, then for any subfamily `Q ⊆ 𝒜`
17 there exist distinct points `x, y` with crossing number at most
18 `C₂ · |Q| / n^{1/d}`. (Lemma 5.18) -/
19axiom shortEdgeLemma (d : ℕ) (C : ℝ) (hd : 1 < d) (hC : 0 < C) :
20 ∃ C₂ : ℝ, 0 < C₂ ∧
21 ∀ (α : Type u) [DecidableEq α] [Fintype α] (𝒜 : Finset (Finset α)),
22 (∀ m : ℕ, (Catalog.VC.DualShatterFunction.dualShatterFun 𝒜 m : ℝ) ≤ C * (m : ℝ) ^ d) →
23 2 ≤ Fintype.card α →
24 ∀ (Q : Finset (Finset α)), Q ⊆ 𝒜 →
25 ∃ x y : α, x ≠ y ∧
26 ((Q.filter (fun S => (x ∈ S ∧ y ∉ S) ∨ (x ∉ S ∧ y ∈ S))).card : ℝ) ≤
27 C₂ * (Q.card : ℝ) /
28 (Fintype.card α : ℝ) ^ ((1 : ℝ) / d)
29
30end Catalog.VC.ShortEdgeLemma
31