3 Injectivity
The goal of this chapter is to prove that \(\boldsymbol {\Psi }\) is an injective function on \(\mathbb {I}^n\) for some \(\lambda {\gt} 0\).
Let \(\boldsymbol {\Psi }\left(\mathbf{x}\right) \equiv \begin{bmatrix} \Psi _0\left(\mathbf{x} - 0\right) \\ \Psi _1\left(\mathbf{x} - \frac{1}{2n}\right) \\ \vdots \\ \Psi _q\left(\mathbf{x} - \frac{q}{2n}\right) \\ \vdots \\ \Psi _{2n}\left(\mathbf{x} - 1\right) \end{bmatrix}\) be a column vector of size \(2n + 1\), where \(\mathbf{x}^\top = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_p \\ \vdots \\ x_{n - 1} \end{bmatrix}\) is a column vector of size \(n\) that holds the arguments to \(f\), and \(\Psi _q\left(\mathbf{x} - \frac{q}{2n}\right) \equiv \sum \limits _{p = 0}^{n - 1} \frac{\lambda ^p}{\Lambda } \varphi \left(x_p - \frac{q}{2n}\right)\) in our construction from Chapter 2.
Each contour line is a level set, albeit at a different value of \(x\), where \(\Psi _q\left(\begin{bmatrix} x_1 & x_2 \end{bmatrix}\right) = \varphi \left(x - \frac{q}{2n}\right)\).
When \(n = 2\), we can illustrate the issue as in Figure 3.1. For each of the five values of \(q\), contour lines are plotted that depict pairs of values for \(x_1\) and \(x_2\) where \(\Psi _q\left(\begin{bmatrix} x_1 & x_2 \end{bmatrix}\right) = \varphi \left(x - \frac{q}{2n}\right)\) for various values of \(x\). Sprecher’s mechanism of shifting \(x_p\) by \(-\frac{q}{2n}\) ensures that these contour lines are largely parallel to each other. Pairs of contour lines can and do intersect, but the KST cannot hold if all \(2n + 1\) functions intersect at the same \(\mathbf{x}\). There are an infinite number of level sets, each corresponding to a different value of \(x\), so it is impossible to determine visually whether this condition is met. However, at the top right of the last plot, these contour lines are compressed into a small region, so the exact nature of the curvature there is important.
The contour lines are downward sloping and cross the diagonal, which justifies thinking of a level set as all possible vectors \(\mathbf{x}\) that render \(\Psi _q\left(\mathbf{x}\right)\) equal to \(\varphi \left(x - \frac{q}{2n}\right)\), which is the value that \(\Psi _q\) would take if all its arguments were equal to \(x\). Sprecher
What this says about the superposition representation of an arbitrary continuous function \(f\left(x_1, x_2\right)\) is that it is determined at five points on the diagonal through its representation …The first thing to notice is the relation between the level sets of superpositions and target functions. An arbitrary function \(f\left(x_1, x_2\right)\) may have infinitely many values on one or or more level sets …But on these the functions \(\left[\Psi _q\right]\) can only compute one value of \(f\). [78 - 79]
3.1 Linear Algebra
Let \(\mathbf{J}\left(\mathbf{x}\right)\) be a matrix with \(2n + 1\) rows and \(n\) columns such that each element is \(J_{q,p} \equiv \lambda _p \dot{\varphi }_{\infty }\left(x_p - \frac{q}{2n}\right) = \frac{\lambda ^p}{2\Lambda } \sum \limits _{m = 0}^\infty 4^{-m} U_{-1 + 2^m}\left(x_p - \frac{q}{2n}\right)\) per Lemma 2.4.5.
\(\mathbf{J}\left(\mathbf{x}\right)\) in Definition 9 can be evaluated at a point in \(\mathbb {I}^n\) even when \(\frac{\partial \Psi _q}{\partial x_p}\) is not defined, which occurs either when both \(x_p = 1\) and \(q = 0\) or when both \(x_p = 0\) and \(q = 2n\). Otherwise, \(\mathbf{J}\left(\mathbf{x}\right)\) is a traditional Jacobian matrix because the weak derivative coincides with the derivative.
\(\mathbf{J}\left(\mathbf{x}\right)\) has full column rank for all \(\mathbf{x} \in \mathbb {I}^n\).
The \(p\)-th column of \(\mathbf{J}\left(\mathbf{x}\right)\) is a nonlinear function only of \(x_p\), so it is impossible to express any column as an exact linear function of the other columns.
\(\boldsymbol {\Psi }\) is an injective function on \(\mathbb {I}^n\) for any \(\lambda {\gt} 0\)
Lemma 3.1.1 suffices to prove that \(\boldsymbol {\Psi }\) is locally injective throughout the interior of \(\mathbb {I}^n\). Thus, we only need to rule out some global and boundary concerns. Per Lemma 2.5.7, \(\varphi \) is Lipschitz continuous and so is each \(\psi _{p,q}\). Lipschitz continuous functions are absolutely continuous and so the Fundamental Theorem of Calculus applies. Suppose on the contrary that \(\boldsymbol {\Psi }\left(\mathbf{x}\right) = \boldsymbol {\Psi }\left(\mathbf{x}^\prime \right)\) but \(\mathbf{x} \neq \mathbf{x}^\prime \). There is a smooth path between \(\mathbf{x}\) and \(\mathbf{x}^\prime \). Integrating the directional derivatives along that path should yield a value zero, but cannot be zero since \(\mathbf{J}\left(\mathbf{x}\right)\) has full column rank.
I am not entirely sure how the details of the previous “proof” would work, so below is another proof that is very consistent with the KST literature.
3.2 Trigonometric Representation
If \(t \in \mathbb {T}\), then \(T_j\left(t\right) = \cos \left(j \cos ^{-1}t\right)\).
This lemma has been known for decades and is already proven in Mathlib. Let \(t = \cos \theta \). If \(j = 0\), then \(T_0\left(t\right) = 1 = \cos \left(0 \theta \right)\). If \(j = 1\), then \(T_1\left(t\right) = t = \cos \left(1 \theta \right)\). It remains to show for \(j {\gt} 1\) that the three-term recursion from Definition 1 is satisfied: \(T_j\left(t\right) = 2 \cos \theta \cos \left(\left(j - 1\right) \theta \right) - \cos \left(\left(j - 2\right) \theta \right)\). The first term can be rewritten as \(2\cos \theta \cos \left(\left(j - 1\right) \theta \right) = \cos \left(\theta - \left(j - 1\right)\theta \right) + \cos \left(\theta + \left(j - 1\right)\theta \right) = \cos \left(\left(j - 2\right)\theta \right) + \cos \left(j \theta \right)\), so \(T_j\left(t\right) = \cos \left(j \theta \right)\).
Under Definition 3 and Lemma 3.2.1, \(\frac{3}{7}+ \frac{1}{2} \sum \limits _{m = 0}^\infty \left(\frac{1}{8}\right)^{m} \cos \left(2^m \theta \right)\) resembles the function, \(w\left(y\right) = \sum \limits _{m = 0}^\infty \alpha ^m \cos \left(\beta ^m \pi y\right)\), Weierstrass proved to be nowhere differentiable when \(\beta \geq 7\) also satisfies \(\alpha \beta {\gt} 1 + \frac{3}{2} \pi \) for \(0 {\lt} \alpha {\lt} 1\). But the terms in our \(\varphi \) have less extreme oscillation, smaller amplitudes, and another composition, so \(\varphi \) merely has essential singularities at the endpoints of \(\mathbb {T}\).
The \(1 + 2^k\) extrema of \(T_{2^k}\) are called (Chebyshev-Lobatto) nodes and occur at \(\tau = -\cos \frac{j\pi }{2^k}\) with \(0 \leq j \leq 2^k\) and \(k \in \mathbb {N}\).
If \(\tau = -\cos \frac{j\pi }{2^k}\), then \(T_{2^{k}}\left(\tau \right) = \cos \left(2^k \cos ^{-1}\left(\cos \frac{-j\pi }{2^k}\right)\right) = \cos \left(-j\pi \right)\). If \(j\) is even, \(T_{2^{k}}\left(\tau \right) = 1\) and if \(j\) is odd \(T_{2^{k}}\left(\tau \right) = -1\).
The function approximation literature also defines nodes when the index of the Chebyshev polynomial is not a power of two, but we only consider the case where \(\tau = -\cos \frac{j\pi }{2^k}\). It is essential to recognize that if \(\tau = -\cos \frac{j\pi }{2^k}\), then it is also true that \(\tau = -\cos \frac{\left(2j\right)\pi }{2^{k + 1}}\). Thus, a node at iteration \(k\) is also a node at iteration \(k + 1\) but with an index that is double its previous index.
Let \(\gamma = -\cos \frac{\left(3j + 1\right)\pi }{2^{k - 1} \times 3}\) for some \(0 \leq j \leq 2^{k - 1}\) and \(k {\gt} 0\).
The orange points in Figure 2.1 are the values of \(\varphi \left(\mp \frac{1}{2}\right)\), whose argument \(-\cos \frac{\pi }{3} = -\frac{1}{2}\) and \(-\cos \frac{2\pi }{3} = \frac{1}{2}\) is a \(\gamma \)-node when \(k = 1\); see the second remark following Lemma 2.3.1. The red points in Figure 2.1 are the values of \(\varphi \) at Chebyshev-Lobatto nodes when \(k = 4\), which crucially can be calculated exactly in a finite number of arithmetic operations from \(\varphi _4\) or better.
If \(\tau = -\cos \frac{j\pi }{2^k}\), then \(\varphi _k\left(\tau \right) = \varphi \left(\tau \right)\).
If \(\tau = -\cos \frac{j\pi }{2^k}\), then Lemma 3.2.2 implies \(T_{2^{k}}\left(\tau \right) = \mp 1\), and Lemma C implies \(T_{2^{k + 1}}\left(\tau \right) = 1\) is a fixed point. By Lemma D with \(b = 8\), \(\varphi \left(\tau \right) = \frac{3}{7}+ \frac{1}{2} \sum \limits _{m = 0}^{k} 8^{-m} T_{2^{m}}\left(\tau \right) + \frac{1}{2}\sum \limits _{m = k + 1}^\infty 8^{-m} = \frac{3}{7}+ \frac{1}{2} \sum \limits _{m = 0}^{k} 8^{-m} T_{2^{m}}\left(\tau \right) + \frac{8^{-k}}{14}= \varphi _k\left(\tau \right)\). Thus, the first-order correction, \(\frac{8^{-k}}{14}\), is exact when \(\tau \) is a node.
3.3 Number Theory
If \(\tau = -\cos \frac{j\pi }{2^k}\) for some \(0 {\lt} j {\lt} 2^k\), then \(\tau \) is a root of \(U_{-1 + 2^k}\).
Standard using Lemma 2.4.1; extrema of polynomials are roots of their derivatives.
Let \(m {\gt} 0\) be an integer. The \(2^m\)-th cyclotomic polynomial is the irreducible polynomial, \(\Phi _{2^m}\left(r\right) \equiv r^{2^{m - 1}} - 1\).
If \(k {\gt} 0\), then \(U_{-1 + 2^k}\left(r\right) = \prod \limits _{m = 1}^k \Phi _{2^m}\left(r\right)\).
It can be done by induction.
\(U_{-1 + 2^k}\) factors into a product of irreducible polynomials, each of which only has complex roots, but their real parts are the roots of \(U_{-1 + 2^k}\).
A cyclotomic field is an extension of the field of rational numbers, \(\mathbb {Q}\), that is formed by adjoining some complex root, \(\zeta \), of a cyclotomic polynomial.
Let \(\mathbb {F}_\tau \equiv \mathbb {R} \bigcap \bigcup \limits _{k = 1}^\infty \mathbb {Q}\left(\zeta _{2^k}\right)\), where \(\zeta _{2^k}\) is some root of \(\Phi _{2^k}\).
If \(\lambda \notin \mathbb {F}_\tau \), then \(\Psi _q\) is an injective function on \(\left(\mathbb {F}_\tau \bigcap \mathbb {I}\right)^n\).
Let each \(x_p - \frac{q}{2n}\) and \(x_p^\prime - \frac{q}{2n}\) be a member of \(\mathbb {F}_\tau \bigcap \mathbb {I}\). Suppose \(\Psi _q\left(x_0, x_1, \cdots , x_{n - 1}\right) = \Psi _q\left(x_1^\prime , x_2^\prime , \dots , x_n^\prime \right)\) and rearrange this condition as
Suppose, on the contrary, that \(\Psi _q\) is not injective, which is to say that the above equation has a non-trivial solution. Then, the Mean Value Theorem states for a function like \(\varphi \) that is continuous on a closed interval (per Lemma 2.3.4) and differentiable on the open interval that excludes its endpoints (per Lemma 2.4.5), there exists a \(t_0\) strictly between \(x_0 - \frac{q}{2n}\) and \(x_0^\prime - \frac{q}{2n}\) such that \(\varphi \left(x_0 - \frac{q}{2n}\right) - \varphi \left(x_0^\prime - \frac{q}{2n}\right) = \left[x_0 - \frac{q}{2n} - x_0^\prime + \frac{q}{2n}\right] \dot{\varphi }\left(t_0\right)\). Moreover, \(t_0\) is unique in this case because \(\varphi \) is a strictly increasing function on \(\mathbb {T}\) per Lemma 2.5.6. After canceling the \(\Lambda \) from the above equation, substituting with the Mean Value Theorem, and dividing both sides by \(\dot{\varphi }\left(t_0\right)\), we obtain
The left-hand side is a member of \(\mathbb {F}_\lambda \), which cannot be equal to the right-hand side when \(\lambda \notin \mathbb {F}_\tau \) because the latter is a member of \(\mathbb {F}_\tau \left(\lambda , \dot{\varphi }\left(t_0\right)\right) / \mathbb {F}_\tau \). Thus, there is only a trivial solution to the original equation where \(x_p = x_p^\prime \) for all \(p\), which characterizes an injective function.
Suppose \(x_p - \frac{q}{2n} = -\cos \frac{j\pi }{2^k}\) and \(x_p^\prime - \frac{q}{2n} = -\cos \frac{j^\prime \pi }{2^k}\), in which case the Mean Value Theorem implies there is a (in this case unique) \(t_p\) strictly between \(x_p - \frac{q}{2n}\) and \(x_p^\prime - \frac{q}{2n}\) such that \(\left[x_p - x_p^\prime \right]\dot{\varphi }\left(t_p\right) = \varphi _k\left(x_p - \frac{q}{2n}\right) - \varphi _k\left(x_p^\prime - \frac{q}{2n}\right)\), where we have used Lemma 3.2.3 to substitute \(\varphi _k\) for \(\varphi \). Although \(\dot{\varphi }\left(t_p\right) \in \mathbb {F}_\tau \) because \(x_p - x_p^\prime \in \mathbb {F}_\tau \) and \(\varphi _k\left(x_p - \frac{q}{2n}\right) - \varphi _k\left(x_p^\prime - \frac{q}{2n}\right) \in \mathbb {F}_\tau \), it seems that \(t_p \notin \mathbb {F}_\tau \) when \(k {\gt} 1\). Thus, \(\dot{\varphi }_{k}\left(\right)\) can somehow map some foreign numbers in \(\mathbb {T}\) into \(\mathbb {F}_\tau \).
For some \(\lambda {\gt} 0\), \(\boldsymbol {\Psi }\) is an injective function on \(\mathbb {I}^n\).
The fact that \(\boldsymbol {\Psi }\) is injective on a dense set, \(\left(\mathbb {F}_\tau \bigcap \mathbb {I}\right)^n\), and is both convex (since it is a weighed sum of convex inner functions) and continuous allows us to do a proof by contradiction.