Formalizing a Lipschitz continuous version of Kolmogorov’s superposition theorem

2 Core Function

2.1 Main Definitions

The atomic element of our KSR is the first-kind Chebyshev polynomial. Let \(j\) be a generic integer that is usually non-negative, but is used in various ways in different contexts in this report.

Definition 1 Chebyshev polynomial of the first kind
#

Let \(r\) be a real number. The first-kind Chebyshev polynomial in \(r\) with index \(j\) is defined via a three-term recursion as

\begin{align*} T_j\left(r\right) \equiv \begin{cases} 1 & \text{if } j = 0 \\ r & \text{if } j = 1 \\ 2 r T_{j - 1}\left(r\right) - T_{j - 2}\left(r\right) & \text{if } j {\gt} 1. \end{cases}\end{align*}
Remark
#

The Mathlib library used by Lean (version 4) actually defines \(T_j\) as a function of \(j\) (which can be negative) that returns a polynomial (with coefficients in any commutative ring). Such generality is unnecessary here, but Mathlib also provides a metatheorem that can be used to tersely prove many properties of Chebyshev polynomials via double induction on \(j\). Our notation, \(T_j\left(r\right)\), corresponds to simply evaluating such a polynomial at a symbolic real number.

We specialize \(j = 2^m\) to define partial sums that approach our candidate for a core function.

Definition 2 \(k\)-th order approximator

\(\varphi _k\left(r\right) \equiv \frac{3}{7}+ \frac{8^{-k}}{14}+ \frac{1}{2} \sum \limits _{m = 0}^{k} 8^{-m} T_{2^{m}}\left(r\right)\), where \(k \in \mathbb {N}\).

Definition 3 Core function

\(\varphi \left(r\right) \equiv \frac{3}{7}+ \frac{1}{2} \sum \limits _{m = 0}^{\infty } 8^{-m} T_{2^{m}}\left(r\right)\).

For an overview of expanding a function in terms of Chebyshev polynomials, see Trefethen.

Definition 4 Normalizer
#

\(\Lambda \equiv \sum \limits _{p = 0}^{n - 1} \lambda ^p\), where \(\lambda {\gt} 0\) and \(n \geq 2\).

Definition 5 Scale factor

\(\lambda _p \equiv \frac{\lambda ^p}{\Lambda } \in \left(0,1\right)\).

Definition 6 Inner function
#

\(\psi _{p,q}\left(x\right) \equiv \lambda _p \varphi \left(x - \frac{q}{2n}\right)\), where \(x \in \mathbb {I}\) and \(q \in \{ 0, 1, \cdots , 2n\} \).

2.2 Repeatedly Used Lemmas

Our Chebyshev polynomials also follow a two-term recursion that drives nearly all other lemmas.

Lemma A Two-term recursion
#

\(T_{m + 1} = 2T_{m}^2 - 1\), where \(m \in \mathbb {N}\).

Proof

Since \(2^{m + 1} = 2 \times 2^m\), we can write \(T_{m + 1} = T_{2 \times 2^m}\). It is already proven by induction in Mathlib (TODO: add succinct explanation) that a Chebyshev polynomial whose index is a product of integers can be written as a composition of two Chebyshev polynomials, so \(T_{2 \times 2^m} = T_2 \circ T_{m}\). It follows from Definition 1 that \(T_2\left(r\right) = 2r^2 - 1\), so \(T_2 \circ T_{m} = 2T_{m}^2 - 1 = T_{m + 1}\).

Lemma B Bounds

\(T_{2^{m}}\left(r\right)^2 \leq 1\) if and only if \(r^2 \leq 1\).

Proof

Since \(T_{2^{0}}\left(r\right) = r\), having \(r^2 \leq 1\) is necessary and sufficient when \(m = 0\). Assume it holds for \(m\), so that we can use Lemma A to show it then holds for \(T_{2^{m + 1}}\left(r\right)^2 = \left(2T_{2^{m}}\left(r\right)^2 - 1\right)^2\). Under the inductive hypothesis that \(T_{2^{m}}\left(r\right)^2 \leq 1\), then the entire right-hand side is less than one in magnitude. Conversely, if \(T_{2^{m}}\left(r\right)^2 {\gt} 1\), then the right-hand side remains greater than one in magnitude.

Lemma C Fixed points
#

The only two fixed points of \(2 T_{2^{m}}\left(r\right)^2 - 1 \mapsto T_{2^{m + 1}}\left(r\right)\) are \(-\frac{1}{2}\) and \(1\).

Proof

If \(T_{2^{m + 1}}\left(r\right) = T_{2^{m}}\left(r\right)\) under Lemma A, then a quadratic equation, \(2 T_{2^{m}}\left(r\right)^2 - T_{2^{m}}\left(r\right) - 1 = \left(2T_{2^{m}}\left(r\right) + 1\right)\left(T_{2^{m}}\left(r\right) - 1\right) = 0\) whose only two solutions are \(-\frac{1}{2}\) and \(1\) defines the fixed points.

Remark
#

Both fixed points are repellent because the derivative of the mapping at a fixed point greater in magnitude than \(1\). However, this has not been formalized yet since it is not a priority.

If either fixed point is reached, the remaining terms in \(\varphi _k\) become a partial geometric series.

Lemma D Partial geometric series

If \(b {\gt} 1\), then \(\sum \limits _{m = j}^k b^{-m} = \frac{b^{1 - j} - b^{-k}}{b - 1}\).

Proof

\(\sum \limits _{m = j}^k b^{-m} = \sum \limits _{m = j}^k \left(\frac{1}{b}\right)^m \equiv S\), which is in the usual form of a geometric series with a ratio of \(\frac{1}{b}\) whose partial sums are already proven in Mathlib. The usual method of proof is to note that since \(S - \frac{1}{b} S = \sum \limits _{m = j}^k \left[\left(\frac{1}{b}\right)^m - \left(\frac{1}{b}\right)^{m + 1}\right] = \left(\frac{1}{b}\right)^j - \left(\frac{1}{b}\right)^{k + 1}\), then \(S = \frac{\left(\frac{1}{b}\right)^j - \left(\frac{1}{b}\right)^{k + 1}}{1 - \frac{1}{b}} = \frac{b^{-j} - b^{-k - 1}}{\frac{b - 1}{b}} = \frac{b^{1 - j} - b^{-k}}{b - 1}\).

Lemma E Symmetry

\(T_{2^{m}}\left(-r\right) = \left(-1\right)^{2^m} T_{2^{m}}\left(r\right)\).

Proof

This also holds for Chebyshev polynomials whose index is not a power of two, but the general case is not yet proven in Mathlib, so we will prove it for \(j = 2^m\). If \(m = 0\), then \(T_{2^{0}}\left(r\right) = r\) so the claimed equation holds. Per the two-term recursion in Lemma A, we need to show that \(2T_{2^{m}}\left(-r\right)^2 - 1 = T_{2^{m + 1}}\left(-r\right) = T_{2^{m + 1}}\left(r\right) = 2T_{2^{m}}\left(r\right)^2 - 1\) because \(\left(-1\right)^{2^m} = 1\) when \(m {\gt} 0\). Substituting for \(T_{2^{m}}\left(-r\right)\) with the inductive hypothesis implies \(2\left[\left(-1\right)^{2^m} T_{2^{m}}\left(r\right)\right]^2 - 1 = 2T_{2^{m}}\left(r\right) - 1\).

Lemma F Functional equation

\(\varphi _k\left(-r\right) = \varphi _k\left(r\right) - r\).

Proof

Per Lemma E, the only term that differs in Definition 3 between the \(r\) and \(-r\) cases is when \(m = 0\) and \(\frac{1}{2} 8^{0} T_{2^{0}}\left(\mp r\right) = \frac{\mp r}{2}\). Thus, \(\varphi _k\left(r\right) - \varphi _k\left(-r\right) = \frac{r}{2} - \frac{-r}{2} = r\).

Lemma G Linearly shifted sum-of-squares

\(\varphi _k\left(r\right) = \frac{5}{14} + \frac{8^{-k}}{7} + \frac{r}{2} + \frac{1}{8} \sum \limits _{m = 0}^{k - 1} 8^{-m} T_{2^{m}}\left(r\right)^2\).

Proof

Pull the \(T_{2^{0}}\left(r\right) = r\) term out of the series in Definition 2 and then shift the index of the summation while applying the two-term recursion from Lemma A to all the remaining terms to obtain \(\varphi _k\left(r\right) = \frac{3}{7}+ \frac{8^{-k}}{14}+ \frac{r}{2} + \frac{1}{2} \sum \limits _{m = 0}^{k - 1} 8^{-\left(m + 1\right)} \left(2 T_{2^{m}}\left(r\right)^2 - 1\right)\). We can pull a factor of \(\frac{1}{8}\) out of the sum and then separate it to obtain \(\varphi _k\left(r\right) = \frac{3}{7}+ \frac{8^{-k}}{14}+ \frac{r}{2} + \frac{1}{8} \sum \limits _{m = 0}^{k - 1} 8^{-m} T_{2^{m}}\left(r\right)^2 - \frac{1}{16} \sum \limits _{m = 0}^{k - 1} 8^{-m}\). Apply Lemma D with \(b = 8\) to the last term and simplify the constant to obtain the claimed result.

2.3 Convergence

Lemma 2.3.1

Let \(\mathbb {T} = \left[-1,1\right]\). If \(t \in \mathbb {T}\), then the sequence of partial sums, \(\{ \varphi _k\left(t\right)\} \) forms a uniformly Cauchy sequence, but if \(r \notin \mathbb {T}\), then the infinite sum in \(\varphi \left(r\right)\) diverges.

Proof

To prove the if direction, we utilize the Weierstrass M-test, which is sufficient to prove that the sequence of partial sums forms a uniformly Cauchy sequence. Lemma B implies \(T_{2^{m}}\left(t\right) \in \mathbb {T}\). Since \(\left|8^{-m} T_{2^{m}}\left(t\right)\right| \leq 8^{-m}\) and \(\frac{1}{2}\sum \limits _{m = 0}^\infty 8^{-m} = \frac{4}{7}\) by Lemma D with \(b = 8\), \(\varphi \) is well-defined on \(\mathbb {T}\).

To prove the only if direction, we utilize the last term test. It suffices to show that if \(\left|r\right| {\gt} 1\), then the limit of \(8^{-m} T_{2^{m}}\left(r\right)\) as \(m \uparrow \infty \) is not zero. When \(r {\gt} 1\), it is already proven in Mathlib (TODO: add succinct explanation) that \(T_j\left(r\right) = \cosh \left(j \theta \right)\) where \(\theta \) is implicitly defined by \(\cosh \theta = r\). Mathlib unfortunately lacks the \(\cosh ^{-1}\) function but has \(\sinh ^{-1}\), so we can explicitly define \(\theta \equiv \sinh ^{-1}\left(r^2 - 1\right) {\gt} 0\). Mathlib does prove that \(\cosh \left(\sinh ^{-1}\left(r^2 - 1\right)\right) = \sqrt{1 + r^2 - 1} = r {\gt} 1\). Moreover, \(\cosh \left(2^m \theta \right) = \frac{e^{2^m \theta } - e^{-2^m \theta }}{2}\), so \(\ln \frac{T_{2^{m}}\left(r\right)}{2^{3m}} = 2^m \theta + \ln \left(1 + e^{-2^{m + 1} \theta }\right) - \left(3m + 1\right) \ln 2\). As \(m \uparrow \infty \), a Maclaurin-series expansion of \(\ln \left(1 + e^{-2^{m + 1} \theta }\right)\) reveals it approaches the negligibly small value of \(e^{-2^{m + 1} \theta }\), and since \(2^m \theta {\gt} 0\), it approaches \(\infty \) as \(m \uparrow \infty \) at a much faster rate than \(-\left(3m + 1\right) \ln 2\) approaches \(-\infty \). Thus, there is no limit of \(8^{-m} T_{2^{m}}\left(r\right)\) as \(m \uparrow \infty \) when \(r {\gt} 1\). When \(r {\lt} -1\), Lemma E shows that the only term in the infinite sum that changes is when \(m = 0\) and \(T_{2^{0}}\left(\mp r\right) = \mp r\). Thus, the infinite sum also diverges when \(r {\lt} -1\).

Remark
#

Since the error in approximating \(\varphi \left(t\right)\) with \(\varphi _k\left(t\right)\) is less than \(\frac{2^{-3k}}{14}\), the approximation error if \(k = 17\) is less than \(\frac{2^{-52}}{7}\), where \(2^{-52}\) is the smallest double-precision number that can be added to \(1\) without underflow. Thus, only \(18\) iterations of an unrolled loop with only a few floating-point operations per iteration are needed to recursively evaluate \(\varphi _{17}\left(t\right)\) from Lemma G.

Remark
#

If \(t = \frac{a}{b}\) where \(a \in \mathbb {Z}\) and \(b \in \mathbb {N}\) such that \(\left|a\right| {\lt} b\) and \(a\) has no common factor with \(b\), then \(\varphi _k\left(\frac{a}{b}\right)\) is also rational because the only operations in Definition 3 are arithmetic. However, \(\varphi \left(\frac{a}{b}\right)\) is generally irrational. The only five known rational values of \(\varphi \) occur when one of the fixed points from Lemma C is quickly reached: \(\varphi \left(1\right) = 1\) (implicit in Lemma 2.3.1), \(\varphi \left(-1\right) = \varphi \left(1\right) - 1 = 0\) (from Lemma F), \(\varphi \left(-\frac{1}{2}\right) = \frac{3}{7} + \frac{1}{2} \sum \limits _{m = 0}^\infty 8^{-m} \left(-\frac{1}{2}\right) = \frac{1}{7}\) (from Lemmas C and D), \(\varphi \left(\frac{1}{2}\right) = \varphi \left(-\frac{1}{2}\right) + \frac{1}{2} = \frac{9}{14}\) (from Lemma F), and \(\varphi \left(0\right) = \frac{3}{7} + \frac{0}{2} - \frac{1}{8} + \frac{1}{2} \sum \limits _{m = 0}^\infty 8^{-m} = \frac{3}{8}\) (from Lemmas C and D). These five point coincide with the only five values where cosine and sine are rational according to Niven’s theorem. Although the partial sums converge uniformly with an error bound proportional \(\frac{1}{8^{k}}\), they do so with denominators that are too large for \(\varphi \left(\frac{a}{b}\right)\) to be a Liouville number, so they may well be algebraic rather than transcendental. In addition, if \(T_{2^{m}}\left(\frac{a}{b}\right)\) is never either of the fixed points from Lemma C, then the numerators of the partial sums do not have a limit.

Lemma 2.3.2

\(\varphi \) is not differentiable at \(-1\) and \(1\), which are essential logarithmic singularities.

Proof

The derivative is defined as \(\dot{\varphi }\left(t\right) \equiv \lim \limits _{h \rightarrow 0} \frac{\varphi \left(t + h\right) - \varphi \left(t\right)}{h}\). Per Lemma 2.3.1, if \(t + h \notin \mathbb {T}\), then this limit is undefined, which can occur if both \(t = 1\) and \(h {\gt} 0\) or if both \(t = -1\) and \(h {\lt} 0\). In other words, \(\varphi \) is merely right differentiable at \(t = 1\) and left differentiable at \(t = -1\).

Proposition 2.3.3

If \(q = 0\) or \(q = 2n\), then \(\psi _{p,q} \notin C^1\left(\mathbb {I}\right)\).

Proof

Recall from Definition 6 that \(\psi _{p,q}\left(x\right) \equiv \lambda _p \varphi \left(x - \frac{q}{2n}\right)\). Per Lemma 2.3.2, if \(x + h - \frac{q}{2n} \notin \mathbb {T}\), then \(\dot{\varphi }\) does not exist at that point and hence neither does \(\dot{\psi }_{p,q}\) exist at that point. If \(h {\gt} 0\), that exceptional case can occur if and only if both \(x = 1\) and \(q = 0\), and if \(h {\lt} 0\), it can also occur if and only if both \(x = 0\) and \(q = 2n\).

Remark
#

Thus, our candidate for a core function narrowly avoids the fact proven by Vituskin and Khenkin that the KST would not hold if all the inner functions were continuously differentiable.

Lemma 2.3.4

\(\varphi \in C\left(\mathbb {T}\right)\).

Proof

\(\varphi _k\) in Definition 2 is a sum of polynomials and thus is continuous. Per Lemma 2.3.1, \(\{ \varphi _k\} \) converges uniformly to \(\varphi \) on \(\mathbb {T}\). The uniform limit of a sequence of continuous functions is continuous, so \(\varphi \) is continuous over \(\mathbb {T}\) but no smoother in light of Lemma 2.3.2.

Lemma 2.3.5

If \(q = 0\) or \(q = 2n\), then \(\psi _{p,q} \in C\left(\mathbb {I}\right)\).

Proof

Recall from Definition 6 that \(\psi _{p,q}\left(x\right) \equiv \lambda _p \varphi \left(x - \frac{q}{2n}\right)\), so Lemma 2.3.4 in fact implies \(\psi _{p,q} \in C\left(\mathbb {I}\right)\) for any \(p\) and \(q\), but we will strengthen this for \(0 {\lt} q {\lt} 2n\) below in Proposition 2.4.10.

However, these last few results lack nuance because the derivative is undefined only at one or both endpoints of a domain, and the functions are quite smooth on the interior, as is shown in Figure 2.1. Both \(\varphi _{17}\) (in blue) and its derivative (in green) are are clearly increasing functions from \(\mathbb {T}\) to \(\mathbb {I}\). It appears that a small perturbation to \(\varphi \) would violate one or more of our three requirements for a core function in the KSR (that are proven in Section 2.5): increasing over \(\mathbb {T}\), not continuously differentiable over \(\mathbb {T}\), and having a bounded derivative wherever it exists on \(\mathbb {T}\). There may well be other ways to define a core function that has these three properties, but no one really has. Moreover, it seems difficult to construct one that converges more rapidly than \(\varphi _k\) does.

Figure 2.1 The (well-approximated) core function and its derivative
\includegraphics[width=0.95\linewidth ]{../images/core.png}

The blue line is for \(\varphi _{17}\), which approximates \(\varphi \) to double-precision. \(\varphi _{17}\) is a polynomial of degree \(2^{17} = 131,072\) whose exponents are all even numbers except for the linear term, which oscillates around a line almost \(200\) times per pixel with amplitudes of about a quadrillionth. The green line is for the exact derivative of \(\varphi _{17}\), which approximates the derivative of \(\varphi \) on \(\left(-1,1\right)\) in which case the latter exists, as is proven later in Lemma 2.5.1. The black dotted line is for \(\varphi _0\). The red points are discussed in Chapter 3 and are \(\varphi \left(\tau \right)\) at Chebyshev-Lobatto nodes when \(\tau = -\cos \frac{j\pi }{16}\) and \(j\) is an integer between \(0\) and \(16\) inclusive. The two orange points are \(\varphi \left(-\frac{1}{2}\right) = \frac{1}{7}\) and \(\varphi \left(\frac{1}{2}\right) = \frac{9}{14}\), which along with the nodal values \(\varphi \left(-1\right) = 0\), \(\varphi \left(0\right) = \frac{3}{8}\), and \(\varphi \left(1\right) = 1\) are the only five known rational values of \(\varphi \) that are discussed in the second remark following Lemma 2.3.1.

2.4 Differentiability

The fact that \(\{ \varphi _k\} \) converges uniformly to \(\varphi \) per Lemma 2.3.1 does not, by itself, imply that \(\{ \dot{\varphi }_{k}\left(\right)\} \) converges in any sense to \(\dot{\varphi }\), in part because the derivative of \(\varphi \) does not even exist at \(-1\) and \(1\) per Lemma 2.3.2. The purpose of this subsection is to establish rigorously that the intuition from the green line in Figure 2.1 holds on the open interval, \(\left(-1,1\right)\), due to the uniform convergence of the derivatives, which can also be established using the Weierstrass M-test. Since \(T_{2^m}\) and \(\varphi _k\) are polynomials, their derivatives are also polynomials, although their limit as \(k \uparrow \infty \) is not.

Definition 7 Chebyshev polynomial of the second kind
#

The second-kind Chebyshev polynomial in \(r\) with index \(j \geq -1\) is defined recursively as

\begin{align*} U_j\left(r\right) \equiv \begin{cases} 0 & \text{if } j = -1 \\ 1 & \text{if } j = 0 \\ 2r & \text{if } j = 1 \\ 2 r U_{j - 1}\left(r\right) - U_{j - 2}\left(r\right) & \text{if } j {\gt} 1 \end{cases}\end{align*}
Lemma 2.4.1

If \(j \geq 0\), then \(\dot{T}_j\left(r\right) = j U_{j - 1}\left(r\right)\).

Proof

This is already proven by induction in Mathlib. TODO: Add a succinct explanation.

Lemma 2.4.2

\(U_{-1 + 2^{m + 1}}\left(r\right) = 2 T_{2^{m}}\left(r\right) U_{-1 + 2^m}\left(r\right)\).

Proof

Differentiate both sides of the two-term recursion from Lemma A and utilize Lemma 2.4.1 to obtain \(2^{m + 1} U_{-1 + 2^{m + 1}}\left(r\right) = 2^2T_{2^{m}}\left(r\right) 2^m U_{-1 + 2^m}\left(r\right)\), and then divide both sides by \(2^{m + 1}\).

\(\dot{\varphi }_{k}\left(r\right) = \frac{1}{2} \sum \limits _{m = 0}^k 4^{-m} U_{-1 + 2^m}\left(r\right)\).

Proof

Applying Lemma 2.4.1 to each non-constant term in Definition 2 yields this result.

Lemma 2.4.4

If \(t \in \mathbb {T}\), then the sequence of partial sums, \(\{ \dot{\varphi }_{k}\left(t\right)\} \) forms a uniformly Cauchy sequence.

Proof

We utilize the Weierstrass M-test, which is sufficient to prove that the sequence of partial sums forms a uniformly Cauchy sequence. If \(m = 0\), then \(U_0\left(t\right) = t\) has a minimum of \(-1\) and a maximum of \(1\). Per Lemma B, if \(t \in \mathbb {T}\), then \(T_{2^{m}}\left(t\right) \in \mathbb {T}\), so Lemma 2.4.2 indicates that the range of \(U_{-1 + 2^{m + 1}}\) doubles with each \(m\). Thus, \(U_{-1 + 2^{m + 1}}\) has a minimum of \(-2^m\) and a maximum of \(2^m\). Since \(\left|4^{-m} U_{-1 + 2^m}\right| \leq 2^{-m}\) and \(\frac{1}{2}\sum \limits _{m = 0}^\infty 2^m = 1\) per Lemma D with \(b = 2\), as \(k \uparrow \infty \), \(\{ \dot{\varphi }_{k}\left(t\right)\} \) converges uniformly on \(\mathbb {T}\).

Lemma 2.4.5

\(\lim \limits _{k \uparrow \infty } \dot{\varphi }_k\) is a weak derivative of \(\varphi \) that coincides with \(\dot{\varphi }\) on \(\left(-1,1\right)\).

Proof

The uniform convergence of \(\{ \dot{\varphi }_{k}\left(t\right)\} \) in Lemma 2.4.4 justifies interchanging the limits in

\begin{eqnarray*} \dot{\varphi }_{\infty }\left(t\right) = \lim \limits _{h \rightarrow 0} \frac{\lim \limits _{k \uparrow \infty } \left[\varphi _k\left(t + h\right) - \varphi _k\left(t\right)\right]}{h} = \lim \limits _{k \uparrow \infty } \lim \limits _{h \rightarrow 0} \frac{\varphi _k\left(t + h\right) - \varphi _k\left(t\right)}{h} = \frac{1}{2} \sum \limits _{m = 0}^\infty 4^{-m} U_{-1 + 2^m}\left(t\right), \end{eqnarray*}

where the last line follows from Lemma 2.4.3. Per Lemma 2.3.1, since \(\{ \varphi _k\left(t\right)\} \) converges to \(\varphi \left(t\right)\) for at least one (actually all) value(s) of \(t \in \mathbb {T}\), \(\dot{\varphi }\left(t\right)\) exists on any proper subinterval of \(\mathbb {T}\).

\(\left(1 - r^2\right) \ddot{T}_{j}\left(r\right) - t \dot{T}_{j}\left(r\right) + j^2 T_j\left(r\right) = 0\).

Proof

Although this lemma has been known for decades, it has not been formalized in Mathlib, so we follow the proof in Ponton. From Definition 1, if \(j = 0\), then \(T_0\left(r\right) = 1\), so \(\dot{T}_0\left(r\right) = 0\) and \(\ddot{T}_0\left(r\right) = 0\), which are consistent with the above Ordinary Differential Equation (ODE). Similarly, if \(j = 1\), then \(T_1\left(r\right) = t\) so \(\dot{T}_1\left(r\right) = 1\) and \(\ddot{T}_1\left(r\right) = 0\), which are also consistent with the above ODE.

If \(j {\gt} 1\), we can differentiate the three-term recursion, \(T_j\left(r\right) = 2 r T_{j - 1}\left(r\right) - T_{j - 2}\left(r\right)\), to obtain

\begin{eqnarray*} \dot{T}_j\left(r\right) & =& 2 T_{j - 1}\left(r\right) + 2 r \dot{T}_{j - 1}\left(r\right) - \dot{T}_{j - 2}\left(r\right) \\ \ddot{T}_j\left(r\right) & =& 4 \dot{T}_{j - 1}\left(r\right) + 2 r \ddot{T}_{j - 1}\left(r\right) - \ddot{T}_{j - 2}\left(r\right) \end{eqnarray*}

If we substitute these three relations into the purported ODE and factor, we need to show that

\begin{eqnarray*} 0 & =& 2r \left[ \left(1 - r^2\right) \ddot{T}_{j - 1}\left(r\right) - r \dot{T}_{j - 1}\left(r\right) + \left(j - 1\right)^2 T_{j - 1}\left(r\right) \right] \\ & -& \left[\left(1 - r^2\right) \ddot{T}_{j - 2}\left(r\right) - r \dot{T}_{j - 2}\left(r\right) + \left(j - 2\right)^2 T_{j - 2}\left(r\right) \right] \\ & +& 4 \left(j - 1\right) \left[\left(1 - r^2\right) U_{j - 2}\left(r\right) + r T_{j - 1}\left(r\right) - T_{j - 2}\left(r\right) \right]. \end{eqnarray*}

In each of the first two lines, the term in brackets is zero due to the inductive hypothesis, so we need to show that the term in brackets in the third line is also zero. It is already proven by induction in Mathlib (TODO: add succinct explanation) that \(\left(1 - r^2\right) U_{j - 2}\left(r\right) = r T_{j - 1}\left(r\right) - T_{j}\left(r\right)\), so substituting and then expressing \(T_{j}\left(r\right)\) via the three-term recursion yields the claimed result.

If \(t \in \left(-1,1\right)\), then \(\ddot{\varphi }_{k}\left(t\right) = \frac{t}{1 - t^2} \dot{\varphi }_{k}\left(t\right) - \frac{1}{2\left(1 - t^2\right)} \sum \limits _{m = 0}^k 2^{-m} T_{2^{m}}\left(t\right)\).

Proof

Differentiate Definition 2 twice to obtain \(\ddot{\varphi }_{k}\left(t\right) = \frac{1}{2} \sum \limits _{m = 0}^k 8^{-m} \ddot{T}_{2^m}\left(t\right)\). Substitute inside the sum using Lemma 2.4.6 after isolating \(\ddot{T}_{2^m}\left(t\right)\), and then simplify using Lemma 2.4.3.

Remark
#

If \(t^2 = 1\), then \(\ddot{\varphi }_{k}\left(\mp 1\right) = \frac{2^{k + 1} + 2^{-k} - 3}{6}\) by L’Hospital’s rule, but this edge case is irrelevant.

Lemma 2.4.8

If \(t \in \left(-1,1\right)\), then the sequence of partial sums, \(\{ \ddot{\varphi }_{k}\left(t\right)\} \) forms a uniformly Cauchy sequence.

Proof

Lemma 2.4.4 establishes that \(\dot{\varphi }_{k}\left(t\right)\) in Lemma 2.4.7 converges uniformly on \(\mathbb {T}\). The same proof of the “if” direction of Lemma 2.3.1 but now with \(b = 2\) shows that the other sum also converges uniformly.

Lemma 2.4.9

\(\lim \limits _{k \uparrow \infty } \ddot{\varphi }_k\) is a weak second derivative of \(\varphi \) that coincides with \(\ddot{\varphi }\) on \(\left(-1,1\right)\).

Proof

The uniform convergence of \(\{ \ddot{\varphi }_{k}\left(t\right)\} \) from Lemma 2.4.8 justifies interchanging the limits in

\begin{eqnarray*} \ddot{\varphi }_{\infty }\left(t\right) & =& \lim \limits _{h \rightarrow 0} \frac{\lim \limits _{k \uparrow \infty }\left[\dot{\varphi }_{k}\left(t + h\right) - \dot{\varphi }_{k}\left(t\right)\right]}{h} = \lim \limits _{k \uparrow \infty } \lim \limits _{h \rightarrow 0} \frac{\dot{\varphi }_{k}\left(t + h\right) - \dot{\varphi }_{k}\left(t\right)}{h} = \lim \limits _{k \uparrow \infty } \ddot{\varphi }_{k}\left(t\right) \\ & =& \frac{t}{1 - t^2} \dot{\varphi }_{\infty }\left(t\right) + \frac{1}{2 \left(1 - t^2\right)} \sum \limits _{m = 0}^\infty 2^{-m} T_{2^{m}}\left(t\right). \end{eqnarray*}

Since \(\dot{\varphi }\) does not exist at \(-1\) or \(1\), neither does \(\ddot{\varphi }\) exist at those two points. However, per Lemma 2.4.4, since \(\{ \dot{\varphi }_{k}\left(t\right)\} \) converges to \(\dot{\varphi }\left(t\right)\) for at least one (actually all) value(s) of \(t \in \left(-1,1\right)\), \(\ddot{\varphi }_{\infty }\) coincides with \(\ddot{\varphi }\left(t\right)\) on \(\left(-1,1\right)\) and evaluates to the extended real number, \(\infty \), if \(t^2 = 1\).

Proposition 2.4.10

If \(0 {\lt} q {\lt} 2n\), then \(\psi _{p,q} \in C^2\left(\mathbb {I}\right)\).

Proof

Recall from Definition 6 that \(\psi _{p,q}\left(x\right) \equiv \lambda _p \varphi \left(x - \frac{q}{2n}\right)\). If \(0 {\lt} q {\lt} 2n\), then \(x - \frac{q}{2n}\) cannot be \(-1\) or \(1\), in which case Lemma 2.4.8 implies that \(\psi _{p,q}\) is twice differentiable.

Remark
#

Proposition 2.4.10 does not contravene the conclusion of Vitushkin and Khenkin that the KST cannot hold if all the inner functions were continuously differentiable because Proposition 2.3.3 proves that if \(q = 0\) or \(q = 2n\), then \(\psi _{p,q} \notin C^1\left(\mathbb {I}\right)\). Technicalities aside, our construction is much smoother than any other one in the KSR literature.

Remark
#

It seems implausible for \(\varphi \) to have a third derivative over any proper subinterval of \(\mathbb {T}\) because differentiating once more under the sum in Lemma 2.4.8 would yield a term, \(\sum \limits _{m = 0}^\infty U_{-1 + 2^m}\left(t\right)\), that diverges unless \(t\) is a root of some \(U_{-1 + 2^m}\) and thus a root of \(U_{-1 + 2^{m + 1}}\) per Lemma 2.4.2.

Remark
#

\(\dot{\varphi }_{17}\left(t\right)\) and \(\ddot{\varphi }_{17}\left(t\right)\) are less accurate near the endpoints of \(\mathbb {T}\) than is \(\varphi _{17}\left(t\right)\) due to the slower rates of convergence in the geometric series. But most numerical applications do not require extreme accuracy in the derivatives.

2.5 Requisite Properties

This section merely proves what is already strongly suggested by Figure 2.1, namely that \(\varphi \) is weakly convex, strictly increasing, and a weak contraction.

Lemma 2.5.1

\(\dot{\varphi }_{k}\left(-r\right) = 1 - \dot{\varphi }_{k}\left(r\right)\).

Proof

Differentiate both sides of the functional equation in Lemma F.

Lemma 2.5.2

\(\ddot{\varphi }_{k}\left(-r\right) = \ddot{\varphi }_{k}\left(r\right)\).

Proof

Differentiate both sides of the functional equation in Lemma 2.5.1.

Lemma 2.5.3

If \(k {\gt} 0\), then \(\varphi _k\) is a strictly convex function over \(\mathbb {T}\), and if \(k = 0\), then \(\varphi _0\) is weakly convex over \(\mathbb {T}\).

Proof

A twice-differentiable function, such as \(\varphi _k\), is strictly convex on \(\mathbb {T}\) if and only if its second derivative is positive on \(\mathbb {T}\). Suppose on the contrary that \(\ddot{\varphi }_{k}\left(t\right) = 0\) for some \(t \in \mathbb {T}\). Since \(\ddot{\varphi }_k\) is an even function per Lemma 2.5.2, \(\ddot{\varphi }_{k}\left(-t\right) = 0\) as well. Subtracting the latter equation from the former using the expression for the second derivative in Lemma 2.4.7 implies \(\frac{t}{1 -t^2} \dot{\varphi }_{k}\left(t\right) - \frac{-t}{1 - t^2} \dot{\varphi }_{k}\left(-t\right) = \frac{1}{2 \left(1 - t^2\right)} \sum \limits _{m = 0}^k 2^{-m} \left(T_{2^{m}}\left(t\right) - T_{2^{m}}\left(-t\right)\right)\). The left-hand side simplifies to \(\frac{t}{1 - t^2}\) under Lemma 2.5.1, while the right-hand side also simplifies to \(\frac{t}{1 - t^2}\) under Lemma E. Together that implies that \(\ddot{\varphi }_k\) would be a horizontal line at zero, which is actually the case when \(k = 0\) and \(\varphi _0\left(t\right) = \frac{1 + t}{2}\) is a line segment. If \(k {\gt} 0\) and \(t = 0\), then Lemma 2.4.7 implies \(\ddot{\varphi }_{k}\left(0\right) = \frac{0}{1} \dot{\varphi }_{k}\left(0\right) - \frac{1}{2}\left(0 - \frac{1}{2} + \sum \limits _{m = 2}^k 2^{-m}\right) = 2^{-k - 1} {\gt} 0\), where the last equality follows from Lemma D with \(b = 2\). This example contradicts the premise that \(\ddot{\varphi }_k\) is a horizontal line at zero when \(k {\gt} 0\) and a certifies that the second derivative is strictly positive over \(\mathbb {T}\) rather than strictly negative.

Lemma 2.5.4

\(\varphi \) is a weakly convex function over \(\mathbb {T}\).

Proof

Per Lemma 2.5.3, \(\varphi _k\) is a strictly convex when \(k {\gt} 0\), and per Lemma 2.3.1 \(\{ \varphi _k\} \) converges uniformly to \(\varphi \). The uniform limit of a sequence of strictly convex functions is at least a weakly convex function. Although \(\lim \limits _{k \uparrow \infty } 2^{-k - 1} = 0\) and \(\ddot{\varphi }\left(0\right) = 0\) is an inflection point, \(\ddot{\varphi }\) can never be negative on \(\mathbb {T}\), so \(\varphi \) is weakly convex.

Lemma 2.5.5

If \(t \in \mathbb {T}\), then \(2^{-k - 1} \leq \dot{\varphi }_{k}\left(t\right) \leq 1 - 2^{-k - 1}\).

Proof

Per Lemma 2.5.3, \(\varphi _k\) is strictly convex when \(k {\gt} 0\), so its derivative in Lemma 2.4.3 is an increasing function over \(\mathbb {T}\) that is minimized at \(t = -1\) and maximized at \(t = 1\) (which is also true if \(k = 0\)). It is already proven by induction in Mathlib that \(U_j\left(-1\right) = \left(-1\right)^j \left(j + 1\right)\), which is apparent from Definition 7. Per Lemma 2.4.3, \(\dot{\varphi }_{k}\left(-1\right) = \frac{1}{2} - \frac{1}{2} \sum \limits _{m = 1}^k 4^{-m} 2^m = \frac{1}{2} - \frac{1}{2} \sum \limits _{m = 1}^k 2^{-m} = 2^{-k - 1}\) by Lemma D with \(b = 2\). Lemma 2.5.1 then implies \(\dot{\varphi }_{k}\left(1\right) = 1 - 2^{-k - 1}\).

Lemma 2.5.6

\(\varphi \) is a strictly increasing function over \(\mathbb {T}\).

Proof

Per Lemma 2.5.5, \(\varphi _k\) is a strictly increasing function over \(\mathbb {T}\), and per Lemma 2.3.1 \(\{ \varphi _k\} \) converges uniformly to \(\varphi \). The uniform limit of strictly increasing functions is a strictly increasing function, so \(\varphi \) is a strictly increasing function.

Lemma 2.5.7

\(\varphi \) is a weak contraction over \(\mathbb {T}\).

Proof

Lemma 2.5.5 implies \(\varphi _k\) is Lipschitz continuous with a constant of \(1 - 2^{-k - 1}\), and per Lemma 2.3.1 \(\{ \varphi _k\} \) converges uniformly to \(\varphi \). The uniform limit of Lipschitz continuous functions is Lipschitz continuous. Although the derivative of \(\varphi \) does not exist at \(t = 1\), the supremum of the derivative of \(\varphi \) is \(1\), making it a weak contraction on \(\mathbb {T}\).

If \(t \in \mathbb {T}\), then \(0 \leq \varphi _k\left(t\right) \leq 1\).

Proof

Per Lemma 2.5.5, \(\varphi _k\) is a strictly increasing function over \(\mathbb {T}\), which is minimized at \(t = -1\) and maximized at \(t = 1\). If \(t = 1\), Lemma C implies \(T_{2^{m}}\left(1\right) = 1\). Thus, Definition 2 specializes to \(\varphi _k\left(1\right) = \frac{3}{7}+ \frac{8^{-k}}{14}+ \frac{1}{2} \sum \limits _{m = 0}^k 8^{-m} = \frac{3}{7}+ \frac{8^{-k}}{14}+ \frac{8 - 8^{-k}}{14} = 1\) by Lemma D with \(b = 8\). Lemma F then implies \(\varphi _k\left(-1\right) = 0\).

Hence, the unusual constant, \(\frac{8^{-k}}{14}\), in Definition 2 anchors its range to be the same for all \(k\).

Lemma 2.5.9

If \(t \in \mathbb {T}\), then \(0 \leq \varphi \left(t\right) \leq 1\).

Proof

Since the bounds on \(\varphi _k\) in Lemma 2.5.8 do not depend on \(k\), they are preserved as \(k \uparrow \infty \).

This chapter has demonstrated that our construction threads a needle. Lemma 2.3.2 states that \(\varphi \) is technically not continuously differentiable over \(\mathbb {T}\), but only because it has essential singularities at both endpoints of \(\mathbb {T}\). Thus, it is more useful to focus on the facts that \(\varphi \) is twice differentiable on \(\left(-1,1\right)\) per Lemma 2.4.9 and can be uniformly approximated by high-degree polynomials per Lemma 2.3.1. Thus, from a numerical perspective, \(\varphi \) is no different from the inverse trigonometric functions that are implemented in standard libraries, in the sense that they can be approximated to machine precision and the lack of a derivative at two or three points is a minor inconvenience.