Kolmogorov–Arnold representation theorem

In real analysis and approximation theory, the Kolmogorov–Arnold representation theorem (or superposition theorem) states that every multivariate continuous function can be represented as a superposition of continuous single-variable functions.

The works of Vladimir Arnold and Andrey Kolmogorov established that if f is a multivariate continuous function, then f can be written as a finite composition of continuous functions of a single variable and the binary operation of addition.[1] More specifically,

f(\mathbf{x}) = f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),

where φ_{q,p} : [0, 1] → ℝ and Φ_q : ℝ → ℝ.

There are proofs with specific constructions.[2]

It solved a more constrained form of Hilbert's thirteenth problem, so the original Hilbert's thirteenth problem follows as a corollary.[3][4][5] In a sense, they showed that the only true function of several variables is the sum, since every other continuous function can be written using univariate functions and summation.[6]: 180
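
A standard illustration of this idea (a textbook example, not taken from the cited sources) is that multiplication itself is such a superposition: it can be built from univariate functions and addition alone,

xy = \frac{1}{4}\left( (x + y)^2 - (x - y)^2 \right) = \exp\!\left( \ln x + \ln y \right) \qquad (x, y > 0 \text{ for the second expression}).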

History

The Kolmogorov–Arnold representation theorem is closely related to Hilbert's 13th problem. In his Paris lecture at the International Congress of Mathematicians in 1900, David Hilbert formulated 23 problems which in his opinion were important for the further development of mathematics.[7] The 13th of these problems dealt with the solution of general equations of higher degrees. It is known that for algebraic equations of degree at most 4 the solution can be computed by formulae that only contain radicals and arithmetic operations. For higher degrees, Galois theory shows that the solutions of algebraic equations cannot in general be expressed in terms of radicals and arithmetic operations. It follows from the so-called Tschirnhaus transformation that the general algebraic equation

x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0

can be translated to the form

y^n + b_{n-4} y^{n-4} + \cdots + b_1 y + 1 = 0.

The Tschirnhaus transformation is given by a formula containing only radicals and arithmetic operations. Therefore, the solution of an algebraic equation of degree n can be represented as a superposition of functions of two variables if n < 7, and as a superposition of functions of n − 4 variables if n ≥ 7. For n = 7 the solution is a superposition of arithmetic operations, radicals, and the solution of the equation

y^7 + b_3 y^3 + b_2 y^2 + b_1 y + 1 = 0.

A further simplification with algebraic transformations seems to be impossible, which led to Hilbert's conjecture that "A solution of the general equation of degree 7 cannot be represented as a superposition of continuous functions of two variables". This explains the relation of Hilbert's thirteenth problem to the representation of a higher-dimensional function as a superposition of lower-dimensional functions. In this context, it has stimulated many studies in the theory of functions and other related problems by different authors.[8]

Variants

A variant of Kolmogorov's theorem that reduces the number of outer functions Φ_q is due to George Lorentz.[9] He showed in 1962 that the outer functions Φ_q can be replaced by a single function Φ. More precisely, Lorentz proved the existence of functions φ_{q,p}, q = 0, 1, …, 2n, p = 1, …, n, such that

f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right).

David Sprecher[10] replaced the inner functions φ_{q,p} by one single inner function φ with an appropriate shift in its argument. He proved that there exist real values η, λ_1, …, λ_n, a continuous function Φ : ℝ → ℝ, and a real increasing continuous function φ : [0, 1] → [0, 1] with φ ∈ Lip(ln 2 / ln(2N + 2)), for N ≥ n ≥ 2, such that

f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi\!\left( \sum_{p=1}^{n} \lambda_p\, \phi(x_p + \eta q) + q \right).

Phillip A. Ostrand[11] generalized the Kolmogorov superposition theorem to compact metric spaces. For p = 1, …, m let X_p be compact metric spaces of finite dimension d_p, and let n = d_1 + ⋯ + d_m. Then there exist continuous functions φ_{q,p} : X_p → [0, 1], q = 1, …, 2n + 1, p = 1, …, m, and continuous functions G_q : [0, 1] → ℝ, q = 1, …, 2n + 1, such that any continuous function f : X_1 × ⋯ × X_m → ℝ is representable in the form

f(x_1, \ldots, x_m) = \sum_{q=1}^{2n+1} G_q\!\left( \sum_{p=1}^{m} \phi_{q,p}(x_p) \right).

Limitations

The theorem does not hold in general for complex multivariate functions, as discussed by Akashi.[4] Furthermore, the non-smoothness of the inner functions and their "wild behavior" has limited the practical use of the representation,[12] although there is some debate on this.[13]

Applications

In the field of machine learning, there have been various attempts to use neural networks modeled on the Kolmogorov–Arnold representation.[14][15][16][17][18] In these works, the Kolmogorov–Arnold theorem plays a role analogous to that of the universal approximation theorem in the study of multilayer perceptrons.
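
As one concrete, purely illustrative way to see the connection, the sketch below builds a function of the Kolmogorov–Arnold form with parameterized univariate pieces. It is not the architecture of any of the cited works; the class names and the radial-basis parameterization are hypothetical choices, and the parameters are random rather than trained.

```python
# A minimal sketch of a model with the Kolmogorov-Arnold structure
#     f(x_1, ..., x_n)  ~=  sum_{q=0}^{2n} Phi_q( sum_{p=1}^{n} phi_{q,p}(x_p) )
# Each univariate function is a small sum of Gaussian bumps; weights are random
# here, so the code only demonstrates the compositional form, not a trained model.
import numpy as np

class Univariate:
    """A univariate function t -> sum_k w_k * exp(-(t - c_k)^2 / (2 s^2))."""
    def __init__(self, num_bases=8, t_min=0.0, t_max=1.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.centers = np.linspace(t_min, t_max, num_bases)
        self.width = (t_max - t_min) / num_bases
        self.weights = rng.normal(scale=0.3, size=num_bases)

    def __call__(self, t):
        t = np.asarray(t, dtype=float)[..., None]             # shape (..., 1)
        basis = np.exp(-0.5 * ((t - self.centers) / self.width) ** 2)
        return basis @ self.weights                            # shape (...)

class KolmogorovArnoldModel:
    """f(x) ~= sum_{q=0}^{2n} Phi_q( sum_{p=1}^{n} phi_{q,p}(x_p) )."""
    def __init__(self, n_inputs, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.inner = [[Univariate(rng=rng) for _ in range(n_inputs)]
                      for _ in range(2 * n_inputs + 1)]
        # inner sums land roughly in [-n, n]; give the outer functions that range
        self.outer = [Univariate(t_min=-float(n_inputs), t_max=float(n_inputs), rng=rng)
                      for _ in range(2 * n_inputs + 1)]

    def __call__(self, x):
        x = np.atleast_2d(np.asarray(x, dtype=float))          # shape (batch, n)
        out = np.zeros(x.shape[0])
        for q, Phi_q in enumerate(self.outer):
            s = sum(self.inner[q][p](x[:, p]) for p in range(x.shape[1]))
            out += Phi_q(s)
        return out

if __name__ == "__main__":
    model = KolmogorovArnoldModel(n_inputs=2)   # 2 inputs -> 2*2 + 1 = 5 outer terms
    print(model([[0.2, 0.7], [0.5, 0.5]]))      # two example outputs
```

In KAN-style models the univariate pieces would be trained (for example, as splines) to fit data; here they serve only to make the compositional structure of the representation explicit.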

Proof

Here we prove one version of the theorem, closely following [19]. We prove the case of functions depending on two variables, as the generalization is immediate.

Setup

  • Let [0, 1] be the unit interval.
  • Let C[0, 1] be the set of continuous functions of type [0, 1] → ℝ. It is a function space with the supremum norm ‖·‖ (it is a Banach space).
  • For a continuous function f of type [0, 1]² → ℝ, let ‖f‖ be its supremum on [0, 1]².
  • Let λ be a positive irrational number. Its exact value is irrelevant.

We say that a 5-tuple (φ_1, …, φ_5) ∈ C[0, 1]^5 is a Kolmogorov–Arnold tuple if and only if for every f ∈ C([0, 1]²) there exists a continuous function g : ℝ → ℝ such that

f(x, y) = \sum_{i=1}^{5} g\big( \phi_i(x) + \lambda\, \phi_i(y) \big) \qquad \text{for all } (x, y) \in [0, 1]^2.

In this notation, we have the following:

Theorem — The Kolmogorov–Arnold tuples make up an open and dense subset of C[0, 1]^5.

Proof

Fix an f ∈ C([0, 1]²). We show that the following subset of C[0, 1]^5 is open and dense: the set of 5-tuples (φ_1, …, φ_5) for which there exists a continuous g : ℝ → ℝ such that ‖g‖ ≤ (1/8)‖f‖ and

\left\| f(x, y) - \sum_{i=1}^{5} g\big( \phi_i(x) + \lambda\, \phi_i(y) \big) \right\| \le \frac{7}{8}\, \| f \|.

We can assume that ‖f‖ = 1 with no loss of generality.

By continuity, the set of such 5-tuples is open in C[0, 1]^5. It remains to prove that they are dense.

The key idea is to divide [0, 1]² into an overlapping system of small squares, each with a unique address, and to define g to have the appropriate value at each address.

Grid system

Let δ > 0. For each i = 1, …, 5 and any φ_i ∈ C[0, 1], for all large N, we can discretize φ_i into a continuous function φ_i′ satisfying the following properties:

  • φ_i′ is constant on each of the intervals [(5k + i)/(5N), (5k + i + 4)/(5N)] ∩ [0, 1], k ∈ ℤ (the blocks of the i-th grid); the gaps of width 1/(5N) between consecutive such intervals are the streets.
  • These constant values are different rational numbers.
  • ‖φ_i′ − φ_i‖ ≤ δ.

Each such function φ_i′ creates a grid address system on [0, 1]², divided into streets and blocks. The blocks are of the form I × J, where I and J are intervals on which φ_i′ is constant.

(Figure: an example construction of φ_i′ and the corresponding grid system.)

Since f is continuous on [0, 1]², it is uniformly continuous. Thus, we can take N large enough so that f varies by less than 1/4 on any block.

On each block, φ_i′(x) + λφ_i′(y) has a constant value. The key property is that, because λ is irrational and φ_i′ is rational on the blocks, each block has a different value of φ_i′(x) + λφ_i′(y).
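
Indeed, suppose two blocks of the same grid had equal addresses a + λb = a′ + λb′ with a, b, a′, b′ rational. If b ≠ b′, this would force

\lambda = \frac{a - a'}{b' - b} \in \mathbb{Q},

contradicting the irrationality of λ; hence b = b′ and then a = a′, so the addresses coincide only for the same block.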

So, given any 5-tuple (φ_1, …, φ_5), we construct such a discretized 5-tuple (φ_1′, …, φ_5′). These create 5 overlapping grid systems.

Enumerate the blocks as R_{i,k}, where R_{i,k} is the k-th block of the grid system created by φ_i′. The address of this block is a_{i,k} := φ_i′(x) + λφ_i′(y), for any (x, y) ∈ R_{i,k}. By adding a small and linearly independent irrational number (the construction is similar to that of a Hamel basis) to each of φ_1′, …, φ_5′, we can ensure that every block has a unique address.

By plotting out the entire grid system, one can see that every point in [0, 1]² is contained in 3 to 5 blocks, and 2 to 0 streets.
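
As a quick numerical sanity check of this covering claim, the short script below uses the block convention introduced above (blocks [(5k + i)/(5N), (5k + i + 4)/(5N)] for the i-th grid); the function name in_block and the choice N = 7 are illustrative only, not part of the proof.

```python
# Numerical sanity check: with the staggered grids above, each x in [0, 1]
# lies in the street of at most one of the 5 grids, so each point (x, y)
# lies in blocks of at least 3 of the 5 grid systems.
import numpy as np

def in_block(x, i, N):
    """True where x lies in a block (not a street) of grid i, for i = 1..5."""
    t = x * 5 * N - i          # position measured in units of 1/(5N), shifted by grid i
    return (t % 5) <= 4        # blocks occupy 4 of every 5 such units

N = 7                          # illustrative resolution; any N works
xs = np.linspace(0.0, 1.0, 2001)
streets_per_x = 5 - sum(in_block(xs, i, N) for i in range(1, 6))
assert streets_per_x.max() <= 1   # at most one grid's street contains any given x

# A point (x, y) can be on an x-street of one grid and a y-street of another,
# so it misses at most 2 grids and is covered by blocks of at least 3:
print("worst-case number of covering block systems:", 5 - 2 * int(streets_per_x.max()))
```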

Construction of g

For each block R_{i,k}, if f > 0 on all of R_{i,k} then define g(a_{i,k}) := 1/8; if f < 0 on all of R_{i,k} then define g(a_{i,k}) := −1/8. Now, linearly interpolate g between these defined values (keeping ‖g‖ ≤ 1/8, for example by making g constant outside the range of the addresses). It remains to show this construction has the desired properties.
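
The interpolation step can be pictured with a short sketch; the addresses and signs below are hypothetical placeholders, while the values ±1/8 match the construction above.

```python
# Illustrative only: piecewise-linear interpolation of g between block addresses.
import numpy as np

addresses = np.array([0.12, 0.31, 0.47, 0.66, 0.83])  # hypothetical block addresses (sorted)
signs     = np.array([+1, -1, +1, +1, -1])            # hypothetical sign of f on each block
values    = signs / 8.0                               # g takes the value +/- 1/8 at each address

def g(t):
    # np.interp is piecewise linear between the defined points and
    # constant outside their range, so |g| <= 1/8 everywhere.
    return np.interp(t, addresses, values)

print(g(0.47), g(0.40))  # value at an address, and an interpolated value between addresses
```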

For any (x, y) ∈ [0, 1]², we consider three cases.

If f(x, y) ≥ 1/4, then by uniform continuity, f > 0 on every block that contains the point (x, y). This means that g = 1/8 on 3 to 5 of the blocks, and g has an unknown value in [−1/8, 1/8] on 2 to 0 of the streets. Thus, we have

\sum_{i=1}^{5} g\big( \phi_i'(x) + \lambda\, \phi_i'(y) \big) \in \left[ 3 \cdot \tfrac{1}{8} - 2 \cdot \tfrac{1}{8},\; 5 \cdot \tfrac{1}{8} \right] = \left[ \tfrac{1}{8}, \tfrac{5}{8} \right],

giving

\left| f(x, y) - \sum_{i=1}^{5} g\big( \phi_i'(x) + \lambda\, \phi_i'(y) \big) \right| \le \max\!\left( 1 - \tfrac{1}{8},\; \tfrac{5}{8} - \tfrac{1}{4} \right) = \tfrac{7}{8}.

Similarly for f(x, y) ≤ −1/4.

If |f(x, y)| < 1/4, then since |∑_{i=1}^5 g(φ_i′(x) + λφ_i′(y))| ≤ 5/8, we still have

\left| f(x, y) - \sum_{i=1}^{5} g\big( \phi_i'(x) + \lambda\, \phi_i'(y) \big) \right| < \tfrac{1}{4} + \tfrac{5}{8} = \tfrac{7}{8}.

Since the discretized tuple can be taken within δ of the original one, and δ was arbitrary, this proves density.

Baire category theorem

Iterating the above construction, then applying the Baire category theorem, we find that the following kind of 5-tuples form a dense G_δ subset of C[0, 1]^5: there exists a sequence of functions g_1, g_2, … such that ‖g_1‖ ≤ (1/8)‖f‖, ‖g_2‖ ≤ (1/8)(7/8)‖f‖, ‖g_3‖ ≤ (1/8)(7/8)²‖f‖, etc. This allows their sum g := g_1 + g_2 + ⋯ to be defined, which is still continuous and bounded, and it satisfies

f(x, y) = \sum_{i=1}^{5} g\big( \phi_i(x) + \lambda\, \phi_i(y) \big).

Since C([0, 1]²) has a countable dense subset, we can apply the Baire category theorem again to obtain the full theorem.
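
Concretely, with the bounds above the norms form a geometric series, which is why the limit g exists and the residuals vanish:

\sum_{k \ge 1} \| g_k \| \le \frac{1}{8}\, \| f \| \sum_{k \ge 0} \left( \frac{7}{8} \right)^{k} = \| f \|,
\qquad
\left\| f - \sum_{i=1}^{5} \Big( \sum_{k=1}^{m} g_k \Big)\big( \phi_i(x) + \lambda\, \phi_i(y) \big) \right\| \le \left( \frac{7}{8} \right)^{m} \| f \| \longrightarrow 0.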

Extensions

The above proof generalizes to n dimensions: divide the cube [0, 1]^n into 2n + 1 interlocking grid systems, such that each point in the cube is in n + 1 to 2n + 1 blocks, and n to 0 streets. Now, since (n + 1) − n = 1 > 0, the above construction works.

Indeed, this is the best possible value: the number 2n + 1 of summands cannot be reduced to 2n (Sternfeld, 1985[20]), with a relatively short proof given in [21] via dimension theory.

Theorem. One can pick such a tuple of inner functions (φ_1, …, φ_{2n+1}) to be strictly monotonically increasing.[22]

References

  1. ^ Bar-Natan, Dror. "Dessert: Hilbert's 13th Problem, in Full Colour".
  2. ^ Braun, Jürgen; Griebel, Michael (2009). "On a constructive proof of Kolmogorov's superposition theorem". Constructive Approximation. 30 (3): 653–675. doi:10.1007/s00365-009-9054-2.
  3. ^ Khesin, Boris A.; Tabachnikov, Serge L. (2014). Arnold: Swimming Against the Tide. American Mathematical Society. p. 165. ISBN 978-1-4704-1699-7.
  4. ^ a b Akashi, Shigeo (2001). "Application of ϵ-entropy theory to Kolmogorov—Arnold representation theorem". Reports on Mathematical Physics. 48 (1–2): 19–26. doi:10.1016/S0034-4877(01)80060-4.
  5. ^ Morris, Sidney A. (2020-07-06). "Hilbert 13: Are there any genuine continuous multivariate real-valued functions?". Bulletin of the American Mathematical Society. 58 (1): 107–118. doi:10.1090/bull/1698. ISSN 0273-0979.
  6. ^ Diaconis, Persi; Shahshahani, Mehrdad (1984). "On nonlinear functions of linear combinations" (PDF). SIAM Journal on Scientific and Statistical Computing. 5 (1): 175–191. doi:10.1137/0905013.
  7. ^ Hilbert, David (1902). "Mathematical problems". Bulletin of the American Mathematical Society. 8 (10): 461–462. doi:10.1090/S0002-9904-1902-00923-3.
  8. ^ Jürgen Braun, On Kolmogorov's Superposition Theorem and Its Applications, SVH Verlag, 2010, 192 pp.
  9. ^ Lorentz, G. G. (1962). "Metric entropy, widths, and superpositions of functions". American Mathematical Monthly. 69 (6): 469–485. doi:10.1080/00029890.1962.11989915.
  10. ^ Sprecher, David A. (1965). "On the Structure of Continuous Functions of Several Variables". Transactions of the American Mathematical Society. 115: 340–355. doi:10.2307/1994273. JSTOR 1994273.
  11. ^ Ostrand, Phillip A. (1965). "Dimension of metric spaces and Hilbert's problem 13". Bulletin of the American Mathematical Society. 71 (4): 619–622. doi:10.1090/s0002-9904-1965-11363-5.
  12. ^ Girosi, Federico; Poggio, Tomaso (1989). "Representation Properties of Networks: Kolmogorov's Theorem is Irrelevant". Neural Computation. 1 (4): 465–469. doi:10.1162/neco.1989.1.4.465.
  13. ^ Kůrková, Věra (1991). "Kolmogorov's Theorem is Relevant". Neural Computation. 3 (4): 617–622. doi:10.1162/neco.1991.3.4.617. PMID 31167327.
  14. ^ Lin, Ji-Nan; Unbehauen, Rolf (January 1993). "On the Realization of a Kolmogorov Network". Neural Computation. 5 (1): 18–20. doi:10.1162/neco.1993.5.1.18.
  15. ^ Köppen, Mario (2002). "On the Training of a Kolmogorov Network". Artificial Neural Networks — ICANN 2002. Lecture Notes in Computer Science. Vol. 2415. pp. 474–479. doi:10.1007/3-540-46084-5_77. ISBN 978-3-540-44074-1.
  16. ^ Liu, Ziming; et al. (2024). "KAN: Kolmogorov–Arnold Networks". arXiv:2404.19756.
  17. ^ Manon Bischoff (May 28, 2024). "An Alternative to Conventional Neural Networks Could Help Reveal What AI Is Doing behind the Scenes". Scientific American. Archived from the original on May 29, 2024. Retrieved May 29, 2024.
  18. ^ Steve Nadis (September 11, 2024). "Novel Architecture Makes Neural Networks More Understandable". Quanta Magazine.
  19. ^ Morris, Sidney (January 2021). "Hilbert 13: Are there any genuine continuous multivariate real-valued functions?". Bulletin of the American Mathematical Society. 58 (1): 107–118. doi:10.1090/bull/1698. ISSN 0273-0979.
  20. ^ Sternfeld, Y. (1985-03-01). "Dimension, superposition of functions and separation of points, in compact metric spaces". Israel Journal of Mathematics. 50 (1): 13–53. doi:10.1007/BF02761117. ISSN 1565-8511.
  21. ^ Levin, Michael (1990-06-01). "Dimension and superposition of continuous functions". Israel Journal of Mathematics. 70 (2): 205–218. doi:10.1007/BF02807868. ISSN 1565-8511.
  22. ^ Hedberg, Torbjörn (1971). "The Kolmogorov superposition theorem". Appendix 2 in Shapiro, Harold S. (ed.), Topics in Approximation Theory. Lecture Notes in Mathematics. Vol. 187. Springer, Heidelberg. pp. 267–275.

Further reading

  • S. Ya. Khavinson, Best Approximation by Linear Superpositions (Approximate Nomography), AMS Translations of Mathematical Monographs (1997)