Recommendable!
This paper, published only in April, seems to be on fire! Citation count as of today (8/13/2024): Google Scholar 93, Semantic Scholar 64. Semantic Scholar already lists several other papers published since then that build on KAN.
Caveat: I have not yet read this paper.
"... Researchers have now created a fundamentally new way to make neural networks that in some ways surpasses traditional systems. These new networks are more interpretable and also more accurate, proponents say, even when they’re smaller. Their developers say the way they learn to represent physics data concisely could help scientists uncover new laws of nature. ...
a new architecture ... one designed from first principles. ...
For the past decade or more, engineers have mostly tweaked neural-network designs through trial and error ...
In traditional neural networks, called multi-layer perceptrons (MLPs), ... Each neuron then applies a simple function to the sum total of its inputs, called an activation function.
In the new architecture, the synapses play a more complex role. Instead of simply learning how strong the connection between two neurons is, they learn the full nature of that connection—the function that maps input to output. ... this function could be more complex—in fact a “spline” or combination of several functions—and is different in each instance. Neurons, on the other hand, become simpler—they just sum the outputs of all their preceding synapses. The new networks are called Kolmogorov-Arnold Networks (KANs), after two mathematicians who studied how functions could be combined. The idea is that KANs would provide greater flexibility when learning to represent data, while using fewer learned parameters. ...
The researchers tested their KANs on relatively simple scientific tasks. ... They found that increasing the size of KANs improves their performance at a faster rate than increasing the size of MLPs did. When solving partial differential equations, a KAN was 100 times as accurate as an MLP that had 100 times as many parameters. ...
What’s more, the researchers could visually map out the KANs and look at the shapes of the activation functions, as well as the importance of each connection. Either manually or automatically they could prune weak connections and replace some activation functions with simpler ones, like sine or exponential functions. Then they could summarize the entire KAN in an intuitive one-line function (including all the component activation functions), in some cases perfectly reconstructing the physics function that created the dataset. ...
In another experiment, they trained networks to predict one attribute of topological knots, called their signature, based on other attributes of the knots. An MLP achieved 78 percent test accuracy using about 300,000 parameters, while a KAN achieved 81.6 percent test accuracy using only about 200 parameters."
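To make the quoted contrast concrete, here is a minimal sketch of a single KAN layer in NumPy. Everything in it is illustrative: the names (KANLayer, grid_size) are mine, and Gaussian bumps stand in for the B-splines the paper actually uses. The point is only the division of labor the article describes: the learnable parameters live on the edges as univariate functions, and the neurons merely sum.

import numpy as np

class KANLayer:
    # One Kolmogorov-Arnold layer: every edge (i -> j) carries its own
    # learnable univariate function phi_ij, built here as a linear
    # combination of fixed Gaussian bumps on a grid (a stand-in for the
    # paper's B-splines). Output neurons just sum their incoming edges.
    def __init__(self, n_in, n_out, grid_size=8, x_min=-2.0, x_max=2.0):
        self.centers = np.linspace(x_min, x_max, grid_size)   # basis grid
        self.width = (x_max - x_min) / (grid_size - 1)        # bump width
        # One coefficient vector per edge: shape (n_in, n_out, grid_size).
        self.coef = 0.1 * np.random.randn(n_in, n_out, grid_size)

    def forward(self, x):
        # x: (batch, n_in). Evaluate every basis bump at every input value.
        d = x[:, :, None] - self.centers                      # (batch, n_in, grid)
        basis = np.exp(-(d / self.width) ** 2)
        # phi_ij(x_i) = sum_g coef[i, j, g] * basis_g(x_i); each output
        # neuron j then sums over incoming edges i -- no activation on the node.
        return np.einsum('big,ijg->bj', basis, self.coef)

layer = KANLayer(n_in=2, n_out=1)
y = layer.forward(np.random.randn(16, 2))   # shape (16, 1)

An MLP layer would instead compute something like np.tanh(x @ W + b): a single scalar weight per edge and a fixed nonlinearity on the node.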
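A back-of-the-envelope parameter count (my own arithmetic, not quoted from the article or paper) helps interpret those numbers:

MLP layer:  params = n_in * n_out (plus n_out biases)
KAN layer:  params ≈ n_in * n_out * (G + k)   for grid size G and spline order k

Per edge, a KAN layer is therefore several times more expensive, so its parameter efficiency has to come from needing far fewer edges overall -- consistent with the knot example, where roughly 200 KAN parameters beat an MLP with about 300,000.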
From the abstract:
"Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs."
KAN: Kolmogorov-Arnold Networks (open access)
Figure 0.1: Multi-Layer Perceptrons (MLPs) vs. Kolmogorov-Arnold Networks (KANs)
In traditional neural networks, sometimes called multi-layer perceptrons [left], each synapse learns a number called a weight, and each neuron applies a simple function to the sum of its inputs. In the new Kolmogorov-Arnold architecture [right], each synapse learns a function, and the neurons sum the outputs of those functions.