Sunday, June 15, 2025

On Improving large language models with concept-aware fine-tuning. Really!

The abstract of this new paper suggests that ML & AI researchers are reinventing the wheel!

The long-standing ambiguity about what a token should be in natural language processing is not helpful! Breaking words up into chunks may even be counterproductive! What unit along the spectrum from a single character to a word chunk, a whole word, a sentence, or even a whole paragraph should be used to train language models? A combination of fine-grained and coarse-grained units? Hierarchically nested units?
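For illustration, here is a minimal sketch of how a standard subword tokenizer fragments the paper's example phrase. It assumes the tiktoken library and its cl100k_base vocabulary; the exact fragments depend on the vocabulary used, so treat the printed output as indicative only.

```python
# Minimal sketch: how a subword (BPE) tokenizer fragments a phrase.
# Assumes the tiktoken library; the exact splits depend on the vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("ribonucleic acid")

# Decode each id individually to see the fragments the model actually learns.
print([enc.decode([t]) for t in token_ids])
# e.g. ['rib', 'on', 'ucle', 'ic', ' acid'] -- splits vary by vocabulary
```

None of these fragments is a meaningful unit on its own, which is exactly the kind of fragmentation the paper objects to.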

Caveat: I have not read the paper.

From the abstract:
"Large language models (LLMs) have become the cornerstone of modern AI. However, the existing paradigm of next-token prediction fundamentally limits their ability to form coherent, high-level concepts, making it a critical barrier to human-like understanding and reasoning.
Take the phrase "ribonucleic acid" as an example: an LLM will first decompose it into tokens, i.e., artificial text fragments ("rib", "on", ...), then learn each token sequentially, rather than grasping the phrase as a unified, coherent semantic entity. This fragmented representation hinders deeper conceptual understanding and, ultimately, the development of truly intelligent systems.
In response, we introduce Concept-Aware Fine-Tuning (CAFT), a novel multi-token training method that redefines how LLMs are fine-tuned. By enabling the learning of sequences that span multiple tokens, this method fosters stronger concept-aware learning.
Our experiments demonstrate significant improvements compared to conventional next-token finetuning methods across diverse tasks, including traditional applications like text summarization and domain-specific ones like de novo protein design.
Multi-token prediction was previously only possible in the prohibitively expensive pretraining phase; CAFT, to our knowledge, is the first to bring the multi-token setting to the post-training phase, thus effectively democratizing its benefits for the broader community of practitioners and researchers. ..."
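To make the multi-token setting concrete, here is a minimal PyTorch sketch of the general idea: several output heads, each predicting one of the next k tokens, with their losses averaged. The names MultiTokenHeads and k and the loss layout are my own illustrative assumptions, not the paper's actual CAFT method (which, again, I have not read).

```python
# Sketch of multi-token prediction as a fine-tuning loss.
# Assumption (not from the paper): a backbone supplies hidden states,
# and head i predicts token t+i from the state at position t.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int, k: int = 4):
        super().__init__()
        # one linear head per future position t+1 .. t+k
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, vocab_size) for _ in range(k)
        )

    def loss(self, hidden_states: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden) from the language model backbone
        # input_ids:     (batch, seq) token ids of the same sequence
        total = torch.zeros((), device=hidden_states.device)
        for i, head in enumerate(self.heads, start=1):
            logits = head(hidden_states[:, :-i, :])   # predictions for token t+i
            targets = input_ids[:, i:]                # labels shifted by i
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total / len(self.heads)                # same scale as next-token loss

# Toy usage with random tensors standing in for a real backbone's outputs.
heads = MultiTokenHeads(hidden_size=64, vocab_size=1000, k=4)
h = torch.randn(2, 16, 64)
ids = torch.randint(0, 1000, (2, 16))
print(heads.loss(h, ids))
```

Averaging over the heads keeps the objective on the same scale as ordinary next-token cross-entropy, which is presumably part of what makes such a loss cheap enough to bolt onto post-training rather than pretraining.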

[2506.07833] Improving large language models with concept-aware fine-tuning
https://arxiv.org/abs/2506.07833
