Good news! This seems to be a remarkable improvement: these latest models can reportedly run on laptops or even smartphones!
The race for faster, cheaper, and better AI-generated images is on!
However, this paper is actually old news: it already appeared in October 2024. It has been accepted for the upcoming ICLR 2025, and Google Scholar shows only 21 total citations, not exactly awesome.
From the abstract:
"We introduce Hybrid Autoregressive Transformer (HART), an autoregressive (AR) visual generation model capable of directly generating 1024x1024 images, rivaling diffusion models in image generation quality. Existing AR models face limitations due to the poor image reconstruction quality of their discrete tokenizers and the prohibitive training costs associated with generating 1024px images.
To address these challenges, we present the hybrid tokenizer, which decomposes the continuous latents from the autoencoder into two components: discrete tokens representing the big picture and continuous tokens representing the residual components that cannot be represented by the discrete tokens.
The discrete component is modeled by a scalable-resolution discrete AR model, while the continuous component is learned with a lightweight residual diffusion module with only 37M parameters.
Compared with the discrete-only VAR tokenizer, our hybrid approach improves reconstruction FID from 2.11 to 0.30 on MJHQ-30K, leading to a 31% generation FID improvement from 7.85 to 5.38. HART also outperforms state-of-the-art diffusion models in both FID and CLIP score ..."
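The core idea in the abstract can be illustrated with a tiny sketch. This is my own toy example, not HART's actual tokenizer: a continuous latent vector is quantized to its nearest codebook entry (the discrete token, the "big picture"), and the leftover difference is kept as the continuous residual that the discrete code cannot represent.

```python
import numpy as np

# Toy illustration of the hybrid-tokenizer idea (all names and sizes here
# are invented for the example; HART's real tokenizer is far larger).
rng = np.random.default_rng(0)

codebook = rng.normal(size=(16, 8))   # 16 learned codes, 8-dim latents
latents = rng.normal(size=(4, 8))     # 4 continuous latent vectors

# Discrete tokens: index of the nearest codebook entry (L2 distance).
dists = np.linalg.norm(latents[:, None, :] - codebook[None, :, :], axis=-1)
tokens = dists.argmin(axis=1)         # integer ids, the "big picture"

# Continuous residual: whatever the discrete tokens fail to capture.
residual = latents - codebook[tokens]

# Adding the residual back recovers the latent exactly, which is why
# modeling the residual (HART uses a small diffusion module) can close
# the reconstruction-quality gap of discrete-only tokenizers.
reconstruction = codebook[tokens] + residual
assert np.allclose(reconstruction, latents)
```

In HART itself, the discrete tokens are generated by the AR transformer and the residual by the 37M-parameter diffusion module; the sketch only shows why the two components together reconstruct the latent losslessly.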
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer (open access)
Researchers combined two types of generative AI models, an autoregressive model and a diffusion model, to create a tool that leverages the best of each model to rapidly generate high-quality images.