Friday, March 21, 2025

AI tool generates high-quality images faster than state-of-the-art approaches

Good news! This looks like a remarkable improvement: these latest models can run on laptops or even smartphones!

The race for faster, cheaper, and better AI-generated images is on!

However, this paper is actually old news: it already appeared in October 2024, and it has been accepted for the upcoming ICLR 2025. Google Scholar shows only 21 citations in total, not exactly awesome.

From the abstract:
"We introduce Hybrid Autoregressive Transformer (HART), an autoregressive (AR) visual generation model capable of directly generating 1024x1024 images, rivaling diffusion models in image generation quality. Existing AR models face limitations due to the poor image reconstruction quality of their discrete tokenizers and the prohibitive training costs associated with generating 1024px images.
To address these challenges, we present the hybrid tokenizer, which decomposes the continuous latents from the autoencoder into two components: discrete tokens representing the big picture and continuous tokens representing the residual components that cannot be represented by the discrete tokens.
The discrete component is modeled by a scalable-resolution discrete AR model, while the continuous component is learned with a lightweight residual diffusion module with only 37M parameters.
Compared with the discrete-only VAR tokenizer, our hybrid approach improves reconstruction FID from 2.11 to 0.30 on MJHQ-30K, leading to a 31% generation FID improvement from 7.85 to 5.38. HART also outperforms state-of-the-art diffusion models in both FID and CLIP score ..."
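To make the "hybrid tokenizer" idea from the abstract a bit more concrete, here is a toy sketch of the decomposition step only. It assumes plain nearest-neighbour vector quantization against a tiny hand-made codebook; the function names, the codebook, and the 2-D latents are all hypothetical and are not taken from the HART code. Each continuous latent is split into a discrete token (the "big picture") plus a continuous residual (what the token cannot represent). In HART the residual is generated by the small residual diffusion module rather than stored, and the real tokenizer is far more elaborate, so treat this only as an illustration of why adding the residual back improves reconstruction.

```python
def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


def hybrid_tokenize(latents, codebook):
    """Split each continuous latent into a discrete token and a continuous residual."""
    tokens, residuals = [], []
    for vec in latents:
        idx = nearest_code(vec, codebook)
        tokens.append(idx)
        # The residual holds what the discrete token cannot represent.
        residuals.append([a - b for a, b in zip(vec, codebook[idx])])
    return tokens, residuals


def reconstruct_discrete(tokens, codebook):
    """Discrete-only reconstruction: just look up the codebook entries."""
    return [list(codebook[t]) for t in tokens]


def reconstruct_hybrid(tokens, residuals, codebook):
    """Hybrid reconstruction: codebook entry plus the continuous residual."""
    return [[c + r for c, r in zip(codebook[t], res)] for t, res in zip(tokens, residuals)]


def mse(xs, ys):
    """Mean squared error between two lists of vectors."""
    diffs = [(a - b) ** 2 for x, y in zip(xs, ys) for a, b in zip(x, y)]
    return sum(diffs) / len(diffs)


# Hypothetical toy data: a 3-entry codebook and three 2-D "latents".
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
latents = [[0.9, 1.2], [-0.8, 0.4], [0.1, -0.2]]

tokens, residuals = hybrid_tokenize(latents, codebook)
print(tokens)  # [1, 2, 0]
print("discrete-only MSE:", mse(latents, reconstruct_discrete(tokens, codebook)))
print("hybrid MSE:", mse(latents, reconstruct_hybrid(tokens, residuals, codebook)))
```

The discrete-only error is non-zero because the codebook is coarse, while adding the residual back recovers the latents up to floating-point rounding. That gap is the intuition behind the abstract's reconstruction FID dropping from 2.11 to 0.30.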

AI tool generates high-quality images faster than state-of-the-art approaches | MIT News | Massachusetts Institute of Technology "Researchers fuse the best of two popular methods to create an image generator that uses less energy and can run locally on a laptop or smartphone."

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer (open access)


Researchers combined two types of generative AI models, an autoregressive model and a diffusion model, to create a tool that leverages the best of each model to rapidly generate high-quality images.

