VibeToken-Gen: Dynamic Resolution Image Generation

Maitreya Patel, Jingtao Li, Weiming Zhuang, Yezhou Yang, Lingjuan Lyu

CVPR 2026 (Main Conference)

🤗 Model  |  💻 GitHub

Generate ImageNet class-conditional images at arbitrary resolutions using only 65 tokens. VibeToken-Gen's compute cost stays constant at 179 GFLOPs regardless of output resolution.
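To put the fixed 65-token budget in context, the sketch below compares it with a patch-grid tokenizer that emits one token per image patch. The 16-pixel patch size and the `grid_token_count` helper are assumptions chosen for illustration; they are not taken from VibeToken-Gen.

```python
def grid_token_count(height: int, width: int, patch: int = 16) -> int:
    """Tokens for a hypothetical 2D tokenizer that emits one token per patch."""
    return (height // patch) * (width // patch)


# Fixed token budget stated in the description above.
VIBETOKEN_TOKENS = 65

for side in (256, 512, 1024):
    grid = grid_token_count(side, side)
    print(f"{side}x{side}: patch grid = {grid} tokens, VibeToken-Gen = {VIBETOKEN_TOKENS} tokens")
```

Under these assumptions the patch-grid count grows quadratically with the side length, while the sequence the autoregressive generator has to produce stays at 65 tokens.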

ImageNet Class

Pick a class or choose 'Custom' to enter an ID manually.

Generator Resolution

Internal resolution for the AR generator (max 512×512).

Output Resolution (Decoder)

Final image resolution. Set it higher than the generator resolution for super-resolution (e.g., generate at 256, decode at 1024).

Decoder Patch Size

'Auto' selects a patch size based on the output resolution. Larger patch sizes decode faster but give coarser results.
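The four controls above can be read as one settings record. The following is a minimal sketch of how such settings might be represented and checked before generation. `DemoSettings`, `validate`, and `is_super_resolution` are hypothetical names, not part of the VibeToken-Gen code, and the rule that 'Auto' uses to pick a patch size is not described here, so it is left unresolved.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# From the "max 512×512" note on the generator resolution control.
MAX_GENERATOR_SIDE = 512


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical container mirroring the four demo controls."""
    class_id: int                          # ImageNet class ID
    generator_resolution: Tuple[int, int]  # (height, width) used by the AR generator
    output_resolution: Tuple[int, int]     # (height, width) produced by the decoder
    decoder_patch_size: Union[int, str] = "auto"


def validate(settings: DemoSettings) -> None:
    """Raise ValueError if the settings violate the limits stated above."""
    gen_h, gen_w = settings.generator_resolution
    out_h, out_w = settings.output_resolution
    if min(gen_h, gen_w, out_h, out_w) <= 0:
        raise ValueError("resolutions must be positive")
    if max(gen_h, gen_w) > MAX_GENERATOR_SIDE:
        raise ValueError("generator resolution is limited to 512x512")
    patch = settings.decoder_patch_size
    if patch != "auto" and (not isinstance(patch, int) or patch <= 0):
        raise ValueError("decoder patch size must be 'auto' or a positive integer")


def is_super_resolution(settings: DemoSettings) -> bool:
    """True when the decoder output is larger than the generator resolution."""
    gen_h, gen_w = settings.generator_resolution
    out_h, out_w = settings.output_resolution
    return out_h > gen_h or out_w > gen_w


# The example from the output resolution note: generate at 256, decode at 1024.
settings = DemoSettings(class_id=207, generator_resolution=(256, 256),
                        output_resolution=(1024, 1024))
validate(settings)
print(is_super_resolution(settings))
```

The class ID 207 is an arbitrary example value; the sketch does not check its range because the set of valid IDs is not stated here.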

Citation

@inproceedings{vibetoken2026,
  title     = {VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations},
  author    = {Patel, Maitreya and Li, Jingtao and Zhuang, Weiming and Yang, Yezhou and Lyu, Lingjuan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}