Imagen (text-to-image model)

From Wikipedia, the free encyclopedia
Imagen
Developer: Google DeepMind
Initial release: May 2022; 3 years ago (2022-05)
Stable release: Imagen 4 / 20 May 2025; 11 months ago (2025-05-20)
Type: Text-to-image model
Website: Imagen website

Imagen is a series of text-to-image models developed by Google DeepMind. The models were developed by Google Brain until it merged with DeepMind in April 2023 to form Google DeepMind.[1] Imagen is primarily used to generate images from text prompts, similar to Stability AI's Stable Diffusion, OpenAI's DALL-E, and Midjourney.

The original version of the model was first described in a paper published in May 2022.[2] The tool produces high-quality images and is available to all users with a Google account through services including Gemini, ImageFX, and Vertex AI.[3]

History

Imagen's original version was first presented in a paper published in May 2022. It could generate high-fidelity images from natural-language descriptions.[2] The second version, Imagen 2, was released in December 2023.[4] Its most notable new capability was generating text and logos within images.[5] Imagen 3 was released in August 2024.[6] Google stated that Imagen 3 produces images with better detail and lighting.[7] On 20 May 2025, at Google I/O 2025, the company released an improved model, Imagen 4.[8]

Technology

Imagen relies on two key technologies. The first is a transformer-based large language model, notably T5, whose encoder converts the text prompt into embeddings that condition image synthesis. The second is a cascade of diffusion models that produces high-fidelity images. Imagen generates each image in three stages: a base model produces a 64×64 image, which is then upsampled to 256×256 and finally to 1024×1024.[2] Imagen 4 can generate images at resolutions up to 2K.[9]
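The three-stage cascade can be illustrated with a minimal Python sketch. This is not Google's implementation: `encode_text`, `base_model` and `super_resolve` are hypothetical placeholders standing in for the T5 text encoder and the diffusion models, and the placeholder super-resolution stage only performs nearest-neighbour upsampling instead of diffusion-based denoising. The sketch shows only the data flow: one text embedding feeds every stage, a base stage produces a 64×64 image, and two 4× stages raise it to 256×256 and then 1024×1024.

```python
import random
from typing import List

Image = List[List[float]]  # toy single-channel image: a list of pixel rows


def encode_text(prompt: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for the T5 text encoder (not a real model).

    Returns a deterministic pseudo-embedding seeded by the prompt text.
    """
    rng = random.Random(prompt)  # same prompt -> same vector
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def base_model(embedding: List[float], size: int = 64) -> Image:
    """Placeholder for the 64x64 base diffusion model.

    Fills the canvas with a single value derived from the embedding.
    """
    value = sum(embedding) / len(embedding)
    return [[value] * size for _ in range(size)]


def super_resolve(image: Image, embedding: List[float], factor: int = 4) -> Image:
    """Placeholder for a text-conditioned super-resolution diffusion model.

    `embedding` is accepted to mirror the text conditioning of the real
    stages but is unused here; this placeholder only repeats each pixel
    `factor` times horizontally and vertically (nearest-neighbour upsampling).
    """
    out: Image = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def generate(prompt: str) -> Image:
    emb = encode_text(prompt)       # one text embedding shared by all stages
    img = base_model(emb)           # stage 1: 64x64
    img = super_resolve(img, emb)   # stage 2: 64x64 -> 256x256
    img = super_resolve(img, emb)   # stage 3: 256x256 -> 1024x1024
    return img


if __name__ == "__main__":
    image = generate("a corgi wearing sunglasses")
    print(len(image), len(image[0]))
```

Keeping each stage a separate function mirrors the cascaded design: each stage only has to model detail at its own resolution, while the shared text embedding keeps all stages tied to the same prompt.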

Capabilities

Imagen can generate photorealistic images from text prompts.[3] It can also produce images in various styles, such as cinematic, 35mm film, illustration, and surreal. Like most text-to-image generative AI models, Imagen has difficulty rendering human fingers, text, ambigrams, and other forms of typography.

The model can generate images in five aspect ratios: 9:16, 3:4, 1:1, 4:3, and 16:9. Imagen can also refine previously generated images when the user edits the original text prompt.[7]
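To illustrate working within the five listed aspect ratios, the sketch below is a hypothetical client-side helper, not part of any Google API: `nearest_supported_ratio` picks the listed ratio closest to an arbitrary width and height. It assumes the ratios are written as width:height and compares them on a logarithmic scale, so that a portrait ratio and its landscape counterpart (for example 9:16 and 16:9) are treated symmetrically.

```python
import math
from typing import List

# The five aspect ratios listed for Imagen, written as width:height.
SUPPORTED_RATIOS: List[str] = ["9:16", "3:4", "1:1", "4:3", "16:9"]


def parse_ratio(ratio: str) -> float:
    """Convert a 'W:H' string into the float W / H."""
    w, h = (int(part) for part in ratio.split(":"))
    return w / h


def nearest_supported_ratio(width: int, height: int) -> str:
    """Hypothetical helper: return the listed ratio closest to width/height.

    Distance is |log(candidate) - log(target)|, which makes 2:1 and 1:2
    equally far from 1:1.
    """
    target = math.log(width / height)
    return min(SUPPORTED_RATIOS, key=lambda r: abs(math.log(parse_ratio(r)) - target))


if __name__ == "__main__":
    for w, h in [(1920, 1080), (1080, 1920), (1000, 1000)]:
        print(w, h, nearest_supported_ratio(w, h))
```

A logarithmic distance is used because a plain difference of ratios would treat tall and wide images unevenly: 16:9 (about 1.78) lies farther from 1:1 than 9:16 (about 0.56) does, even though both are equally far from square.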

References
1.
2.
3.
4.
5.
6.
7.
8.
9.