Text-to-Image Generation: Creating Art, Debate, and Possibilities


For most of recorded history, the only way for you to see a picture of—oh, say, cats building a pyramid of cheese—was to either hire a talented, nonjudgmental friend to draw it for you or to sharpen your pencil and take a swing at it yourself. But as of right now, all it takes to realize your vision is to find the right website, type in a well-phrased description, and wait about a minute and a half for your results. 

What makes this possible? Artificial intelligence text-to-image generation, a machine learning technology that lets anyone with a computer create convincing images using only their imagination and a text prompt.

Be This Witchcraft?

Artificial intelligence text-to-image generation isn’t all that new. But what is new is the quality of the images being generated.

It starts with natural language processing, or NLP. This is the branch of artificial intelligence (AI, to its friends) that gives computers the ability to understand text and spoken words. Once a text description is entered, software tools such as Midjourney, Stable Diffusion, DALL-E 2, Imagen, Craiyon, and others use machine learning algorithms (like deep learning) to convert descriptions into images. These tools are trained ahead of time on huge collections of captioned images. So when a machine reads the word “cat,” it doesn’t go looking for cat pictures on the spot; it draws on the patterns it learned from the many cat images it has already seen, and uses them as inspiration for its new creation.

Those of you with computer degrees may be reading this right now, gesturing impatiently, and grumbling, “Yes, yes, of course.” Others of you—possibly those with ill-defined humanities degrees—may be fervently crossing yourselves and murmuring, “Yes, yes, this is definitely witchcraft.”

Opportunities for Marketing and Advertising

At the moment, any number of people are using AI text-to-image generators as a novelty: commissioning everything from a vampire Corgi, to a basketball-playing goblin in the style of Van Gogh, to Darth Vader eating breakfast cereal, to—well, you name it.

But the technology is getting better and better. Google’s Imagen, for instance, is now producing images with a photorealistic level of polish. For a business such as advertising, which leans heavily on creative ideas and artistic execution, it’s becoming evident that this technology has the potential to influence our entire industry.

In many ways, AI text-to-image generators can be model employees for an agency: working around the clock, unflinchingly accepting creative direction, and hardly ever leaving their unwashed dishes in the agency sink. And there are many ways this technology could help creative teams.

  • Inspiration and Ideation. Different phrasings of the same text prompt can cause AI text-to-image generators to interpret creative directions in multiple ways, some of which might never have occurred to the person typing the prompt.
  • Creating Pitch Decks. By quickly creating mock-ups, storyboards, and potentially even animations, AI text-to-image generators can do work that sells concepts to clients—and can do so using any illustration style desired.
  • Efficiencies in Cost and Time. AI text-to-image generators could act as electronic assistants for art directors and designers, helping them to do more work in fewer hours. The technology could also be an economical solution for smaller agencies and businesses that may not have the budget for custom illustrations or photo shoots.
  • Jobs of the Future. Before AI text-to-image generators can make images, they need to be told what to create. Writing effective text prompts could soon develop into a specialized skill—and it’s easy to imagine some agencies creating a new “AI Whisperer” position, for strategic thinkers who can blend the communication talents of a writer and an artist. 

Text-to-Image Technology Also Generates Questions

An AI text-to-image generator, like any other technology, is inherently neither good nor evil. That said, the tool could have some potential downsides—and it’s only fair that we consider those, too.

  • Absence of Emotion. Traditionally, art has been a way for artists to express something that’s personal and meaningful. Does art that’s commissioned by humans but executed by machines truly qualify as art? That question became more than just theoretical in 2022, when a piece created with an AI text-to-image generator took first place in the digital art category at the Colorado State Fair, raising a lot of eyebrows and not a little ire.
  • Creative Appropriation. Sophisticated AI text-to-image generators can imitate the techniques of artists long dead. But they also have the potential to mimic the styles of illustrators who are very much still with us. If an AI art generator uses the visual creations of living artists to create a new artwork, is it a wholly new creation? Some have argued that this use crosses the line from inspiration into the appropriation of others’ intellectual property.
  • Potential Job Loss. Though physical artwork will never disappear, a tool capable of generating art quickly and cheaply may reduce the work available to illustrators, photographers, photo retouchers, and other visual artists.

Our Take

Technological innovations have always reshaped workplaces. Bank tellers now work alongside ATMs, cashiers share their duties with self-checkout lanes, and (unless you live in Oregon or New Jersey) service station employees no longer pump your car’s gas.

Advertising agencies have not been immune to change. In recent years, we’ve grown accustomed to creating websites, digital ads, social media posts, and other content that didn’t exist 30 years ago.

Here at Well Done, we plan to embrace AI text-to-image technology as we have every other new evolution: by studying it, mastering it, and then making smart, conscientious choices about its use.

Doing so will benefit our clients and their customers. Whereas doing anything else might make us as anachronistic as—help us out here, AI—a tired old horse on a shiny new car lot.