What is the face of the man behind the apple? For almost 60 years, the figure wearing a sombre suit and bowler hat in René Magritte’s painting “The Son of Man” has been obscured by a polished green apple. His facial features were intended to remain a mystery, the fruit an artistic provocation. Today, using new technology, 23-year-old digital artist Josephine Miller can roll the apple away.
Miller tilts her laptop towards me in the hushed café of the British Library in London to show how she used Dall-E 2, software that generates images using artificial intelligence (AI), to remove the fruit. Behind it is a man who looks startled to be suddenly revealed, eyebrows raised and piercing blue eyes staring out over an expertly waxed moustache. The face is painted in Magritte’s somewhat flat style and signature palette, as if the two images were painted by the same hand, side by side.
It’s a neat trick. Then Miller shows me she has generated not one but 200 possible faces. Magritte, a trickster at heart, probably would have approved. The technology, which can create near-infinite artistic combinations in response to a few words or images, has enabled Miller to do work that would either have taken months with previous tools or been impossible altogether. It is dizzying in both its capabilities and its ethical implications. I ask if she finds it overwhelming. “No,” she says immediately. “Well, maybe it is for some people, but I’m just excited.”
Dall-E 2 is just one of several AI image-generation tools that have become available to the public this year. Since the spring, the internet has experienced a Cambrian explosion of every conceivable application of the technology. The only thing more amazing than the tech itself is the wild leaps of imagination of its users: Nosferatu in RuPaul’s Drag Race, Da Vinci’s “The Last Supper” but the apostles are crowding round to take a selfie, the French Revolution as seen from the perspective of a helmet-mounted GoPro camera, a bottle of ranch dressing testifying in court. All of these can be produced in less than a minute without much technical expertise.
And the technology is advancing swiftly. Six months ago most tools struggled to create human faces, usually offering grotesque combinations of eyes, teeth and stray limbs; today you can ask for a “photorealistic version of Jafar from Disney’s Aladdin sunbathing on Hampstead Heath” and get almost exactly what you’re looking for.
All of which is to say this is a pivotal moment in the history of art. AI-generated imagery “is a major disruptive force, and there will be both democratic and oppressive aspects to it”, says British artist Matthew Stone, who used Dall-E 2 in the process of creating artworks for his latest exhibition. Millions of images swarm out of this Pandora’s Box every day and, with them, a number of difficult questions about plagiarism, authorship and labour. Perhaps the biggest of all: is this the end of human creativity?
One of the first things any evangelist will tell you about AI image generation is how easy it is to do. You describe an image using natural language, as you would when talking to another person, and the software serves up several results in a matter of seconds.
Midjourney, a Dall-E rival, offers a free trial accessible via the chat application Discord. Hearing that it excels at images that have a more painterly style, I decide to try and make illustrations for a children’s book I’m working on, about a cat adventuring around the Mediterranean seeking its missing owner. I type in the prompt for my first idea:
/IMAGINE: GINGER CAT AT THE TOP OF A MINARET IN ISTANBUL
The image develops before my eyes like a photograph in a chemical bath, starting out as a blur and gradually gaining definition and coherence.
The first result is not great. The AI has given me a generic tower rather than a recognisable minaret. There is no sense that we are in Istanbul and, worst of all, the cat’s face is grotesquely embedded into the brickwork of the tower itself. This is my first lesson of AI image generation: although the pictures shared on social media often look fantastic, in-progress results can be terrible: ugly, generic or barely resembling even a simple prompt.
Since the free trial is located on a public chat server, my cat-minaret is quickly lost in a ceaseless flow of other people’s prompts and images. I watch what they are typing to try to glean some tips. It seems that the more detailed your prompt, the better the results. Several users keep returning to the same idea, tweaking words and phrasing to improve their results. One person keeps iterating on the idea of an “emotional support limpet” and, with each new version, the aquatic snail gets cuter.
I return to my cat prompt and add more detail:
/IMAGINE: GINGER CAT LOOKING WISTFULLY OVER A VIEW OF ISTANBUL FROM THE TOP OF A MINARET WHILE THE SUN SETS, ANIME STYLE
This generates a marked improvement: there’s a gorgeous contrast between rusty orange and deep indigo in the sky, with pointed minarets like needles scratching the rose-hued clouds. Yet the cat is still not right. In one version, it towers over the architecture like an adorable Godzilla. In another, it is normal-sized but for some reason white, as if the sunset has leached out all of its colour.
I scrap the cat and go for something more artistic:
/IMAGINE: CARNIVAL CELEBRATION, BEAUTIFUL, GEORGES SEURAT
This composition has a real sense of festivity, but the AI misses the pointillist style I was hoping to draw from the “Seurat” reference. I try the same prompt with the word “pointillism” and strike gold, with a soft-hued abstraction of clown-like figures at a fairground. There is a clicky, game-like satisfaction to plucking a random sentence from your imagination and seeing how the AI deals with it, and I spend hours testing out all manner of prompts.
Everyone I show Dall-E 2 and Midjourney to is amazed. This technology has an immediate, visceral impact, especially when people get to see their own ideas being conjured out of abstract mathematical space. “For there to be no gap between doing something and then seeing something, just waiting seconds like a webpage loading, and unexpected imagery simply arises rather than being the output of a long, arduous process,” says Stone. “It feels close to dreaming.”
It also feels like magic, but it isn’t. Joanne Jang, the product manager for Dall-E 2, explains how the AI works. Dall-E 2 is trained on 650 million images and their descriptive captions. It learns concepts from them like an infant looking at flashcards. If you show the AI enough photos of yoga, it will infer that the practice includes various poses as well as common accompanying objects: yoga mats, cork blocks and so on. If you show it giraffes, it will understand that these animals have long necks and patterned skin. Once the concepts are understood, you can ask it to generate an image of “a giraffe doing yoga” and it can do so, even if such an image has never existed before.
David Holz, Midjourney’s founder, explains the technology in more granular detail. The tool needs to solve three problems, he says: How does language relate to images? What should the images actually look like? Finally, and most difficult, is a more human question: What do people want to see? The ability to answer these questions was brought about primarily by the confluence of two technologies. One was a neural network called CLIP which could grasp the relationship between language and images. The other was a series of image-generation models that are improving at a rapid rate.
The first public outing for images generated by AI in response to a language prompt was the announcement of the original Dall-E in January 2021 by OpenAI, a research company based in Silicon Valley that has close links to Microsoft and counts Elon Musk among its founders. Dall-E generated images using technology that functioned like autocomplete on smartphones, creating pictures by using probability to decide which part of the image should come next based on what came before. Dall-E 2, its successor, takes a different approach called a diffusion model, which starts from image noise (essentially, a field of random pixels like static on a television) and gradually refines it into a picture. The results are far more accurate, coherent and beautiful than before.
Other companies and independent developers began to use the diffusion model to make their own AI image-generation tools, each with its own quirks. Dall-E 2, whose name is a portmanteau of Pixar’s WALL-E robot and the artist Salvador Dalí, has one million active users and is generally thought to excel at realistic images and photographs. Midjourney has a more abstract, artistic style that users have found particularly good for making fantasy-, sci-fi- and horror-themed images. An open-source alternative called Stable Diffusion is one of the most popular among designers and artists; there is also Craiyon, a free public tool with a lower-quality output largely used for making memes, and ruDall-E, aimed at Russian users.
Several of the big tech companies have announced that they are working on their own versions, but the majority are not publicly available. Google has a tool called Imagen, Meta has Make-A-Scene, which allows users to upload a sketch to guide the AI, and Microsoft has NUWA-Infinity, which boasts a remarkable feature that can transform a still image into a video.
Many people will first encounter AI-generated images in the form of memes on social media. For professional artists, Holz says, the main application of Midjourney is to supplement areas in which they feel weak: backgrounds, colour choice or composition, for instance. On the Discord server many users are, like me, not artistically gifted but feel overjoyed to participate in the creation of something beautiful, even if it’s just a Rubik’s Cube made of a peanut butter and jelly sandwich. Some are creating art for personal projects for which they would never have been able to afford a concept artist. Several users told me they felt the technology democratised the creation of professional images: they were no longer held back by a lack of funding or technical skill.
Diego Conte Peralta is a computer graphics artist based in Madrid who has used Dall-E 2, Midjourney and Stable Diffusion, as well as training his own customised models. He shares his screen over Zoom to show me the digital whiteboard where he meticulously annotates his experiments. Each set of generated images is neatly labelled with the prompts used to create them. At first glance, it resembles a TV detective’s cork board of suspects and clues.
As we move rightward across the board, I can see how he iterates prompts and how the images change accordingly. One image resembles a painting of a man with his eyes closed wrapped in plastic sheeting. The prompt for this was “expressive face of a sleeping young male model wrapped in translucent plastics, dark background, dramatic lighting, 50mm”. Peralta describes the prompt’s evolution, which first gave him generic-looking figures bearing sterile expressions. He experimented with giving the man different ethnicities with interesting results, but it was only when he thought of adding the word “sleeping” that he got the serene, slightly eerie expression that he wanted.
Peralta then edits the results rather than treating the AI-generated image as a finished product. “That’s much more interesting for me because you can go places that even the AI cannot, and the output still has a human element,” he says. Next, he shows me a series of smudged, ominous portraits created by an AI model that he trained on paintings by Velázquez and Rembrandt. He has taken elements of their work and used them as textures in his own digital creations. “The AI gives me a sample so good that it’s almost a song,” he says.
Recently he has been asking the AI to make portraits in the style of an oil painting, then using these as sketches to paint with real oils on canvas. Studying the output of Midjourney and Dall-E 2 has taught him new painting techniques, helping him to perfect the nuances of blending colours or showing lighting on faces. “I see it as something between a tool and a resource,” Peralta says. He finds the technology liberating in how it allows him to iterate so quickly. Even if the results are not all good, he says he can find something he wants to move forward with. “It’s a small universe where you can do anything you want without production costs or limitations.”
Peralta’s artworks, made by AI trained on millions of human-created images, raise the question: Who is really the artist? Is it the person using the AI tool? The people who programmed it? Or is the creator now a distributed entity, spread among the countless artists and photographers who made the pictures that trained the AI? Most artists I spoke to who use AI image generation seem happy to call the output their own, but critics argue that this is not art in any sense we have previously understood the term.
This year the prize for digital art at the Colorado State Fair went to Jason M Allen, who made his work “Théâtre D’opéra Spatial” on Midjourney. The piece evokes a fantasy throne room where women in ochre robes sit before a portal to a wintry mountainscape. The award sparked uproar among artists, many of whom claimed he didn’t actually make the work. But Allen was unrepentant, arguing that he had clearly disclosed how he generated the image and that he had broken no rules. The event was an early test of how the wider art world might view AI images in the future.
Back in London, Stone has been asking Dall-E 2 to generate variations on his own artworks, using images as prompts rather than text. He says that the AI is not yet good enough to create art that he would happily share or sell without doing a lot of work to it. Yet when I ask whether he would consider even the raw output his own art, he is unwavering. “If I claim it as such, then yeah,” he says. “If there is a grand narrative of art history, then it’s about freedom and artists establishing that they can do whatever they want in whatever way.”
Situating AI image generation in this lineage brings to mind Marcel Duchamp and Andy Warhol, who revolutionised contemporary art by appropriating objects designed by other people, recontextualising them and claiming them as their own. They shifted the measure of artistic value away from what you made with your hands or how much time and skill you put into it. Their artistic currencies were concept and storytelling.
Still, most of the creativity in AI image generation lies in crafting your prompt. This has led some to suggest that making AI art is a process of curation rather than creation. But there has always been an editorial component at the core of the artistic process. “Even if I start with a clear intention for what I want to create, usually in the process of doing that something happens that throws up an unexpected outcome,” says Stone. “So I feel my role is recognising those moments, zooming in and understanding why a particular image has become exciting, then choosing to repeat, explore and go deeper with it. It’s almost like true creativity is [an] accident, and AI helps us become accident-prone by throwing up things that we may not have expected.”
It’s hard to get away from the humanity in all this. None of these tools can be operated without a human user (for now, at least). They have no will, agency or even memory. The same prompt will get a different result each time. “We need to promote the idea that when we use the digital — because . . . it’s very much part of our lives — there’s the potential for it to hold all the subjective, wonderful messiness of being human,” says Stone.
In AI-generated images, much of this “wonderful messiness” comes from the verbal prompts people input to create pictures. Where we once communicated with computers using code, now they are increasingly learning our language. Speak to them as you would to another human, and they are more and more likely to understand what you mean. But we’re not quite there yet, and each tool still has a particular way of understanding words, which is why many people’s first experiments fall flat.
Learning the somewhat warped language of image generators has given birth to a new field called “prompt engineering” or “prompt craft”. Miller, the artist I met at the British Library, says you have to be specific with prompts to get the best out of the tools. She wrote herself a short guide listing the details to include: “What? Inspired by? Describe the environment? Feels like? What colours? Any adjectives? Which medium?”
Sometimes a slight quirk of phrasing can confuse the AI. Trying to generate a monster worthy of a horror movie on Midjourney, I typed in “man with pig face, HR Giger”, name-checking the Swiss artist known for his grotesque biomechanical designs, including the Xenomorph creature in Alien. The results accurately imitated Giger’s gloomy, hyper-detailed style, but they all inexplicably featured the face of the same jowly man. After some googling, I realised that Midjourney had understood that I wanted Giger’s actual face with a few porcine flourishes. When I changed the prompt to “ . . . in the style of HR Giger”, it produced exactly the chilling imagery I was aiming to make.
Crafting good prompts involves a learning curve, partly because the AI is trained on image captions known as “alt text”, which are detailed literal descriptions of web images provided for visually impaired internet users and used by search engines. As a result, you sometimes have to be more specific than you would with a human interlocutor.
On Midjourney, I see a prompt that reads: “a majestic throne room, at the dawn of time, glass paint, overglaze, ornament, time-lapse, photojournalism, wide angle, perspective, double-exposure, light, tones of black in background, ultra-HD, super-resolution, massive scale, perfectionism, soft lighting, ray tracing global illumination, translucid luminescence, crystalline, lumen reflections, in a symbolic and meaningful style, symmetrical --q 5 --s 4975 --chaos 15 --ar 16:9”. As language, this is absolute nonsense. But the results are stunning.
There’s a knack to writing good prompts. On a website called PromptBase, people are buying and selling them as a new creative service. “I think consulting for prompts is going to be a job in the future,” says Miller. “I already know people who have made money from it.” But the creators of both Midjourney and Dall-E 2 tell me they want to move away from garbled, unnatural language: these tools should learn to understand humans better, not the other way around.
The fact that language is at the heart of a visual tool might seem surprising, but AI image generation is actually about communication as much as it is about pictures. Teaching computers to understand human language is central to all of OpenAI’s projects. The company’s first two commercial products before Dall-E were GPT-3, a language model which can generate coherent text, and Codex, which generates computer code in response to natural language prompts.
Midjourney’s founder Holz tells me that AI researchers are beginning to suspect that computers might learn to understand languages and images better in tandem than separately. “Language is very intimately connected to images because it was created . . . to describe the world around us,” he says. “So when you talk to AI and make images, you’re converting spoken language into visual language. Rather than creating art, you’re converting from one language to another, like Google Translate.”
Create AI-generated art yourself
We’re gathering our readers’ AI-generated artworks, and we may publish a selection of the best on FT.com and on our social media channels. Join in by posting your image to Instagram using the hashtag #ftaiart, tweeting us @FTMag or emailing us at [email protected], making sure to include the prompt you used and the name of the AI tool.
While he is careful always to refer to Midjourney as a tool rather than a conscious entity, I note that even Holz occasionally uses verbs like “understands”, “thinks” or “talks” when referring to the AI, words that imply consciousness, as if we lack a language to describe this new relationship. (Midjourney deliberately avoids personification, choosing a brand icon that is a boat rather than a robot face.)
Peralta takes a similar view. “AI right now is all about statistics,” he says, as he shows me some of his AI-generated portraits. “This nose is a statistical feature, not a nose. When you understand that as an artist, you use the tool in a more profound way than when you try to talk to it like a human being. Through the prompt, you’re accessing a specific sample of possible features and getting a random distribution of them in an image.”
And yet it is tempting to personify AI. When Dall-E 2 responded to my prompt “a picture frame made of ice” with a wooden picture frame containing three stacked ice cubes, I felt a sudden urge to affectionately ruffle its circuit boards and murmur: “Oh, I see why you did that. Don’t worry, you’ll learn.”
Even if we treat AI as nothing more than a tool, it can still play an emotive role in our lives. Holz and Jang were both surprised by how many people use AI image generation as a form of therapy, making pictures of their dog in heaven after losing a pet or entering lines from a deceased family member’s poetry to explore what their inner visual world might have looked like.
Most intriguing is the technology’s capacity to serve people with aphantasia, a condition characterised by an absence of mental imagery and thought to affect up to 5 per cent of the world’s population. Several aphantasics have contacted Jang at OpenAI to say that Dall-E has been invaluable in helping them finally understand how most people see the world.
Joel Pearson, a neuroscientist who has studied aphantasia, says that the absence of mental imagery can change how people emotionally respond to stimuli. A book with descriptive prose, for instance, might be less enjoyable if you can’t visualise its scenes. He has been exploring the possibility of an AI image assistant for aphantasics which could, for example, be embedded into an e-reader to automatically generate illustrations on each page, almost like a prosthetic visual imagination.
Since the AI is trained on images pulled from the internet, it learns from a store of pictures that people have chosen to replicate and share because they are deemed meaningful or useful. One thing this reveals is just how deeply social bias is baked into our data sets. OpenAI noted that if you typed in the word “nurse”, Dall-E 2 would always show a picture of a woman, while a “CEO” would always be a white man.
In a recent update, the company tried to increase the diversity in generated images by randomly adding race and gender descriptions to prompts where these are not already specified by the user. When Midjourney surveyed users about whether they wanted the tool to randomly change the ethnicity and gender of humans in generated images to maximise diversity, the answer was overwhelmingly negative. Respondents said this would feel like their authorial control was being taken away.
Another minefield is content moderation. OpenAI forbids the generation of nudity, violence, political campaigning and public figures. (Prompting Dall-E 2 with “Liz Truss and Boris Johnson hugging it out” yields an error message.) Midjourney has banned certain prompt words to stop people from making violent images. “You can’t use the word ‘art’ to justify everything in all situations,” Holz says. “People were making the visual equivalent of hate speech, and we’d say they weren’t allowed. They’d reply: ‘What are you, a cop? I’m an artist. I should be able to do whatever I want.’ And it’s like, maybe not actually.”
More concerning in the long term is the power these tools have to generate misinformation. The general visual literacy of the public is not high. In a test of about 600 people by Tidio, a customer service platform, 80 per cent were unable to recognise an AI-generated photograph and 60 per cent failed to identify an AI-generated artwork. As it becomes easier to create convincing photographs for the purposes of misinformation, the value placed on images as proof in courtrooms or the media may be forced to shift.
Aside from the more theoretical concerns around misinformation, there are more tangible threats this technology is already posing to the lives of working artists and designers, copyright first and foremost. These models were trained on human creations, but those creators were never asked for consent or compensated. A group called Spawning have already launched a tool, Have I Been Trained, which allows artists to see if their images have been used to train AI systems.
Several services, including Dall-E and Midjourney, are now giving premium subscribers the commercial rights to the images they create. And some digital libraries, such as Getty Images, have banned the sale and upload of AI-generated pictures, citing legal concerns. Over the coming years, we can expect court cases to set precedents on these questions as the law scurries to catch up with the pace of technological development.
More of an existential threat is the question of what this AI will mean for the already precarious livelihoods of artists and designers. The optimistic take is that it might automate the mundane side of graphic design work, allowing artists more space to focus on their creative projects. Miller is philosophical on the topic: “Yes, it’s going to kill jobs but, at the same time, jobs have been dying out since the industrial revolution. Jobs evolve because of technology. My job didn’t exist five years ago.”
Several artists are less positive. I hear numerous stories of designers whose work was rejected when their client found out that they could use Dall-E 2 to get a much cheaper result that might not be as good, but was good enough. Even OpenAI chief executive Sam Altman wrote in a blog post that, while AI will create new jobs, “I think it’s important to be honest that it’s increasingly going to make some jobs not very relevant.”
“I had an existential crisis for the first two weeks when I started using Dall-E,” says Los Angeles-based digital artist Don Allen Stevenson III. The technology prompted him and his fiancée, who is also an artist, to rethink their life plans so they would not be financially dependent on their art. “I think it’s over for the old ways. There’s no way that companies are going to prioritise the value of artists over capital. Artists have to get themselves into a position where they can change and adapt or else they’re going to go extinct.”
Meanwhile, the technology is developing apace. “AI is in its infancy,” says musician and digital artist August Kamp, “and it’s a very smart baby.” Within a year, Holz expects we will see tools that can create 3D models and video as easily as Dall-E 2 and Midjourney create images. He calls this “a technological certainty”. Over the following decade, these tools will become better, cheaper and more accessible until they are “a seamless part of our everyday lives”. It’s easy to imagine that AI image generation could be embedded into social networks to become a new unit of communication between friends, as commonplace as emojis or gifs. There’s already a basic AI art filter available on TikTok.
All this disruption does not necessarily spell the death of human creativity. When the camera was invented, some declared it the end of art, arguing that since taking a photo required less effort and skill than painting, it was the device, not the human, that was responsible for the final image. Today most people acknowledge that fine art photographers are fine artists by dint of the choices they make and how they use their tools.
The history of art is intertwined with the history of technology. Oil painting was a new technology once, as were recorded sound, cinema and electronic music synthesisers. Each threatened to make a previous art form irrelevant, but this never really happened. People still paint with oils and learn to play the guitar. Copyright and payments will cause arguments, jobs will come and go, ethics will provide endless fuel for debate, but art itself is too vital to be killed by new technology. Whenever it seems threatened, it’s only a matter of time before it generates something new.