I will begin as a curmudgeon and state that I think the term “artificial intelligence” is overly generous when applied to the current level of this technology, and would be better reserved for future systems that are more likely to fulfill the expectations created by the term “intelligence”.
In an effort to neatly sidestep the semantic/philosophical black hole that is the definition of “intelligence”, I will say that here I’m referring to the common perception most people have of the term — essentially, generalized human intelligence — the ability to think and reason as well as learn. In most cases, this carries an implied sense of self awareness, even when we assign the characteristic to other animals.
These “AI” systems — which I prefer to refer to simply as “machine learning systems” or “neural networks” — are indeed impressive in certain ways, but if they are intelligent, they are savants — extremely capable in a few highly specialized areas.
However, I do wonder if they are adept in ways we likely haven’t considered, and I will pose the question: are neural networks capable of being creative?
How do machine learning text to image generators work?
There are three major text to image generators: Stable Diffusion, DALL-E, and Midjourney, as well as a number of less well-known systems.
These systems are trained by being fed images — lots of images — from various sources. At present there is little restraint on sources. The images are tagged, or associated with text in some way, and accumulated into enormous datasets. The system can quickly access its dataset and identify various kinds of images by their associated words.
A user will log into a site from which one or more of the systems can be accessed, and give the system a text “prompt” from which to generate images. Typically, this is a set of words suggesting a kind of image, a subject, and a style of rendering. The system will then draw on the patterns it has learned from its text-tagged training images and try to produce new images with a similar appearance based on the text cues.
There is a common misconception
that these systems cobble together bits of “stolen” images into new configurations, like a kind of blended collage. This is not how they work, and the way they do work is the crux of the copyright problem. The best descriptor I can think of for their actual training and output process is pattern recognition. They learn to associate certain kinds of visual patterns with text words or strings of words, and they then create brand new, unique images based on those patterns. The patterns they recognize and can replicate can be highly sophisticated, such as the style of a given artist, but the results do not directly contain copyrighted existing artwork. (There is an exception if you feed an existing copyrighted image into the system directly — as a basis from which to work — and then only modify it slightly, but that’s a different use than what I’m talking about here.)
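For the curious, here is a rough illustration in code of what I mean by associating visual patterns with text. It uses the openly available CLIP model (a close relative of the text encoder inside Stable Diffusion) to score how well a few captions match an image. To be clear, this is only a sketch of the underlying idea, not the actual training code of any of these systems; the image file and the captions are placeholders I’ve made up.

```python
# A minimal sketch of text/image association using the open CLIP model via the
# Hugging Face transformers library. Illustrative only; this is not the training
# code of Stable Diffusion, DALL-E or Midjourney, and the filename and captions
# are made-up placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("some_artwork.jpg")  # placeholder image file
captions = [
    "an art nouveau poster of a young woman with red hair",
    "a photograph of a city street at night",
    "a pencil sketch of a galloping horse",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher score = the model judges that caption a better match for the image.
scores = outputs.logits_per_image.softmax(dim=1)[0]
for caption, score in zip(captions, scores):
    print(f"{score:.2f}  {caption}")
```

A system trained on hundreds of millions of such image and caption pairs ends up with a statistical sense of which visual patterns go with which words, and that, rather than a library of stored pictures, is what it draws on when it generates something new.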
Perhaps not as easy as it looks
In my attempt to learn more about text to image generators, and to understand how they work, I have been experimenting with the machine learning model known as Stable Diffusion 1.5.
I’m a reasonably accomplished artist, and in my role as a website designer and developer, quite comfortable with computers, to the point of writing certain kinds of code. However, my efforts in creating prompts for the system have produced rather mediocre results compared to some of the machine generated image examples I’ve seen. To me this indicates there is a significant level of effort and skill in crafting prompts that produce the best results from these systems.
The text prompt I used for the image above, left was: “beautiful young woman with straight red hair and bangs in front of elaborate art nouveau style decoration rendered in the style of alphonse mucha”.
For the image on the right it was: “beautiful young woman with straight, collar length bright red hair, bangs, green eyes, in the style of alphonse mucha”.
Though both images have a vague look of Art Nouveau, neither looks to me like the style of Mucha, nor do they quite fulfill the design feel of his posters that I was trying to achieve.
I’m hampered by the fact that I’m only writing prompts at the most basic level, and I have not learned the processes of iteration and other techniques more skilled users employ.
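As an aside, for those comfortable with a little code, the same kind of prompt experiment can be run locally with the open source diffusers library rather than through a website. The sketch below is a minimal example only; it assumes a computer with a capable graphics card, and the model name and settings are illustrative rather than the exact configuration behind the images above.

```python
# A minimal sketch of text-to-image generation with a Stable Diffusion 1.5
# checkpoint via the open source diffusers library. Assumes a CUDA-capable GPU;
# the model id and settings are illustrative, not the exact setup used for the
# images shown in this post.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # a publicly hosted SD 1.5 checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = (
    "beautiful young woman with straight red hair and bangs in front of "
    "elaborate art nouveau style decoration rendered in the style of alphonse mucha"
)

# guidance_scale controls how strongly the result is pushed to follow the prompt.
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("mucha_test.png")
```

Running the same prompt a few times with different random seeds quickly makes it clear how much variation, and how much trial and error, is involved in getting a usable result.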
Eventually, I found that clicking on the original prompt text in the detail page for the image in my PlaygroundAI account profile would access a large number of existing images posted by other users, presumably with prompts the system deemed related to mine.
Some of these were visually appealing, and obviously created by users with a greater degree of experience; others were aberrations that looked like an illustration for a science fiction story about a horrible teleportation accident. In each case, for the images the users had tagged as publicly viewable, the text prompt is there to read and learn from.
Are machine generated images that imitate the recognizable style of a contemporary artist theft? — or not?
When machine generated images seem to carry the style of a living artist, the “new images” part of that process is at the heart of a conundrum: if the generated images are not copies of existing copyrighted images, but are rendered in the recognizable style of a living artist (or a deceased artist whose works are still protected under copyright law), does this constitute theft?
Many of us will be quick to sail off to the conclusion that copying an artist’s style is theft, but on further reflection, will quickly run aground on the shoal of existing U.S. and international copyright law, which states that only existing works can be protected by copyright.
You cannot copyright a style.
However unethical it may seem, it is not against current law to copy a style, as long as the copyist is not misrepresenting the works as authentic works by the original artist.
What is more likely in question is the legality of the training methods of the machine learning models in “scraping” images from the web and other sources. So far they seem to be operating within generally accepted practices, as the “fair use” part of copyright law is, of necessity, vague.
Change the law?
The cry to “Change copyright law!” quickly runs into its own barriers. When given thought (not a popular practice, I know), it becomes obvious that this is not only a bog-like whirlpool of conflicting and amorphous concepts, but may well be an impossible task.
How would you go about defining copyright infringement of an artist’s style? In some of the most blatant cases, it seems obvious, but it’s in the dark, shifting fringes of this concept that the details, and the difficulties, lie.
As an artist, my own style is an accumulation of the influences I’ve encountered through my life — other artists whose work I have admired and, in many cases, studied.
If I admire the style of an artist whose work is under copyright — let’s say Alphonse Mucha — and I study his style and attempt to bring elements of it into my own work, at what point could I be accused of copyright infringement?
Can you see what a muddy slope this is already? How is this different from the history of art, in which artists have always learned from those who came before them?
Was Rembrandt guilty of theft in adopting the pose of a painting he admired by Titian?
(Image above, left: Man with a Quilted Sleeve, Titian, 1510; right: Self Portrait at the Age of 34, Rembrandt, 1640; note: these are images of the real paintings, not machine learning imitations)
Learning from those who came before us is how human endeavor, whether artistic, scientific, literary or otherwise, has always progressed. As has often been said: “We stand on the shoulders of giants.”
So in what fundamental and legally definable way is a machine learning system creating new images based on the accumulated observation of existing images different from humans observing, and learning from, the art that has inspired them?
In what way is this aspect of machine learning different from what we identify in humans as creativity, which has always consisted of combining existing material in new ways?
These seemingly simple, but challenging, questions are worthy of consideration.
Capitalism rears its greedy, leering grin
I have not yet mentioned the inexorable forces of commerce and the fact that a number of powerful and influential companies have a stake in making the commercial versions of these systems as powerful as possible.
(Image above: Stable Diffusion 1.5, text prompt: “fierce, threatening monster robot”)
Even more to the point is the “cost cutting” pressure on companies to use these systems in lieu of hiring artists and graphic designers who must be paid for their work.
On the hopeful side, I’m reminded of the “desktop publishing revolution” of the 1980s and 1990s, during which companies decided that a computer with lots of fonts and Microsoft Word meant that Kevin in Accounting could take over the design and publishing chores for the company, and that hiring a graphic designer was no longer necessary.
Endless centered-text multi-font Word documents later, the companies realized this was indeed an error of judgement.
How different the current situation may be is as yet unclear, but at this time, companies would have to pay someone who is skilled at manipulating one of these systems to produce acceptable results; so as yet, this doesn’t seem to be a Kevin in Accounting push-button threat to graphic designers.
However, machine learning systems are disrupting more areas of human endeavor than the arts; word-based systems like ChatGPT and OpenAI Playground (not to be confused with PlaygroundAI.com) are being used to write advertising copy, blog posts, term papers and computer code, and will be sought after to take over a variety of other jobs.
You may have noticed the prevalence of fake humans keeping you from talking to a real person when you try to get “customer service” on the phone, the “convenient self service checkouts” that encourage you to do a checkout clerk’s job for free, the machines that now take your orders, and the robot voices in other aspects of modern life. All of these will become more sophisticated — and harder to discern as non-human — as machine learning makes its presence felt.
Companies love the fantasy of having to pay no employee salaries or benefits, instead having machines fulfill their roles in selling goods and services to consumers (who, one assumes, will be paid salaries by other firms that are less savvy).
What’s an artist to do?
For those artists concerned with protecting their own style of art from being adopted by these systems, what options are available?
If we focus on the limitations of existing copyright law, we find that in the U.S. copyright generally lasts for the life of the author plus 70 years (or 95 years after publication for works made for hire).
If you tried to define an artist’s style for the purpose of copyright law, not only would defining a style be a daunting challenge, but how would you enforce such a regulation?
Many artists are urging that you contact your legislative representatives and demand that they do “something”.
The idea of involving legislators in this process just makes my blood run cold. Never have I seen a group more monumentally and almost universally ignorant and misguided in issues of technology than legislators — not that this has ever stopped them from sticking their dirty fingers in the pie.
Could legal restrictions be placed on the types of content allowable for use by these systems in the training stage? Perhaps, but this is in itself a muddy issue, which may contain unintended consequences in the form of limitations on what we can access as humans. Can we really regulate machine access to images differently than what is available to humans?
The Concept Art Association is trying to rally the troops with a crowdfunding campaign and a list of suggested actions.
However, I think those suggesting that image collection for text to image generation be limited to opt-in, and otherwise restricted to public domain content, may find that this is a more complex legal issue than it first appears, and that they are again casting wide nets that may well catch humans in unpredicted ways.
(Image above: Stable Diffusion 1.5, text prompt: “fierce, threatening monster robot holding an artist’s palette and paintbrush”; image-to-image prompt: Self portrait by Élisabeth Louise Vigée Le Brun)
Leaving something like this in the hands of politicians would be at best ineffectual, and at worst disastrous. If there are solutions to be found, they must come from individuals who are intimately familiar with the complexities of the issues, the structure and use of these systems, and the likely trajectory of their technological progression, as well as the legal framework of copyright law.
At present there is some evidence that public opinion can have an effect on the creators of these systems. Already, Stability AI, the company behind the Stable Diffusion text to image generator, is offering an opt-out to have your artwork excluded from the flood of images being fed into their training system for the next version of the software. This does, however, require that artists be proactive in opting out, and that they be aware of the option in the first place. Also, Stable Diffusion is only one of several systems in operation.
It’s worth noting that the creators of some of these systems are attempting to restrict the use of specific artists’ names in prompting the style of rendering.
There are also efforts being made to allow for digitally tagging images in a way that can be used to identify and exclude them from assimilation by the Borg, er… I mean, neural network training routines.
Meanwhile, putting the “NO AI” symbol up on social media accounts seems pretty weak sauce, though it may help raise a bit of awareness of the issue. (I can certainly understand the attempt to bring it to the attention of the owners of ArtStation.)
I will suggest, however, that artists would do well to raise their own level of awareness and become more informed about the underlying technology and related copyright issues.
Being informed
I think that artists to whom this issue is important will benefit from taking a little time to log into one of these systems and spend a few minutes learning to write prompts, in order to understand what they do and how they are being used. It’s also worth noting how they can be individually further “trained” by uploading images from which the system can be prompted to create new variations.
If you can avoid a knee-jerk “I’m not having anything to do with this!” reaction, you can easily investigate text to image generation for yourself by going to Dream Studio and creating an account, which requires only an email address. There, you will be able to use Stable Diffusion for free.
There is a 15-minute YouTube video here that will walk you through the process of creating prompts for these systems, as well as giving you a quick overview of their capabilities.
I’m not suggesting that you start using text to image generators going forward — or that a few minutes spent using one of these systems is likely to change your opinion — but I believe the experience will give you a better informed opinion.
It may also prompt you (if you’ll excuse the expression) to think about how you tag and classify your images when making them publicly available.
I will also suggest that artists would do well to become more informed about copyright: how it works, what its limitations are, and what is meant by the public domain and fair use.
Can these systems be used ethically?
In my attempt to understand how these systems are trained to adopt a contemporary artist’s style, I tried to teach Stable Diffusion to imitate my own comics style by feeding it an image from my webcomic, Argon Zark! (image above, left) and playing with various text prompts. The results, though occasionally amusing, were far from successful.
That and my weak attempts to prompt the system to imitate the look of Alphonse Mucha convinced me that the image generator users who are successfully imitating a contemporary artist’s style are doing so not only deliberately, but with considerable effort and practice. If they are doing so to make money, this seems to me the focal point of unethical practice in this arena.
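For reference, the kind of experiment I described above, feeding an existing image into the system along with a text prompt, is what Stable Diffusion calls image-to-image generation. The sketch below shows roughly how that can be done with the diffusers library; the file names, prompt and strength setting are placeholders, not the exact process I used.

```python
# A minimal sketch of image-to-image generation with Stable Diffusion 1.5 via
# the diffusers library. The input file, prompt and strength value are
# placeholders, not the exact settings used for the experiments described above.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Start from an existing image (placeholder filename) scaled to the model's size.
init_image = Image.open("comic_panel.png").convert("RGB").resize((512, 512))

prompt = "comic book panel in a bright, clean digital ink style"

# strength controls how far the result may drift from the original image:
# low values stay close to the input, higher values follow the prompt more freely.
result = pipe(
    prompt=prompt,
    image=init_image,
    strength=0.6,
    guidance_scale=7.5,
).images[0]
result.save("comic_panel_variation.png")
```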
The loud voices in opposition to text to image generation in any form appear to assume that the only use of these systems is to appropriate without credit the hard work of living artists, ignoring the fact that there is a great deal of art, other images and writing that belongs in the public domain and is therefore fair game any way you look at it. If I ask a neural network to create an image in the style of Rembrandt, no one has reason to complain.
(Image above: Stable Diffusion 1.5, text prompt: “landscape etching in the style of Rembrandt”)
Where am I coming from, and where do we go from here?
For those of you who might assume from my reluctance to jump on the “text to image generation is the spawn of hell” bandwagon that I am a disinterested observer, I’ll point out that I’m a painter, illustrator, comics artist, and part-time art teacher, and the creator of intellectual property that I consider valuable.
Also, in my role as a graphic designer, I stand to lose business if these systems make website creation a job for neural networks rather than human designers.
I’m not without a stake in this discussion.
That being said, we have to acknowledge that this technology is here. It’s not going away, and it’s likely to rapidly become more sophisticated and effective in the near future.
We can rage against the machine, shake our fists at the sky and cry foul — and hide in our bunkers as Skynet becomes active — or we can turn around, examine the technology and its uses, and attempt to understand and adapt — and perhaps influence the outcome of some of these conflicts, or even find uses for some aspects of this technology in our own creative endeavors.
There may be no easy answers, but we can at least try to understand the questions.