Can Artificial Intelligence be creative? An introduction to Generative Adversarial Networks

Published on November 21, 2018

Take a look at the painting below. At first glance, you might think that this painting is hanging in an expensive art gallery somewhere. If you squint, you might even believe that it was painted by one of the old masters, like Rembrandt or Vermeer. In reality, the painting, titled Portrait of Edmond Belamy, was generated by an Artificial Intelligence algorithm. It was recently sold at an auction in New York for a stunning $432,500, roughly 45 times the estimated selling price. Obvious, the French art collective behind the painting, created it and others like it using a Generative Adversarial Network, a relatively new AI technique that pits two neural networks against each other in order to generate realistic fake data.

One of the strong suits of machine learning algorithms is that they can automatically find underlying commonalities in datasets that might not always be obvious to human observers. But can this knowledge also be employed to generate new data? Is it possible for a machine learning algorithm to create something realistic that it has not encountered before? Or in other words: can machine learning algorithms be creative?

This is where Generative Adversarial Networks, or GANs, come in. Yann LeCun, director of AI research at Facebook, called GANs one of the most important breakthroughs in deep learning of recent years. The concept of GANs was introduced in 2014 by Ian Goodfellow, now a researcher at Google Brain. Like all good ideas, it started with a discussion in a bar. In this case, the discussion was about generative networks, and how their quality could be improved. Goodfellow came up with the idea to use an algorithm with two competing neural networks: one to generate fake data, and one to make the distinction between real and fake data. The idea got lodged in his head, and when he went home that same night, he did not go to sleep before he had created the first working GAN. That same year, Goodfellow and fellow researchers expanded on the idea and wrote a paper to introduce GANs to the rest of the world.

Fake celebrity pictures generated by a GAN

The main concept behind a GAN is fairly simple. A GAN generally consists of two neural networks: a generator and a discriminator. These networks are engaged in two competing tasks: the generator tries to generate new, fake data that is realistic enough to fool the discriminator, while the discriminator tries to make an accurate distinction between real training data and fake data produced by the generator. The two networks are constantly updated to become better at their task, and they are, in a sense, training each other. A common analogy is the counterfeiting of money. A criminal (the generator) tries to print realistic counterfeit money, while the police (the discriminator) try to differentiate the fake money from the real money. Over time, the criminal thinks of craftier ways to fool the police and to make the fake money as realistic as possible, while the police become better at finding the difference between real and counterfeit money.

If a GAN works as intended, the end result should be a generator that can generate very convincing fake data in a certain domain, and a discriminator that can accurately tell the difference between real and generated data, with good knowledge of the underlying representation of the data. A trained discriminator has learned general data features, and by combining the unsupervised training of a GAN with a supervised classification task, it is possible to create high-quality classifiers in a semi-supervised way. Most of the recent work with GANs has been done with visual data. Amongst other things, they have been used to generate images of bedrooms, to create faces of non-existent celebrities, to make realistic adaptations to images of human faces and to create horrific creatures that resemble cats. In other domains, GANs have combined musical pieces to create new compositions, and they can be used for fraud detection. If you are looking for more examples, an attempt to keep track of all known named GAN implementations can be found in the GAN Zoo.

Inner workings of a GAN
By now, you should have at least a vague understanding of how GANs work. Two neural networks are competing in a game, where they each try to outsmart the other. As one of them gets better at its task, the other one has to try even harder to beat it. The generator usually gets noise or other (semi-)random data as input, which it has to turn into fake data. The discriminator receives training data from a set of pre-selected examples, and an equal amount of fake data from the generator. For each data point it receives, it has to calculate the probability that the data point was produced by the generator. The difference between the actual output and the optimal output of each network is used to update that network, making it better at its task.

Overview of GAN architecture
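
To make that update signal a bit more concrete, here is a minimal Python sketch of the two losses. It assumes binary cross-entropy, which is a common choice but not something this article prescribes, and the function names and example probabilities are made up for illustration. As in the description above, the discriminator outputs the probability that a sample was generated.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def discriminator_loss(p_fake_on_real, p_fake_on_generated):
    """Cross-entropy for the discriminator.

    Both arguments are lists of probabilities that a sample is generated.
    The ideal output is 0 for real samples and 1 for generated samples.
    """
    real_terms = [-math.log(1.0 - p + EPS) for p in p_fake_on_real]
    fake_terms = [-math.log(p + EPS) for p in p_fake_on_generated]
    return (sum(real_terms) + sum(fake_terms)) / (len(real_terms) + len(fake_terms))


def generator_loss(p_fake_on_generated):
    """The generator wants its samples judged as real (probability near 0)."""
    terms = [-math.log(1.0 - p + EPS) for p in p_fake_on_generated]
    return sum(terms) / len(terms)


# A discriminator that separates the two sets well has a low loss,
# while one that answers 0.5 for everything is simply guessing.
confident = discriminator_loss([0.1, 0.2], [0.9, 0.8])
guessing = discriminator_loss([0.5, 0.5], [0.5, 0.5])
print(confident < guessing)
```

Note that the two losses pull in opposite directions on the same generated samples: whatever lowers the generator's loss raises the discriminator's, which is the competition described above.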

Even though GANs have brought along a small revolution in the quality of generative techniques, there are still some issues to keep in mind when training and designing a GAN. First of all, as with all machine learning techniques, the quality of the output is largely dependent on the quality of the training data. Although GANs are good at finding underlying patterns in the data, this becomes very hard when there is a lot of variation in the data. Too much variation in your training data will generally lead to inconsistent, low-quality results. Another known issue with GANs is mode collapse. Mode collapse occurs when the generator ends up in a state where it always emits the same point, or very similar points. Imagine a GAN that generates human faces. If the generator always outputs the exact same, very realistic face, it might be very good at fooling the discriminator, even though that output might not be very useful. Fortunately, there are already some good solutions to combat mode collapse, like showing the discriminator multiple examples at the same time, rather than one at a time.

Mode collapse in a GAN that is trained on the MNIST database of handwritten digits
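
Why would seeing several examples at once help? A collapsed batch has almost no variation, which is easy to measure across a batch but invisible in any single sample. The Python sketch below is only a simplified illustration of that intuition, not the actual technique used in the literature; the function name and the toy data are hypothetical.

```python
def mean_pairwise_distance(batch):
    """Average L1 distance between all pairs of samples in a batch."""
    total, pairs = 0.0, 0
    for i in range(len(batch)):
        for j in range(i + 1, len(batch)):
            total += sum(abs(a - b) for a, b in zip(batch[i], batch[j]))
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy "images" with two pixels each.
diverse = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

print(mean_pairwise_distance(diverse))    # clearly above zero
print(mean_pairwise_distance(collapsed))  # 0.0
```

A discriminator that gets a batch-level statistic like this as an extra input can flag a batch of near-identical samples as fake, no matter how realistic each individual sample looks.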

It can also be very complicated to fine-tune the parameters of a GAN, because you have to fine-tune the parameters of two separate networks that interact heavily. It is important to find a good balance between the two, to make sure that one of the networks does not overpower the other and render it obsolete. To combat this problem, it is common practice to keep the parameters of the generator fixed while you are training the discriminator, and vice versa. Furthermore, you can pretrain the discriminator on your training dataset, so it has a decent baseline performance.
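
As a rough sketch of this alternating scheme, the toy example below trains a one-dimensional GAN in plain Python. Everything in it is an assumption made for illustration: a linear generator g(z) = a*z + b, a logistic discriminator that outputs the probability that a sample is generated, real data drawn from a normal distribution with mean 4, and hand-derived gradients for a cross-entropy loss. Each step function updates only its own network and treats the other one as fixed.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def discriminator_step(d, g, real, noise, lr):
    """Update only the discriminator (w, c); the generator (a, b) is fixed."""
    w, c = d
    a, b = g
    fake = [a * z + b for z in noise]
    grad_w = grad_c = 0.0
    for x in real:  # target for real data: probability 0 of being generated
        p = sigmoid(w * x + c)
        grad_w += p * x / len(real)
        grad_c += p / len(real)
    for x in fake:  # target for generated data: probability 1
        p = sigmoid(w * x + c)
        grad_w += (p - 1.0) * x / len(fake)
        grad_c += (p - 1.0) / len(fake)
    return (w - lr * grad_w, c - lr * grad_c)


def generator_step(d, g, noise, lr):
    """Update only the generator (a, b); the discriminator (w, c) is fixed."""
    w, c = d
    a, b = g
    grad_a = grad_b = 0.0
    for z in noise:  # the generator wants a low "generated" probability
        p = sigmoid(w * (a * z + b) + c)
        grad_a += p * w * z / len(noise)
        grad_b += p * w / len(noise)
    return (a - lr * grad_a, b - lr * grad_b)


def train(steps=200, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    d, g = (0.0, 0.0), (1.0, 0.0)
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        d = discriminator_step(d, g, real, noise, lr)  # generator frozen
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        g = generator_step(d, g, noise, lr)            # discriminator frozen
    return d, g


d, g = train()
print("discriminator (w, c):", d)
print("generator (a, b):", g)
```

A model this small cannot learn much beyond shifting its samples toward the real data, and it may oscillate rather than settle, but it shows the structure of the loop: one network moves while the other stands still.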

A final issue is that the evaluation criteria for GANs are not straightforward. You could say that a GAN works well if the generator is very good at fooling the discriminator, but that might also mean your discriminator is under-performing. It is possible to let human annotators assess the quality of the samples, or to look at the error rate of human test subjects in a distinguishing task between generated and real data. Unfortunately, this task can be time-consuming, and human annotators can be very inconsistent. Some papers have produced quality metrics that do not rely on human annotators, but still have similar results.
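
The distinguishing task with human test subjects boils down to a simple error rate. Here is a minimal Python sketch, with a hypothetical function name and made-up judgments:

```python
def judge_error_rate(judgments, truths):
    """Fraction of samples that a judge labelled incorrectly.

    Both lists hold True for "real" and False for "generated".
    """
    if not truths or len(judgments) != len(truths):
        raise ValueError("need two non-empty lists of equal length")
    wrong = sum(1 for j, t in zip(judgments, truths) if j != t)
    return wrong / len(truths)


truths = [True, True, False, False]
judgments = [True, False, True, False]
print(judge_error_rate(judgments, truths))  # 0.5
```

An error rate close to 50% on a balanced set suggests the judge is effectively guessing, which is the best a generator can hope for. An error rate near zero means the generated samples are still easy to spot.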

Fake images of bedrooms created with a GAN

The future
Where do we go from here? Because GANs are a relatively new technique, there is still a lot of ground to gain in terms of quality, stability, training time and optimization. A lot of research is focused on the usage of GANs to solve a variety of problems, but it is also important to make sure that the fundamentals are in order. Another way to go forward is to apply GANs to more than just the visual domain. GANs are already at the point where they can generate very good photo-realistic images, and there is no doubt that they will still find their way into applications like video and photo restoration or video game graphics. While this is certainly impressive, GAN research in, for example, the auditory domain is still scarce, even though there are very nice examples of GANs being used in audio separation and noise reduction. GANs could also be applied to text, by combining them with Natural Language Processing (NLP) techniques. The big issue to overcome here is that GANs are only defined for real-valued, continuous data, and it is hard to represent text in a continuous way. Some efforts have been made to use GANs with text-based data, but so far, the results have not been as impressive as in other domains.

Lastly, most GAN research focuses on the output of the generator, and its ability to create realistic fake data in different domains. The other obvious way to use GANs, which has been less thoroughly explored, is to take the trained discriminator and use it as the basis for a classification task. The research that has been done on this subject so far seems to show that it can be a reliable way to pre-train classifiers with a relatively small amount of data.

Hopefully, this article gave you an idea of the power of GANs, and the way in which they work. The results of GAN research so far have been impressive, and because the technique is relatively new, great things are definitely still to come (the YouTube channel Two Minute Papers often has interesting examples of new GAN research). Their ability to create realistic fake data means that we are getting to the point where machine learning systems can actually be said to be creative, and their applications are manifold. It is exciting to see what this will lead to. Maybe in the future we will have museums entirely filled with artwork by AI artists?
