Wednesday, September 16, 2015

Another GoodAI Milestone: Attention, Vision, and Drawing

Today I am excited to share with you that our GoodAI team has reached another milestone. These research results are part of our artificial brain architecture that we call the attention module.

Your brain’s ability to focus and pay attention to its surroundings is an important part of what allows humans to survive and live comfortably. Attention enables you to selectively focus on parts of your environment. We use attention to designate how much concentration a piece of information or particular stimuli deserves, while at the same time ignoring input that we are aware of but know isn’t immediately relevant. Imagine how difficult it would be if walking down the street, we focused equally on everything we saw, heard, smelled, or could feel. Attention helps us know what’s important at any given time. In the past, attention helped us hide from dangerous predators like wolves or bears, and today it enables us to drive cars and even distinguish between a friend’s cute puppy and the neighbor’s unfriendly watchdog.

It’s important to remember that these results are just a subsection of the attention module of our artificial brain, and that we are building on findings of other researchers in the areas of recurrent networks and deep learning. While this is certainly not an extraordinary breakthrough in AI research, reaching this milestone means that we’re on track in our progress towards general artificial intelligence.

Attention Method and Results

The new milestone is that our attention module is able to understand what the things it sees look like, and how these things it sees relate to each other. One subpart of our module shows how it is able to study a simple black-and-white drawing and remember it. It then reproduces a similar but completely original drawing after seeing just a small selection of the drawing.

For example, if you train the module on faces like these:

Subset of images for training faces

And then show it just some hair (which the AI has never seen before):

Partial input
The module is able to generate a new, complete image of its own creative design. The generated image is not a copy, but is an original picture created by the AI, inspired by images it saw previously.

Generated image
You can see the whole process in this video with several examples:

And we went even further in our experimentation. Inspired by DecoBrush, we trained our module to remember ornamental letters. When we draw a normal, handwritten letter, the module adds ornamentation to it. Please note that this is just the first try and several additional improvements, including structure or neighborhood constraints (i.e. convolutions), could be applied to improve the result.
Adding ornamentation to characters. Several characters are perfect (when the module learned what to copy), while others make sense and are novel, i.e. "E" in the fourth row (where the module learn how to combine)

We’d also like to share our work in progress on more complicated dataset. The module was fed with pictures of mountains. Below you can see both inputs (top) and generated images (bottom):

The figure shows picture reconstruction for 12 examples (divided by a line). The top rows show the input. Below the input there are generated images (the network learns how to reconstruct the input image over a series of steps)

The module tries to generalize among the pictures – interestingly, it also added extra mountaineers in the top left picture pair. We are already working on further improvements and even more interesting tests and applications.

A scene for testing attention
Finally, the module can also understand the relationship between objects in a picture. This is illustrated in our next example: a room with objects. The goal here is to teach the attention module to remember the relationship between objects in the room and also remember how the individual objects look. Simply put, we want our algorithm to realize that there are chairs, a plant, that the plant is below the chair, and that the chairs are next to each other.

This video describes the attention module in action and shows the results:

Module details (for advanced technical readers)

For this milestone, we were inspired mainly by two works: Shape Boltzmann Machines that use pixel-neighbor information to complete missing parts of a picture, and DeepMind's DRAW paper that introduces a general recurrent architecture for unsupervised attention. Similar to that system, our model contains two Long Short-Term Memory (LSTM) layers (encoder and decoder) with about 60 cells each (so that it won’t overfit data but tries to generalize across it).

Input images are separated into: (i) a training set that is used for training (such as the images of faces), and (ii) a test set containing images the module didn’t see during the training (for example, just the hair).

To generate a face, the algorithm uses a database of images to learn the characteristics of a face. When the module sees a new image or part of a new image, it generates yet another new image. This newly generated image is then fed as the input in the next time step. Over time, the module improves by generating novel images that make sense to a viewer.

The attention architecture has similar structure to the algorithm for generating images, but differs in the following way: the input is always just one part of the image that it sees (in contrast to full image as before). From that limited input, the attention module generates: (i) the direction where it should look next and (ii) what will be there. This output is fed back as new input in the following step. The model learns to minimize the difference between the overall scene and what it predicted about the scene.
The attention module depends on the ability to learn sequences of information. Here, we give a bit of insight into how we compared two training algorithms using a simple yet challenging example. We tested the Long Short-Term Memory (LSTM) model with Real-Time Recurrent Learning (RTRL) versus Backpropagation Through Time (BPTT). At time 1, we feed the LSTM a randomly cropped part of the picture of a handwritten digit from the MNIST database; at time 2, a random part of the following number is used as the input; and at time 3, it finally sees what it should predict (the number that follows after those two). Recurrent connections are highlighted in blue. While we found that RTRL is great in most cases, it was outperformed by BPTT in this example as unfolding the network seems to be very important here.


This milestone is an important step along the way to developing a truly general artificial intelligence. However, this is still just a small part of the attention module of our artificial brain, and is an even smaller part of building an artificial brain as a whole. Moreover, we are already exploring several improvements of the attention module and you can expect to see these results in an upcoming Brain Simulator update.

In the meantime, we are working hard on other modules of our AI brain - including motor commands, long term memory, and architectures inspired by neuroscience and deep learning, and you can look forward to new milestone news in the near future.

Thanks for reading! And for a special goodbye, here’s how our attention module imagines the Space Engineers astronaut:

And many thanks to Martin, Simon, and Honza, our attention module researchers!

Marek Rosa
CEO, CTO and founder at GoodAI

Note on blog comments:

Lately I’ve received a lot of spam comments on my blog, and unfortunately the blog has no settings for preventing spammers. Due to the extreme amount of spam messages, I am forced to start moderating comments – I hope readers will understand this need, and we will do our best to approve comments as fast as possible.


Want to know as soon as GoodAI reaches another milestone? Check us out on social media: