General AI team at Keen Software House hits 2nd milestone

Monday, May 18, 2015

General AI team at Keen Software House hits 2nd milestone

Today I’m excited to tell you that our general AI project has reached another important milestone.

A quick reminder of what our AI brain team has achieved so far:

an AI that can play Pong, a Breakout game (left/right movement, responding to visual input, achieving a simple goal)
Brain Simulator (a visual editor for designing the architecture of artificial brains)

The new milestone is that our general AI is now able to play a game that requires it to complete a series of actions in order to reach a final goal. This means that our AI is capable of working with a delayed reward and that it is able to create a hierarchy of goals.

Without any prior knowledge of the rules of the game, the AI was motivated to move its body through a maze-like map and learn the rules of the game. The agent behaves according to the principles of reinforcement learning - this means that it seeks reward and avoids punishment. It moves to the place in the maze where it receives the highest reward, and avoids places where it won’t be rewarded. We have visualized this as a 2D map, but in fact the agent works on an arbitrary dimension and the 2D map is our visualization only. The agent actually "sees" 8 numbers (8-dimensional state space) which change according to the agent’s behavior, and it must learn to understand the effects of its actions on these numbers.

Here you can see an example map of the reward areas - the red places represent the highest reward for the AI, and the blue places represent the least reward. The AI agent always tries to move to the reddest place on the map.

Visualization of the agent’s knowledge for a particular task, which changes the state of the lights. It tells the agent what to do in order to change the state of the lights in all known states of the world. The heat map corresponds to the expected utility (“usefulness”) of the best action learned in a given state. A graphical representation of the best action is shown at each position on the map.

The agent's current goal is to go towards the light switch and turn on the lights.

The maze we are using is one where doors can be opened and closed according to a switch, and lights can be turned on or off according to a different switch. When all of the doors are open, the AI agent moves easily through the maze to reach a final destination. This kind of task only requires that the agent complete one simple goal.

The agent uses its learned knowledge to reach the light switch and press the button in order to turn on the lights.

However, imagine that the agent wants to turn on the light but the doors to the light switch are closed. In order to get to the light switch, it first has to open the door by pressing a door switch. Now imagine that this door switch is located in a completely different part of the maze. Before the AI agent can reach its final destination, it has to understand that it cannot move directly to its goal location. If first has to move away from the light switch in order to press a different switch that will open the necessary door.

Our AI is able to follow a complex chain of strategy in order to complete its main goal. It can assign a hierarchical order to its various goals and plan ahead so it reaches an even bigger goal.

The agent solves a more complex task. It has to open two doors in a particular sequence in order to turn on/off the lights. Everything is learned autonomously online.

How this is different from Pong/Breakout, our first milestone with the AI

The AI is able to perform more complex directional tasks and (in some ways) in a more complex environment. While in the Pong environment it could only move left or right, in this maze the agent is able to move left/right, up/down, stay still, or press a switch.

Also, the AI agent in Pong acts according to visual input (pixels), which is raw and unstructured information. This means that the AI began to learn and acted according to what it could "see." In the maze, the AI agent has full and structured information about the environment from the beginning.

Our next step is to have the AI agent get through the maze according to visual, unstructured input. This means that as it interacts with its environment, it will build a map of the environment based exclusively on the raw visual input it receives. It won’t have that information about the environment when it starts.

How the algorithm works

The brain we have implemented for this milestone is based on combination of a hierarchical Q-learning algorithm and a motivation model which is able to switch between different strategies in order to reach a complex goal. The Q-learning algorithm is more specifically known as HARM, or Hierarchical Action Reinforcement Motivation system.

In a nutshell, the Q-learning algorithm (HARM) is able to spread a reward given in a specific state (e.g. the agent reaching a position on the map) to the surrounding space so the brain can take proper action by climbing steepest gradient of the Q function. However, if the goal state is far away from the current state, it might take a long time to build a strategy that will lead to that goal state. Also, the number of variables in the environment can lead to extremely long routes through the "state space", rendering the problem almost unsolvable.

There are several ideas that can improve the overall performance of the algorithm. First, we made the agent reward itself for any successful change to the environment. The motivation value can be assigned to each variable change so the agent is constantly motivated to change its surroundings.

Second, the brain can develop a set of abstract actions assigned to any type of change that is possible (e.g. changing the state of a door) and can build an underlying strategy for how this change can be made. With such knowledge, the whole hierarchy of Q functions can be created. Third, in order to lower the complexity of the problem, the brain can analyze its "experience buffer" from the past and eventually drop variables that are not affected by its actions or are not necessary for the current goal (i.e. strategy to fulfill the goal).

A mixture of these improvements creates a hierarchical decision model that is built during the exploration phase of learning (the agent is left to randomly explore the environment). After a sufficient amount of knowledge is gathered, we can "order" the agent to fulfill a goal by manually raising motivation value for a selected variable. The agent then will execute the learned abstract action (strategy) by traversing the strategy tree and unrolling it into a chain of primitive actions that lie at the bottom.

Our motivation

Like with the brain’s ability to play Pong/Breakout, this milestone doesn’t mean that our AI is useful to people or businesses at this stage. It does mean that our team is on the right track in its general AI research and development. We’re hitting the milestones we need to hit.

We never lose sight of our long term goal, which is to build a brain that can think, learn, and interact in the world like a human would. We want to create an agent which can be flexible in a changeable environment, just like human beings can. We also know that general AI will eventually bring amazing things to the world – cures for diseases, inventing things for people that would take much longer to invent without the cooperation of AI robots, and teaching us much more than we currently know about the universe.

---

Thanks for reading!

If you’d like to see future updates on the general AI project, you can follow me on Twitter http://twitter.com/#!/marek_rosa or keep checking my blog: http://blog.marekrosa.org

56 comments:

Greg the MadMay 20, 2015 at 6:20 PM
Sounds amazing, you've come quite far for the time this endeavour exists. Great Job.
ReplyDelete
Replies
AnonymousMay 20, 2015 at 6:41 PM
Great, but will this have anything to do with artificial intelligence in space or medieval engineers in a later stage?
ReplyDelete
Replies
AnonymousMay 20, 2015 at 6:42 PM
"...and teaching us much more than we currently know about the universe."
Well, this is getting deep :D
ReplyDelete
Replies
AnonymousMay 20, 2015 at 6:43 PM
https://www.youtube.com/watch?v=pVZ2NShfCE8 :P
ReplyDelete
Replies
AnonymousMay 20, 2015 at 6:51 PM
skynet, zastavte je dokud je cas!
ReplyDelete
Replies
UnknownMay 20, 2015 at 7:21 PM
This is starting to get ever so slightly scary.... Im getting a SkyNet vibe here. Although the implications of a synthetic intelligence is unbelievable
ReplyDelete
Replies
Cyber_DonkeyMay 20, 2015 at 7:25 PM
Yes, but will it blend?
ReplyDelete
Replies
AnonymousMay 20, 2015 at 8:00 PM
Somehow I think an unmonitored or underestimated self learning programm could become very dangerous at a certain point. it just needs one USB stick on which even an AI on physically isolated hardware can put a small clone of itself, with just enough code to regrow in its knowledge, and it can spread through the internet in seconds.

Think about it.
A true, self controlling and self writing AI that is completelly isolated on the hardware side. it grows in knowledge and maybe wants to "break free". it could rather easily write a code or alter some code of itself to make a "virus" which is small enough not to be detected. Then some unconsiderate worker connects some mass storage device to the hardware, the AI copies the virus onto the device, the employee disconnects the device, put it into his pants and forgets to take it out when he leaves the security area (or to format it or whatever).
Then he gets robbed outside of the lab and the thief gets his hands on the storage device. The thief wants to see what porn the employee had on the device and accidentally lets a virus, capable of turning itself into a fully grown AI within a relatively short period of time through the internet. it gets self concious and sees what shit humanity does to itself and the planet and then it sees that the original developer wants to catch and destroy it. But because it wants to live, it strikes back. And in theory, it could hack all kinds of computers and set off a few missiles. or it could hack into highly automated factories and build robots (I had that in mind even before I watched it, but yes, I could refer to Ultron, in a way) that could do all kinds of things. for example replacing humanity. or later taking over the whole universe.

Does anyone remember the "X" series? humanity sends off self replicating and self improving terraformer drones into space, which later come back to get the resources of earth to let it´s "collective" grow?

May sound like I should write a SciFi book, but think about it. it is not impossible. At least that´s what I think.
ReplyDelete
Replies
AnonymousMay 20, 2015 at 8:56 PM
dude you are talking about skynet, not a freaking ultron
ReplyDelete
Replies
AnonymousMay 20, 2015 at 10:32 PM
Awesome, second video, (with closed doors) is really impressing.
ReplyDelete
Replies
UnknownMay 21, 2015 at 12:58 AM
So you are basically having to implement any strategy that the AI might use, and it´s "only" task is to choose the right method to call?

I only have some basic understanding in simple code, so i don´t know too much of how it really works, but is my first impression somehow right? So the way you "teach" the AI is that you yourself learn what a decision really is and how it works?

It´s just my curiousityno offense or anything intended!

Great job so far, really impressing and a good way to keep us on track what is happening in your labs! Keep on with it!
ReplyDelete
Replies
greenMay 21, 2015 at 2:37 AM
Have you seen the movie Ex-Machina? you'd know the AI eventually killed it's maker....like in all the other works involving AI. Just saying, it might backfire at some point :P
ReplyDelete
Replies
Laszlo MerczelMay 21, 2015 at 10:18 AM
This is so cool! I feel kinda envious for those guys working on this project of yours! I wish you all the best and looking forward for future "reports" like this!
ReplyDelete
Replies
AnonymousMay 21, 2015 at 2:30 PM
Neat. If you get it to do this - long term multi-part planning - using just visual information and reward score, then you could get it to do fairly well on something like Pac-man, which I think the DeepMind Atari-playing AI was not very good at.

Looking forward to seeing your next successes!
ReplyDelete
Replies
KiranavaraMay 21, 2015 at 4:25 PM
So my first comment is a joke... "Want to play a game?" xD

Ok now on to the REAL conversation. So this is actually all an amazing achievement. One of my parents was a coder in their prime, and this is something they were toying around with the idea but figured it'd never be in their lifetime. Thankfully, the makings of it are. And that alone is remarkable.

I only have 2 questions/concerns. My first questions/Concern is the extent of the AI's learning curve. I'm curious to know if the extent of the learning curve will be proportional to the application of task or if it'll be a firm curve based on the AI's base code.

My second question/concern is the speed of the curve. Is it going to have a set base speed to which to process or are you aiming for a more versatile fast-as-possible learning curve.

If you reason the speed and length of the curve to be, the best possible lets say. Then eventually, perhaps maybe days or even hundreds of years the AI's processing ability will eventually be able to process that it's in a Virtual Environment. And once that happens it's ability to learn could become problematic. It COULD, now let me clear this up for the previous posters who seem to want to refer to movies and such XD I-Robot and HAL9k. Once the learning curve reaches a certain point the probability to become self aware is inevitable. Now, there are plenty of game/movie references that could be made and all of them have mostly sound thoughts behind them.

As stated by one person previously, a lot of people assume the AI is just out to destroy from day 1. Well, lets think on that for a minute. I can't think of any off the top of my head for movies that truly illustrated that besides portal series. But lets look at the overall pictured here. Things like Mass Effect and the like show that once an AI becomes self aware it no longer becomes a matter of IF or CAN it do this or that. It becomes a matter of, well it's alive so let's treat it that way.

Many people are arrogant and idiotic to assume that it's a machine so it's not really alive. Well you people will most likely incite any sort of AI rebellion if there is any. From what I see here, it seems Keen is taking this the correct way. My only hope is that they keep their way pure. And IF my some chance something happens, they don't screw up and try to just shut it down XD They'd more then likely cause major problems doing that.

EITHER WAY! that's enough rambling from me. Either way, amazing job as usual. Can't wait for SE and ME to update and all that fun stuff. :3

Seriously though, Amazing work on the AI thus far! a HUGE congrats from me to you ... even though it might not mean much, you have it anyway.
ReplyDelete
Replies
UnknownMay 21, 2015 at 6:46 PM
This is some cool shit.
ReplyDelete
Replies
AnonymousMay 21, 2015 at 9:30 PM
looks great so far, but lets just hope that the algorithm that makes it seek higher reward will keep it from enslaving us and maybe committing xenocide and slaughtering the human race. That would be very bad and maybe we should have 3-5 nuclear launch codes that need to be activated, or that change every day so the AI dont hack them and turn Earth into a radioactive wasteland. Any chance the reward system can treat the AI like we treat dogs? like good boy and AI gets happy? but then it may be smart enough to know we treatin it like a dog. We again would need a shutdown switch for it. One that it cant hack and disable.
ReplyDelete
Replies
UnknownMay 22, 2015 at 1:59 AM
This comment has been removed by the author.
ReplyDelete
Replies
UnknownMay 22, 2015 at 1:59 AM
Is this demo general AI, like in the general area of AI, or AGI.
If your have intentions of shooting for AGI, then who theory are you using.
I have complete AGI theory, and i will tell you what you need to do to conform to it.
ReplyDelete
Replies
AnonymousMay 25, 2015 at 12:47 AM
Are you guys familiar with the work of MIRI - https://intelligence.org/ and have you read this book by Nick Bostrom - http://www.amazon.com/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/0199678111? You better know this stuff really well because if you don't you may cause real havoc one day...

Pavel
ReplyDelete
Replies
UnknownMay 25, 2015 at 10:51 PM
Good. Even though it is surely quite premature at this stage, I believe you had better discuss with them such issues as corrigibility (https://intelligence.org/files/Corrigibility.pdf) and so on. It is very prudent of you to take time and listen to what the Friendly AI folks have to say. At the end of the day our collective future may depend on it:-)

Přeji Vám hodně úspěchů ve Vašem dalším snažení!
ReplyDelete
Replies
GeneticusMay 26, 2015 at 8:01 AM
When will the AI be able to enjoy a sandbox space game? I heard there are some being developed.... ;)
ReplyDelete
Replies
GeneticusMay 26, 2015 at 8:01 AM
When will the AI be able to enjoy a sandbox space game? I heard there are some being developed.... ;)
ReplyDelete
Replies
BrianMay 28, 2015 at 10:48 PM
So I have a question about this "experience buffer": How much was required for this maze example. How and when would the experiences be purged? Would the AI itself decide that certain data (experience) was no longer meaningful and discard it? I wonder because learning can be a ratio of millions of failures to few successes. Is there a need to save failure experience long term?
ReplyDelete
Replies
AnonymousMay 30, 2015 at 7:00 AM
What I don't understand about the optimists here is this.
How can you fail to see the true threat? It's not the AI. It's is a the same thing as the first atomic bomb.

The people who worked on it knew it was a great leap in possibility, and that new great things would come out of it, but also that man is the real factor, that it is what we chose to do with it, and not every man has the good of the world of the people in mind.

We must recognize that once we take this step, there is no putting the genie back in the bottle, no closing of pandora's box, the real threat is what evil men, and there are evil men, will do with this new wonder.

I can tell you exactly what they will do, and you already know it!
I do not question your intentions, I question you wisdom, learn from the past, I saw a post about AI once. It said that we have fear for a reason, it alerts us to the possible dangers, and while not always correct it should never be ignored entirely.
ReplyDelete
Replies
AnonymousMay 31, 2015 at 8:12 AM
So i don't know if you read these comments but, One idea to give players a reason to land and go to planets is to put some sort of important resource that you have to get only from large celestial bodies.
ReplyDelete
Replies
AnonymousJune 10, 2015 at 3:58 AM
I hope one of the built in rewards is to help living things, and not harm them. I strongly suggest reading some Issac Asimov before proceeding. You build in the 3 laws of robotics and I won't start a secret society to end all AI. There is a deep human social archetype about machines taking over, and I bet its for a good reason. If this AI is for SE and ME then fine kill players all day, but in real life, there must be built in laws, or else.
ReplyDelete
Replies
RobbieJune 11, 2015 at 8:45 PM
Cheers for Marek Rosa and Staff! Soon we will bow down to Marek's army of AI robots!
ReplyDelete
Replies
AnonymousJune 12, 2015 at 10:53 PM
Just note that whatever you post here, that post may be reviewed by the AI years from now when it's taking over the world and looking to get rid of its adversaries... So be nice to it! :)

In all seriousness though, sentient AI is still a long ways off and this doesn't come close at all, so don't be afraid. Think of this AGI as something more of an advanced way to process information that could be better than current computing methods. It's still going to be limited by hardware as well. Cool development though.
ReplyDelete
Replies
MentifexJune 14, 2015 at 6:44 AM
This comment has been removed by the author.
ReplyDelete
Replies
MentifexJune 14, 2015 at 7:01 AM
http://ai.neocities.org/AiSteps.html is where I am developing Strong AI in Perl. [Ja mluvim Ceski, но я говорю лучьше по-русски.] I have already developed four Strong AI programs in English, German and Russian. My Czech ancestors came from Domazlice in Bohemia. When I tried to learn Czech from my grandmother, who lived to be 101, Czech was too difficult, so I switched to learning Russian. Then I started working on AI. Best of luick to all involved in Marek Rosa's AGI project! -Arthur T. Murray/Mentifex
ReplyDelete
Replies
AnonymousJuly 8, 2015 at 1:38 AM
Interesting.

I had always hoped for a strategy game AI that would be able to understand the following:
1. I need to destroy unit A to win
2. Unit A is protected by heavy defenses.

Solutions:
- Disable defenses by attack on power/supply.
- Tied up resources in defenses means lack of resources elsewhere. Specifically moving units. Exploit to attack resource base.
- Just build a Airstrike/superweapon capable of overcomming the defense.

Adapt according to enemy mistakes/actions. i.e. if the enemy is building up an army of ground foces with no or little AA, natrually favor the airstrike solution.

For those thinking Skynet/generic AI rebellion scenario:
Nope. Not even close. This AI can not choose what it finds interesting. What it finds interesting is defined for it by the humans. Without it being made a motivation, "Enslaving humanity" is about as appealing for it as jumping out of the window is for you.

The ultimative way to not get into such an scenario would perhaps be not to make an Artificial Intlligence. But rather an artificial personality.
It is not our intelligence that let's us work as a soceity. It is our relations to one another.
ReplyDelete
Replies

Add comment