Friday, September 1, 2023

LLM Agent taught to control drones

 

In the video above, we demonstrate how one of our LLM Agents is taught to use an API to control a quadcopter drone. In the initial stages, we have to give the Agent detailed and comprehensive instructions on how to send HTTP requests and which commands are available through the API. As the video progresses, the Agent quickly grasps these instructions and leverages the knowledge it already has to perform more advanced tasks, like flying the drone along a square trajectory. This showcases the Agent's resilience and adaptive learning capabilities, in particular how it recovers from errors and false assumptions.


This version of our continual-learning agent represents a significant advancement over our first prototype (an agent embodied in a Python terminal). The enhanced Agent has access to distinct forms of working memory and long-term memory, which enables it to manage several types of memory inconsistencies, such as contradictions or outdated information, and to learn from user feedback and environmental cues. You can think of it as a cognitive architecture built on top of an LLM.


The agent's response is the result of a sequence of nested steps. This method extends the LLM's cognitive resources and attention span beyond the limits of a bare LLM (LLMs are stateless, have a fixed-size context, don't pay sufficient attention to all instructions in the prompt, etc.). Notably, the process employs iterative prompting, using multiple prompts to accomplish tasks that the LLM can't perform in a single inference, such as retrieving and summarizing memories or maintaining the agent's state for future steps. Every step in this sequence receives input data, processes it to produce relevant output data, and shares this output across the entire chain. Consequently, formulating a response becomes a joint effort. Each step in the chain has the autonomy to determine what information it requires, what new state needs to be stored in working memory and long-term memory, whether it needs to consult the LLM, and what specific prompt should be fed to the LLM in each iteration.
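To make the idea of a chain of steps more concrete, here is a minimal sketch in Python. It is not the actual agent code: the `Context` class, the three step functions, and the `fake_llm` stub are all hypothetical names invented for illustration, and a real system would call an actual LLM and use a proper memory store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; it just echoes the prompt."""
    return f"LLM({prompt})"


@dataclass
class Context:
    """Data shared by every step in the chain (illustrative only)."""
    user_message: str
    working_memory: Dict[str, str] = field(default_factory=dict)
    long_term_memory: List[str] = field(default_factory=list)
    outputs: Dict[str, str] = field(default_factory=dict)


Step = Callable[[Context], None]


def retrieve_memories(ctx: Context) -> None:
    # This step does not need the LLM: plain keyword overlap is enough here.
    words = set(ctx.user_message.lower().split())
    hits = [m for m in ctx.long_term_memory if words & set(m.lower().split())]
    ctx.outputs["memories"] = "; ".join(hits)


def summarize_memories(ctx: Context) -> None:
    # This step consults the LLM only when there is something to summarize.
    memories = ctx.outputs["memories"]
    ctx.outputs["summary"] = fake_llm("Summarize: " + memories) if memories else ""


def respond(ctx: Context) -> None:
    prompt = f"Context: {ctx.outputs['summary']}\nUser: {ctx.user_message}"
    ctx.outputs["response"] = fake_llm(prompt)
    # Store new state so that later turns can build on this one.
    ctx.working_memory["last_user_message"] = ctx.user_message
    ctx.long_term_memory.append(ctx.user_message)


def run_chain(ctx: Context, steps: List[Step]) -> str:
    """Run each step in order; every step reads and writes the shared context."""
    for step in steps:
        step(ctx)
    return ctx.outputs["response"]


PIPELINE: List[Step] = [retrieve_memories, summarize_memories, respond]
```

Each step decides for itself whether to call the LLM and what to write into memory, which is the property the paragraph above describes; the single LLM prompt of a plain chatbot is replaced by several smaller, targeted ones.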



Initially, the agent is in a “blank state,” knowing nothing about the drone API. Therefore, we commence by closely guiding its actions, telling the agent the exact URL that it has to communicate with and the exact form and value of the HTTP requests it has to send. However, as time progresses, the Agent gradually accumulates knowledge and expertise, enabling it to tackle increasingly complex tasks based on its past experience. 
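For readers who want a feel for what "the exact form and value of the HTTP requests" might look like, here is a small Python sketch. The drone API itself is not described in this post, so the base URL, the `/drone/command` endpoint, the JSON layout, and the `move_forward` and `turn` command names are all assumptions made up for illustration.

```python
import json
import urllib.request

# Hypothetical address; the real drone API endpoint is not given in the post.
BASE_URL = "http://localhost:8000"


def build_command(command: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a JSON command."""
    body = json.dumps({"command": command, "params": params}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/drone/command",  # assumed endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode a JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


def square_commands(side_m: float) -> list:
    """Requests for a square: four straight legs, each followed by a 90-degree turn."""
    commands = []
    for _ in range(4):
        commands.append(build_command("move_forward", distance_m=side_m))
        commands.append(build_command("turn", degrees=90))
    return commands
```

Once the low-level request format has been taught, a higher-level task such as the square trajectory reduces to composing calls that the agent already knows how to make.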

We actively incentivize the agent to capture learned skills as functions or other reusable code, and it ends up doing so without us having to ask. In addition, the agent performs its actions through a Python terminal, so it immediately becomes aware of any malfunction or error in its code and can fix it on the fly and learn from its mistakes. Finally, thanks to its episodic memory, the agent can relate current situations to past ones, recalling errors made during its instruction or functions that it wrote long ago and can put to use now.
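The feedback loop with the Python terminal can be sketched roughly as follows. This is a simplified illustration, not the agent's real terminal stage: `TerminalSession` is a hypothetical class, and running code with `exec` like this offers no sandboxing.

```python
import contextlib
import io
import traceback


class TerminalSession:
    """A persistent Python namespace that reports output and errors as text."""

    def __init__(self) -> None:
        # Functions defined in one call stay available in later calls.
        self.namespace = {}

    def run(self, code: str) -> str:
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Hand the traceback back as text so the caller can read it and retry.
            buffer.write(traceback.format_exc())
        return buffer.getvalue()
```

A short usage example: the first call defines a reusable function (a "skill"), the second reuses it, and the third returns a traceback instead of crashing.

```python
term = TerminalSession()
term.run("def double(x):\n    return 2 * x")
print(term.run("print(double(21))"))   # prints 42
print(term.run("print(triple(1))"))    # prints a NameError traceback
```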

As the video shows, the terminal prototype has become one stage of the pipeline (the TerminalAction), which uses the Python terminal environment plus the memories retrieved by a previous stage to carry out the agent’s actions. On top of this, other stages add functionality such as terminal session persistence and the generation of new memories.

Take a look at more videos below. 



This footage demonstrates how we teach the agent to comb the area enclosed by four GPS points defined by the user. First, we let the agent memorize the four GPS points by manually sending the drone to each position and asking the agent to remember them under different names. Then, we explain to the agent what we expect it to do: thoroughly fly over that area by following zig-zagging corridors 10 meters wide. This is extremely useful for planning exhaustive search operations.

The video has been sped up 4x during the teaching and 16x during the combing phases.
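The zig-zag combing pattern can be sketched in a few lines of Python. This is only an illustration of the geometry, not the code the agent wrote in the video. It assumes the four GPS points have already been converted to local coordinates in meters and are given in order around the perimeter; `zigzag_path` and `lerp` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (east, north) in meters from a local origin


def lerp(p: Point, q: Point, t: float) -> Point:
    """Point at fraction t of the way from p to q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def zigzag_path(a: Point, b: Point, c: Point, d: Point,
                width_m: float = 10.0) -> List[Point]:
    """Waypoints sweeping the quadrilateral a-b-c-d in back-and-forth passes.

    Each pass runs from edge a-d to edge b-c along the center line of one
    corridor, and consecutive passes alternate direction.
    """
    span = max(math.dist(a, d), math.dist(b, c))
    passes = max(1, math.ceil(span / width_m))  # corridors at most width_m wide
    path: List[Point] = []
    for i in range(passes):
        t = (i + 0.5) / passes
        start, end = lerp(a, d, t), lerp(b, c, t)
        if i % 2 == 1:
            start, end = end, start  # fly odd passes in the opposite direction
        path.extend([start, end])
    return path
```

For a 100 m by 30 m rectangle this yields three passes, at roughly 5, 15, and 25 meters from the first edge, so every point of the area lies within 5 meters of a flight line.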


In this recording, we teach the Agent to fly the drone in a circle. Because the Agent is still unaware of some aspects of the drone API, we must be specific about certain things, like using the blocking commands instead of the asynchronous ones. The Agent flies the drone following the user specification: the circle has a 50-meter radius, and its center is the current drone position.

We have sped up the video 4x for visualization purposes.
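The geometry behind the circle task can be illustrated with a short Python sketch. It is not the agent's actual code: `circle_waypoints` is a hypothetical helper, and it uses a flat-Earth approximation that is reasonable for a radius of tens of meters away from the poles.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def circle_waypoints(lat: float, lon: float, radius_m: float = 50.0,
                     n: int = 36) -> List[Tuple[float, float]]:
    """(lat, lon) waypoints on a circle around the given center position."""
    points = []
    for k in range(n + 1):  # n + 1 points so that the last one closes the loop
        angle = 2 * math.pi * k / n
        north = radius_m * math.cos(angle)  # offset in meters
        east = radius_m * math.sin(angle)
        dlat = math.degrees(north / EARTH_RADIUS_M)
        dlon = math.degrees(east / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
        points.append((lat + dlat, lon + dlon))
    return points
```

Flying these waypoints one after another with blocking commands, so that each move finishes before the next one starts, approximates the circle as a 36-sided polygon.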




Here's a video showcasing speech-to-text as input and text-to-speech as output. The agent has already been given a list of functions that it can use to interact with the drone API.

This addition enhances the user interaction and adds a social dimension to the agent. It makes the interaction more engaging.

Note that this is still an early prototype, and many improvements are in the works.


Link to the YouTube playlist containing all videos.


Thank you for reading this blog!

 Best,

Marek Rosa
CEO, Creative Director, Founder at Keen Software House
CEO, CTO, Founder at GoodAI

 

For more news:
Space Engineers: www.SpaceEngineersGame.com
Keen Software House: www.keenswh.com
VRAGE Engine: www.keenswh.com/vrage/
GoodAI: www.GoodAI.com

 

Personal bio:

Marek Rosa is the founder and CEO of GoodAI, a general artificial intelligence R&D company, and Keen Software House, an independent game development studio, started in 2010, and best known for its best-seller Space Engineers (5 million copies sold). Space Engineers has the 4th largest Workshop on Steam with over 500K mods, ships, stations, worlds, and more!

Marek has been interested in game development and artificial intelligence since childhood. He started his career as a programmer and later transitioned to a leadership role. After the success of Keen Software House titles, Marek was able to fund GoodAI in 2014 with a $10 Million personal investment.

Both companies now have over 100 engineers, researchers, artists, and game developers.

Marek's primary focus includes Space Engineers, the VRAGE3 engine, the AI Game, and LLM agents that learn continually.

GoodAI's mission is to develop AGI - as fast as possible - to help humanity and understand the universe. One of the commercial stepping stones is the "AI game," which features LLM-driven NPCs grounded in the game world with developing personalities and long-term memory. GoodAI also works on autonomous agents that can self-improve and solve any task that a human can.


2 comments:

  1. There is an absolute misunderstanding here, and therefore a contradiction, in how GAI should be educated. It is complete nonsense for GAI to develop like a child, a newborn. That makes no sense to anyone, because it would have to learn the basics. If any company (aka firm) tries to start developing GAI this way, it will be doomed to failure.

  2. We are not saying it has to learn like a newborn. It doesn’t have to, because LLMs already know much more than any baby or even adult. We were demonstrating how an agent can learn new skills and knowledge on top of what is already in the LLM.
