In the video above, we demonstrate how one of our LLM Agents learns to use an API to control a quadcopter drone. In the initial stages, we provide the Agent with detailed, comprehensive instructions on how to send HTTP requests and which commands the API exposes. As the video progresses, the Agent quickly grasps these instructions and combines them with the knowledge it already has to perform advanced, intricate tasks, like flying the drone along a square trajectory. This showcases the Agent's resilience and adaptive learning capabilities: it recovers from errors and false assumptions.
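The actual drone API is not spelled out in the post, but the kind of instruction the Agent receives can be sketched as a thin HTTP wrapper. Everything below (the endpoint path, command names, and the `DroneClient` class) is a hypothetical illustration of the API's shape, not the real interface:

```python
import json

class DroneClient:
    """Hypothetical HTTP wrapper for a drone API (endpoint and command names are made up)."""

    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url

    def build_request(self, command, **params):
        """Build the URL and JSON body for one command, e.g. takeoff or move_to.
        A real client would POST this with an HTTP library."""
        url = f"{self.base_url}/api/{command}"
        body = json.dumps({"command": command, "params": params})
        return url, body

client = DroneClient()
url, body = client.build_request("move_to", x=10, y=0, z=-20, speed=5)
```

In the video, the instructions the Agent is given amount to a textual description of exactly this kind of mapping from commands to requests.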
This version of the continual-learning agent represents a significant advancement over our first prototype (an agent embodied in a Python terminal). The enhanced Agent has access to distinct forms of working memory and long-term memory, enabling it to effectively manage several types of memory inconsistencies, such as contradictions or outdated information, and to learn from user feedback and environmental cues. You can think of it as a cognitive architecture built on top of an LLM.
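As a rough illustration of the memory split described above, here is a minimal sketch in which a newer fact about the same subject supersedes an older, contradictory one. All class and field names here are our own shorthand, not GoodAI's actual implementation:

```python
import time

class AgentMemory:
    """Toy model: working memory holds scratch state for the current task;
    long-term memory stores timestamped facts, where a new fact about the
    same subject supersedes the old one (handling outdated information)."""

    def __init__(self):
        self.working = {}    # scratch state for the current chain of steps
        self.long_term = {}  # subject -> (timestamp, fact)

    def remember(self, subject, fact):
        self.long_term[subject] = (time.time(), fact)

    def recall(self, subject):
        entry = self.long_term.get(subject)
        return entry[1] if entry else None

mem = AgentMemory()
mem.remember("drone_api", "use async commands")
mem.remember("drone_api", "use blocking commands")  # user correction overrides
```

The real system is far richer (retrieval, summarization, feedback), but the last-write-wins rule captures the simplest way outdated information gets resolved.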
The agent's response is the result of a sequence of nested steps. This method augments the LLM's cognitive resources and attention span to extend beyond the limits of the LLM itself (LLMs are stateless, have a fixed-size context, don't pay sufficient attention to all instructions in the prompt, etc.). Notably, the process employs iterative prompting, using multiple prompts to accomplish tasks that the LLM can't perform in a single inference, such as retrieving and summarizing memories or maintaining the agent's state for future steps. Every step in this sequence receives input data, processes it to produce relevant output data, and shares this output across the entire chain. Consequently, formulating a response becomes a joint effort. Each step in the chain has the autonomy to determine the information it requires, what new state needs to be stored in working memory and long-term memory, whether it needs to consult the LLM, and what specific prompt should be fed to the LLM for each iteration.
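The chain of steps described above can be sketched as a pipeline over a shared context: each step reads what it needs, optionally issues its own prompt to the LLM, and writes its output for later steps. The step names and the stubbed `llm` function are illustrative assumptions, not the actual GoodAI architecture:

```python
def llm(prompt):
    """Stub standing in for an LLM call; each step may issue its own prompt."""
    return f"<response to: {prompt}>"

def retrieve_memories(ctx):
    # In the real system this would query long-term memory.
    ctx["memories"] = ["drone API uses HTTP", "blocking commands preferred"]

def summarize_memories(ctx):
    # One extra inference: summarization the LLM can't do in the final prompt.
    ctx["summary"] = llm("Summarize: " + "; ".join(ctx["memories"]))

def formulate_response(ctx):
    ctx["response"] = llm(f"Answer '{ctx['user_input']}' given {ctx['summary']}")

def run_chain(user_input, steps):
    ctx = {"user_input": user_input}  # shared across the entire chain
    for step in steps:                # each step reads and extends the context
        step(ctx)
    return ctx

ctx = run_chain("fly a square", [retrieve_memories, summarize_memories, formulate_response])
```

The point of the design is that no single prompt has to carry everything: state lives in the context (and in memory) between inferences, working around the LLM's statelessness and fixed context size.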
This footage demonstrates how we teach the agent to comb an area bounded by four user-defined GPS points. First, we let the agent memorize the four GPS points by manually sending the drone to each position and asking the agent to remember them under different names. Then, we explain what we expect it to do: thoroughly fly over that area by following zig-zagging corridors 10 meters wide. This is extremely useful for planning exhaustive search operations.
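The combing pattern described above amounts to a lawnmower path. A minimal sketch, assuming the four GPS points have already been projected into a local metric frame and form an axis-aligned rectangle (real GPS coordinates would need a map projection, and an arbitrary quadrilateral would need clipping):

```python
def comb_waypoints(xmin, ymin, xmax, ymax, corridor_width=10.0):
    """Generate zig-zag waypoints covering [xmin, xmax] x [ymin, ymax]
    with parallel corridors spaced corridor_width meters apart."""
    waypoints = []
    x, going_up = xmin, True
    while x <= xmax + 1e-9:
        if going_up:
            waypoints += [(x, ymin), (x, ymax)]  # fly one corridor up
        else:
            waypoints += [(x, ymax), (x, ymin)]  # fly the next corridor down
        going_up = not going_up
        x += corridor_width
    return waypoints

path = comb_waypoints(0, 0, 30, 20)  # a 30 m x 20 m area, 10 m corridors
```

Flying the waypoints in order traces the zig-zag; the 10 m corridor width guarantees the whole area passes within 5 m of the drone's track.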
The video has been sped up 4x during the teaching and 16x during the combing phases.
In this recording, we teach the Agent to fly the drone in a circle. Because the Agent is still unaware of some aspects of the drone API, we must be specific about certain things, such as using the blocking commands instead of the asynchronous ones. The Agent then flies the drone according to the user's specification: a circle with a 50-meter radius centered on the drone's current position.
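Since point-to-point moves are what such APIs typically offer (hence the blocking commands), the circle is naturally approximated by a polygon of waypoints. A sketch under that assumption, with a hypothetical `fly_to_blocking` callable standing in for the real API:

```python
import math

def circle_waypoints(cx, cy, radius=50.0, segments=36):
    """Approximate a circle of the given radius around (cx, cy)
    with evenly spaced waypoints."""
    return [(cx + radius * math.cos(2 * math.pi * k / segments),
             cy + radius * math.sin(2 * math.pi * k / segments))
            for k in range(segments)]

def fly_circle(fly_to_blocking, cx, cy):
    # Blocking calls guarantee each leg completes before the next starts;
    # with async commands the waypoints would be issued before being reached.
    for x, y in circle_waypoints(cx, cy):
        fly_to_blocking(x, y)

visited = []
fly_circle(lambda x, y: visited.append((x, y)), 0.0, 0.0)
```

More segments give a smoother circle at the cost of more API calls; 36 waypoints means one every 10 degrees of arc.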
We have sped up the video 4x for visualization purposes.
Here's a video showcasing speech-to-text as input and text-to-speech as output. The agent has already been given a list of functions that it can use to interact with the drone API.
This addition enhances the user interaction and adds a social dimension to the agent. It makes the interaction more engaging.
Note this is still an early prototype, and many improvements are in the works.
Link to the YouTube playlist containing all videos.
Thank you for reading this blog!
CEO, Creative Director, Founder at Keen Software House
CEO, CTO, Founder at GoodAI
Marek Rosa is the founder and CEO of GoodAI, a general artificial intelligence R&D company, and Keen Software House, an independent game development studio founded in 2010 and best known for its best-seller Space Engineers (5 million copies sold). Space Engineers has the 4th largest Workshop on Steam with over 500K mods, ships, stations, worlds, and more!
Marek has been interested in game development and artificial intelligence since childhood. He started his career as a programmer and later transitioned to a leadership role. After the success of Keen Software House titles, Marek was able to fund GoodAI in 2014 with a $10 million personal investment.
Both companies now have over 100 engineers, researchers, artists, and game developers.
Marek's primary focus includes Space Engineers, the VRAGE3 engine, the AI Game, and LLM agents that learn continually.
GoodAI's mission is to develop AGI - as fast as possible - to help humanity and understand the universe. One of the commercial stepping stones is the "AI game," which features LLM-driven NPCs grounded in the game world with developing personalities and long-term memory. GoodAI also works on autonomous agents that can self-improve and solve any task that a human can.