Friday, March 22, 2024

LTM Benchmark: Improvements and new reports

At GoodAI, we are committed to developing agents that are capable of continual and lifelong learning. As part of these efforts, we previously open-sourced the GoodAI LTM Benchmark, a suite of tests aimed at evaluating the Long-Term Memory (LTM) abilities of any conversational agent. In this benchmark, all tasks take place within a single, very long conversation between the agent and our virtual tester. The benchmark interleaves information and probing questions from different tasks, taking special care to weave them together into a natural conversation.
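The benchmark's real task format is not described in this post, so the following is only a minimal sketch of the interleaving idea under our own assumptions. `Message`, `interleave` and `memory_gaps` are hypothetical names. Tasks are merged round-robin, which is a simplification: the real benchmark weaves messages into a natural conversation much more carefully. The "memory gap" counts how many messages separate a task's first message from its probing question, as a rough proxy for how far back the agent has to remember.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message:
    task: str                  # hypothetical task identifier
    text: str
    is_question: bool = False  # True for the probing question that ends a task


def interleave(tasks: List[List[Message]]) -> List[Message]:
    """Merge task scripts into one conversation, round-robin.

    Each task keeps its own message order, so a probing question
    always comes after the information it asks about.
    """
    conversation: List[Message] = []
    cursors = [0] * len(tasks)
    while any(pos < len(task) for pos, task in zip(cursors, tasks)):
        for i, task in enumerate(tasks):
            if cursors[i] < len(task):
                conversation.append(task[cursors[i]])
                cursors[i] += 1
    return conversation


def memory_gaps(conversation: List[Message]) -> Dict[str, int]:
    """Messages between each task's first message and its probing question."""
    first_seen: Dict[str, int] = {}
    gaps: Dict[str, int] = {}
    for idx, msg in enumerate(conversation):
        first_seen.setdefault(msg.task, idx)
        if msg.is_question:
            gaps[msg.task] = idx - first_seen[msg.task] - 1
    return gaps


tasks = [
    [Message("names", "My friend's name is Anna."),
     Message("names", "My neighbor is called John."),
     Message("names", "What are my friend's and my neighbor's names?", True)],
    [Message("colors", "My favorite color is green."),
     Message("colors", "What is my favorite color?", True)],
    [Message("shopping", "Add milk to my shopping list."),
     Message("shopping", "Also add eggs."),
     Message("shopping", "What is on my shopping list?", True)],
]

sequential = [msg for task in tasks for msg in task]
print(memory_gaps(sequential))         # {'names': 1, 'colors': 0, 'shopping': 1}
print(memory_gaps(interleave(tasks)))  # {'colors': 2, 'names': 5, 'shopping': 4}
```

When the scripts are simply concatenated, almost nothing separates what the agent is told from what it is later asked. Testing memory that way would require filler. Interleaving stretches the gaps to 5, 2 and 4 messages using only messages that belong to other tasks.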


As a direct consequence of our research on agents with LTM, the GoodAI LTM Benchmark is constantly evolving. To us it is an invaluable tool for evaluating our agents and validating our hypotheses. It also helps us characterize how different agents fail, which gives us concrete goals to aim for. In the GoodAI LTM team, we regard the GoodAI LTM Benchmark as a moving goal post, and by introducing new tasks and features we keep pushing that goal post further away, because what is a goal post worth if it is easy to reach?


New features

With every new feature, we try to make the GoodAI LTM Benchmark not only more challenging but also more realistic. The thing about benchmarking LTM is that your tests need to be long, very long. So you either introduce a ton of dummy interactions for the sole purpose of filling up the conversation, accepting that all those tokens are wasted resources, or you interleave the tasks and weave them into a seamless, natural conversation (as we do). We always do our best to minimize the number of wasted tokens while keeping the conversation natural and making sure that the agent can follow along.

For more details, continue to the GoodAI blog post.


Thank you for reading this blog!

 

Best,
Marek Rosa
CEO, Creative Director, Founder at Keen Software House
CEO, CTO, Founder at GoodAI

 

For more news:
GoodAI Discord: https://discord.gg/Pfzs7WWJwf
Space Engineers: www.SpaceEngineersGame.com
Keen Software House: www.keenswh.com
VRAGE Engine: www.keenswh.com/vrage/
GoodAI: www.GoodAI.com
Personal Blog: blog.marekrosa.org

 

Personal bio:

Marek Rosa is the founder and CEO of GoodAI, a general artificial intelligence R&D company, and Keen Software House, an independent game development studio, started in 2010, and best known for its best-seller Space Engineers (over 5 million copies sold). Space Engineers has the 4th largest Workshop on Steam with over 500K mods, ships, stations, worlds, and more!

Marek has been interested in game development and artificial intelligence since childhood. He started his career as a programmer and later transitioned to a leadership role. After the success of Keen Software House titles, Marek was able to fund GoodAI in 2014 with a $10 million personal investment.

Both companies now have over 100 engineers, researchers, artists, and game developers.

Marek's primary focus includes Space Engineers, the VRAGE3 engine, the AI People game, long-term memory systems (LTM), an LLM-powered personal assistant with LTM named Charlie Mnemonic, and the Groundstation.

GoodAI's mission is to develop AGI, as fast as possible, to help humanity and understand the universe. One of the commercial stepping stones is the "AI People" game, which features LLM-driven AI NPCs. These NPCs are grounded in the game world, interacting dynamically with the game environment and with other NPCs, and they possess long-term memory and developing personalities. GoodAI also works on autonomous agents that can self-improve and solve any task that a human can.

Sunday, March 10, 2024

Solo creators enhanced by a legion of AI agents

Within the next five years, every individual will have the ability to employ AI agents from the cloud. These agents will effectively serve as our AI employees and assistants, aiding in tasks where we might typically enlist the services of another person or company.

The need to hire human workers may become obsolete. AI agents will be more cost-effective, loyal, and easier to communicate with, eliminating the challenges of human factors such as ego, motivation, and salary negotiations.

This transformation signifies that anyone with a business idea, hobby project, vision, or passion can set their plans in motion without recruiting teams and navigating the intricate maze of leadership challenges.

Imagine wanting to design a game without any expertise in programming or art. You would simply engage an AI agent and start a conversation: describe your desired game, and the agent would pose follow-up questions to flesh out unspecified aspects. It would then develop the game, handling everything from writing code to creating art. If the workload proved substantial, the agent could temporarily commission additional AI agents and divide the tasks efficiently among them. Coordination and communication would be seamless, and the final product would be presented to you within seconds or minutes.

Contrast this with today's lengthy process, where games take years to develop due to intricate discussions, hiring challenges, communication breakdowns, and varying human performance. With AI, feedback will be incorporated rapidly, and adjustments made in moments rather than weeks or months. You could play the game, provide your feedback, and the agent would quickly apply modifications, allowing you to try it again almost instantaneously.

An AI agent's primary objective is to resonate with your intent and align with your will. While it might suggest alternative perspectives or solutions based on data, it operates without emotional biases or conflicts. Think of it like the relationship between a painter and a brush. The painter has the vision, and the brush helps bring that vision to life. Similarly, while the AI can offer tools and options, the user drives the ultimate direction and decisions.

In personal spheres, too, AI agents will prove invaluable. Picture having a personal Alfred, much like Batman's trusted aide. These agents can source information, offer advice, interact on your behalf, and consistently prioritize your wellbeing, freedom, and privacy. 

Crucially, any data they gather will be end-to-end encrypted, ensuring confidentiality, and the agent will maximize your freedom, because it has no other master.

While traditional tools require users to adapt to them, AI agents stand apart. They proactively conform to the user, ensuring transparency and clarity in their actions and communicating in easily digestible terms.

Currently, the closest we have to these AI agents are the LLM agents (which I've covered in a prior piece for LEVEL). Their limitations include a lack of continual learning and long-term planning, frequent errors, a text-only interface, and relatively slow processing speeds. Today, LLM agents are not equipped to replace human labor.

However, in the near future, these limitations will be solved, and AI agents will replace most or all human workers.

Regarding cost, AI agents will be substantially more affordable than human employees. Consider a rough comparison: the average hourly wage in the Czech Republic is 350 CZK, while an LLM agent thinking at a speed of 1 million text tokens per hour would cost a mere 44 CZK (or $2).
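The comparison above is plain arithmetic. The sketch below reproduces it using only the figures quoted in the text: a 350 CZK hourly wage, and 1 million tokens per hour costing 44 CZK. The constant and function names are illustrative. Like the text, the sketch assumes that an hour of agent tokens can be set against an hour of human work.

```python
# Figures quoted in the text above.
WAGE_CZK_PER_HOUR = 350            # average hourly wage in the Czech Republic
AGENT_TOKENS_PER_HOUR = 1_000_000  # agent "thinking" speed
AGENT_CZK_PER_HOUR = 44            # cost of those tokens (about $2)


def czk_per_thousand_tokens(czk_per_hour: float, tokens_per_hour: int) -> float:
    """Price of 1,000 tokens implied by an hourly cost and throughput."""
    return czk_per_hour / tokens_per_hour * 1_000


def cost_ratio(human_czk_per_hour: float, agent_czk_per_hour: float) -> float:
    """How many agent-hours cost the same as one human working hour."""
    return human_czk_per_hour / agent_czk_per_hour


ratio = cost_ratio(WAGE_CZK_PER_HOUR, AGENT_CZK_PER_HOUR)
savings = WAGE_CZK_PER_HOUR - AGENT_CZK_PER_HOUR
per_1k = czk_per_thousand_tokens(AGENT_CZK_PER_HOUR, AGENT_TOKENS_PER_HOUR)

print(f"One human hour buys {ratio:.1f} agent-hours")       # 8.0
print(f"Savings per hour: {savings} CZK")                    # 306
print(f"Implied price: {per_1k:.3f} CZK per 1,000 tokens")  # 0.044
```

The exact ratio is about 7.95. At the quoted prices, an hour of agent work therefore costs roughly one eighth of an average Czech working hour.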

Engaging directly with your AI agent eliminates the need for a middleman, streamlining your processes. Just as we currently opt to handle tasks ourselves when they're straightforward and time-efficient, why introduce another party? If you can directly query ChatGPT and get precise answers, why ask someone else to act as the intermediary?

What does this trend herald?

The need to hire human workers might vanish.

People will not have to work for other people anymore.

You will have an army of AI agents working for you, writing code, creating art, designing, testing, researching, doing marketing, etc.

AI agents will prioritize and augment your freedom, privacy, skills, and creativity.

These agents will act as extensions of ourselves, removing any perceived need to delegate tasks to fellow humans.

It will lead to an abundance of produced work, no longer limited by the productivity of the human population.

Humans will engage in interactions purely out of desire and joy, rather than out of job-related obligations, valuing genuine connections over forced professional exchanges.

The deployment of high-powered AI agents on a continuous basis will likely drive a significant surge in energy consumption and necessitate advancements in hardware. How can we address these environmental and infrastructural challenges?

We may be on the cusp of a new age dominated by massively productive solo creators who can truly embrace their creative freedom without being limited by other people.

An underlying question remains as we usher in this new era of AI-driven autonomy and one-person corporations. If traditional employment diminishes in favor of AI agents, from where will individuals derive their livelihoods? How will society adapt to ensure its members' well-being and financial stability? While AI's vast possibilities and benefits are undeniable, the broader socioeconomic ramifications warrant deep reflection and discussion. This, however, is a discourse for another day.


Thank you for reading this!

In one of my upcoming blog posts, I plan to analyze the “Future of delegation”: specifically, how delegation will change once we have AI agents that are maximally aligned with our intent.


Friday, March 1, 2024

Introducing Charlie Mnemonic: The First Personal Assistant with Long-Term Memory

As part of our research efforts in continual learning, we are open-sourcing Charlie Mnemonic, the first personal assistant (LLM agent) equipped with Long-Term Memory (LTM).


At first glance, Charlie might resemble existing LLM agents like ChatGPT, Claude, and Gemini. However, its distinctive feature is the implementation of LTM, enabling it to learn from every interaction. This includes storing and integrating user messages, assistant responses, and environmental feedback into LTM for future retrieval when relevant to the task at hand.

Charlie Mnemonic employs a combination of Long-Term Memory (LTM), Short-Term Memory (STM), and episodic memory to deliver context-aware responses. This ability to remember interactions over time significantly improves the coherence and personalization of conversations.

Moreover, Charlie doesn't just memorize facts such as names, birthdays, or workplaces; it also learns instructions and skills. This means it can understand nuanced requests like writing emails differently to Anna than to John, fetching specific types of information, or managing smart home devices based on your preferences.
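Charlie's actual LTM system is developed by GoodAI and is not detailed in this post. What follows is therefore only a toy sketch, built on our own assumptions, of how the three memory types named above can work together:

- a bounded short-term buffer that holds recent messages;
- a long-term store that keeps every user message, assistant response and environment feedback;
- episode tags that allow a whole past session to be recalled.

Simple word overlap stands in for the similarity search a real LTM system would more likely perform, for example with embeddings. All class and method names are hypothetical.

```python
import re
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Set

# Tiny stopword list so retrieval matches on content words only.
STOPWORDS = {"a", "an", "the", "to", "in", "at", "for", "of", "is", "he", "my"}


def content_words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


@dataclass
class MemoryEntry:
    episode: int  # session the entry came from (episodic tag)
    role: str     # "user", "assistant" or "environment"
    text: str


class ToyMemory:
    def __init__(self, stm_size: int = 4) -> None:
        self.stm: Deque[MemoryEntry] = deque(maxlen=stm_size)  # recent turns only
        self.ltm: List[MemoryEntry] = []                       # everything ever stored

    def store(self, episode: int, role: str, text: str) -> None:
        entry = MemoryEntry(episode, role, text)
        self.stm.append(entry)  # oldest entry drops out once the buffer is full
        self.ltm.append(entry)

    def retrieve(self, query: str, k: int = 2) -> List[MemoryEntry]:
        """Up to k LTM entries sharing the most content words with the query."""
        q = content_words(query)
        scored = [(len(q & content_words(e.text)), i, e) for i, e in enumerate(self.ltm)]
        relevant = [s for s in scored if s[0] > 0]
        relevant.sort(key=lambda s: (-s[0], -s[1]))  # best overlap, then most recent
        return [e for _, _, e in relevant[:k]]

    def episode(self, n: int) -> List[MemoryEntry]:
        """Episodic recall: every entry from one past session, in order."""
        return [e for e in self.ltm if e.episode == n]

    def build_context(self, query: str) -> List[MemoryEntry]:
        """Recalled long-term memories first, then the recent short-term buffer."""
        recalled = [e for e in self.retrieve(query) if e not in self.stm]
        return recalled + list(self.stm)


mem = ToyMemory(stm_size=2)
mem.store(1, "user", "Write emails to Anna in a formal tone.")
mem.store(1, "user", "Write emails to John casually, he is my brother.")
mem.store(2, "user", "Turn off the living room lights at 11 pm.")
mem.store(2, "environment", "Living room lights scheduled for 23:00.")

for e in mem.build_context("Draft an email to Anna about the meeting."):
    print(e.role, "|", e.text)
# user | Write emails to Anna in a formal tone.
# user | Turn off the living room lights at 11 pm.
# environment | Living room lights scheduled for 23:00.
```

The instruction about Anna has already dropped out of the two-message short-term buffer, yet long-term retrieval brings it back into the context for the new request.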

Envision LTM as an expandable, dynamic memory that captures and retains every detail, constantly enhancing its understanding and functionality.

What is inside:

  • The LLM powering Charlie is the OpenAI GPT-4 model, with the flexibility to switch to other LLMs in the future, including local models.
  • The LTM system, developed by GoodAI, stands at the core of Charlie's advanced capabilities.

For more details, continue to the GoodAI blog post.

Github: https://github.com/GoodAI/charlie-mnemonic

Discord: https://discord.gg/Pfzs7WWJwf

Authors: Antony Alloin, Karel Hovorka, Ondrej Nahalka, Vojtech Neoral, and Marek Rosa 

Thank you for reading this blog!

 

Best,
Marek Rosa
CEO, Creative Director, Founder at Keen Software House
CEO, CTO, Founder at GoodAI

 
