Monday, February 12, 2024

Introducing GoodAI LTM Benchmark

As part of our research efforts in the area of continual learning, we are open-sourcing a benchmark for testing agents’ ability to perform tasks involving the advanced use of memory over very long conversations. Among other things, we evaluate the agent’s performance on tasks that require dynamic upkeep of memories or integration of information over long periods of time.

We are open-sourcing:

We show that the availability of information is a necessary, but not sufficient condition for solving these tasks. In our initial benchmark, our conversational LTM agents with 8k context are comparable to long context GPT-4-1106 with 128k tokens. In a larger benchmark with 10 times higher memory requirements, our conversational LTM agents with 8k context achieve performance which is 13% better than GPT-4-turbo with a context size of 128,000 tokens for less than 16% of the cost.
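To put the two figures from the larger benchmark side by side, here is a minimal back-of-the-envelope sketch. It assumes that "13% better" means a benchmark score of 1.13 times the baseline's and that "less than 16% of the cost" means a cost of at most 0.16 times the baseline's. The function name and the normalization to a baseline of 1.0 are illustrative choices, not part of the benchmark itself.

```python
# Hypothetical comparison using only the two figures quoted in the post.
# Assumption: scores and costs are expressed relative to the GPT-4-turbo
# (128k context) baseline, which is normalized to 1.0 for both.

def score_per_cost(relative_score: float, relative_cost: float) -> float:
    """Benchmark score obtained per unit of cost, relative to the baseline."""
    if relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return relative_score / relative_cost


baseline = score_per_cost(1.0, 1.0)      # long-context baseline
ltm_agent = score_per_cost(1.13, 0.16)   # 8k-context LTM agent, cost upper bound

# Because 0.16 is an upper bound on cost, this ratio is a lower bound.
advantage = ltm_agent / baseline
print(f"At least {advantage:.1f}x more benchmark score per unit of cost")
```

Under these assumptions the LTM agent delivers roughly seven times more benchmark score per unit of cost than the long-context baseline.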

We believe that our results help illustrate the usefulness of the LTM as a tool, which not only extends the context window of LLMs, but also makes it dynamic and helps the LLM reason about its past knowledge and therefore better integrate the information in its conversation history. We expect that LTM will ultimately allow agents to learn better and make them capable of life-long learning.

Motivation

At GoodAI, we are developing LLM agents that can learn continually from the interactions with the user and the environment. Our goal is to create agents that are capable of life-long learning, which means that they are constantly gathering knowledge from every new experience and leveraging all past knowledge to act and learn better in the future. In the past we have organized the GoodAI Challenge, specifically the Gradual Learning round in 2017, to stimulate ideas on continual learning. 

While pursuing this goal, we quickly realized that we needed a way to objectively measure our progress on LLM agents’ ability to learn continually. Very often we found ourselves trying different solutions to the same problem and not knowing which one to choose. The methods were usually different, but the results felt equivalent or not significantly different. In addition, most existing benchmarks fell short for our purposes, either because of a strong focus on testing LLM-specific capabilities, such as mathematical reasoning or instruction-following abilities, or because they were centered on testing specific methods or tools, such as vector databases, prompting, information placement within the context, or performance in question-answering tasks based on static memories or factual knowledge.

In short, most benchmarks focused on aspects that were LLM-, method-, or implementation-specific, and we wanted to have something that we wouldn’t need to throw away and rewrite from scratch in the future. On the contrary, we needed a frame of reference that was capable of standing the test of time and that would evolve as we discovered new caveats in our own agents and translated them into new goals to achieve. A stable benchmark for a constantly-changing agent: an incremental, continual, and conversational benchmark.

For these reasons, we developed the GoodAI LTM Benchmark, a framework that can test conversational agents’ abilities to learn and adapt in realistic scenarios and over long periods of time.

For more details, continue to GoodAI Blog Post

GitHub: https://github.com/GoodAI/goodai-ltm-benchmark

Discord: https://discord.gg/Pfzs7WWJwf

Authors: David Castillo, Joseph Davidson, Finlay Gray, José Solorzano, and Marek Rosa  

Thank you for reading this blog!

 

Best,
Marek Rosa
CEO, Creative Director, Founder at Keen Software House
CEO, CTO, Founder at GoodAI

 

For more news:
Space Engineers: www.SpaceEngineersGame.com
Keen Software House: www.keenswh.com
VRAGE Engine: www.keenswh.com/vrage/
GoodAI: www.GoodAI.com
Personal Blog: blog.marekrosa.org

 

Personal bio:

Marek Rosa is the founder and CEO of GoodAI, a general artificial intelligence R&D company, and Keen Software House, an independent game development studio, started in 2010, and best known for its best-seller Space Engineers (over 5 million copies sold). Space Engineers has the 4th largest Workshop on Steam with over 500K mods, ships, stations, worlds, and more!

Marek has been interested in game development and artificial intelligence since childhood. He started his career as a programmer and later transitioned to a leadership role. After the success of Keen Software House titles, Marek was able to fund GoodAI in 2014 with a $10 Million personal investment.

Both companies now have over 100 engineers, researchers, artists, and game developers.

Marek's primary focus includes Space Engineers, the VRAGE3 engine, the AI People game, long-term memory systems (LTM), an LLM-powered personal assistant with LTM named Charlie Mnemonic, and the Groundstation.

GoodAI's mission is to develop AGI - as fast as possible - to help humanity and understand the universe. One of the commercial stepping stones is the "AI People" game, which features LLM-driven AI NPCs. These NPCs are grounded in the game world, interacting dynamically with the game environment and with other NPCs, and they possess long-term memory and developing personalities. GoodAI also works on autonomous agents that can self-improve and solve any task that a human can.

Sunday, January 28, 2024

My review of 2023 & Plans and predictions for 2024

   SUMMARY:

  • 10-year anniversary of Space Engineers!
  • Space Engineers on PlayStation
  • Space Engineers - four major updates released
  • VRAGE3 development
  • LTM Benchmark
  • Charlie Mnemonic - personal assistant with long-term memory
  • Drone Groundstation
  • AI People game - AI NPCs with long-term memory
  • About our AGI development
  • Plans for 2024
  • My global predictions for 2024


As each new year dawns, I take time to reflect on the accomplishments of the past year and set my sights on goals for the upcoming one. This year, for the first time, I'm excited to include my global predictions for 2024. For those interested in revisiting last year's review and plans (2022/23), you can find them here.

Now, let's dive into the review for this year. I hope you enjoy reading it!


Review of 2023

Major achievements from last year:

Keen Software House

We continued to prioritize player choice by releasing Space Engineers updates and new content simultaneously across all platforms (PC, Xbox, and now also PlayStation). This allowed our community to fully enjoy the game on the platform of their preference while also fostering a unified and engaged player base.

We continued developing VRAGE3 - the next generation of our in-house game engine.

For this reason, we grew our team significantly, and we are still hiring worldwide (programmers, artists, and other roles).

We released four major updates in 2023:

Space Engineers on PlayStation


We have released Space Engineers on PlayStation 4 and 5, so more players can unleash their creativity and satisfy their need to create.

We are extremely excited about the response so far as PlayStation players got a chance to become Space Engineers. With full crossplay, Xbox, PC, and now PlayStation players can team up and explore Space Engineers on their platform of choice!

Automatons Update


This update not only reshaped automation, but served as a basis for continued exploration of the “NPC” concept.

Automatons DLC is our highest-selling DLC.

Warfare Evolution & Decorative Pack #3


This was our first update released simultaneously on three different platforms. Among the highlights of this free update was the experimental PvP Scenario: Space Standoff, setting the stage for epic confrontations that will test your strategic prowess and combat skills.

10 Years of Space Engineers


Ten years ago, we began this incredible journey through the stars, laying the foundation for everything that has come to pass. 10 years, 5 million copies, hundreds of updates, and a half million workshop mods and creations later, Space Engineers is still thriving! Some people consider it to be a “forever game” and I agree. 

To every Space Engineer out in the vastness of space, from those of you who have been with us from the beginning to those who became Space Engineers only now, thank you for being a part of this journey. Your creativity and passion for Space Engineers inspire us to do our very best.

In celebration of this incredible moment, we released a surprise update to commemorate 10 years of engineering, planning, grinding, welding, battles, and exploration, and we held a Space Engineers 10-year anniversary party at our HQ, Oranzerie.

Space Engineers + Hardspace: Shipbreaker Bundle


We had an amazing Hardspace: Shipbreaker Shipbuilding Competition in 2023. You can still purchase the Space Engineers + Hardspace: Shipbreaker Bundle and get two games that make the space worker fantasy a fascinating reality to play!


Microsoft Store


In early January 2024, we fully introduced Space Engineers to the Microsoft Store. If you already own Space Engineers on Xbox, you received the Microsoft Store version automatically, at no extra cost. It’s our way of saying thank you for being part of the Space Engineers journey.


We are dedicated to the continued development of Space Engineers on PC, Xbox, and PlayStation.

VRAGE3


I believe in Space Engineers and I want it to be here in the decades to come; and for this, we need a strong basis, a game engine that is up to today’s standards, unique and revolutionary, and yet expandable in the future. 

For these reasons, two years ago, we started the development of VRAGE3. The engine team currently has more than 30 programmers in the render, engine, and tools departments.

You can read more in Jan Hlousek’s (Keen’s Tech Director) blog post and follow him for more VRAGE3 updates on his account.

Watch the following presentations from SE 10-year anniversary party:


GoodAI

In 2023, the primary shift in our approach involved pivoting towards practical and tangible applications of AI, LLM-powered agents, and moving slightly away from a core focus on scientific research and other AGI architectures (Memetic Badger). 

More details below.

AI People Game


It’s a game with AI-powered NPCs that can learn and interact with the environment, other NPCs, and the player.

We have been working on it since 2019. We made significant progress that we plan to share with you at the beginning of 2024.


Charlie Mnemonic - personal assistant with long-term memory 


Our objective is to evolve beyond assistants like ChatGPT by incorporating a few crucial differences. The most significant of these is the integration of long-term memory, enabling the assistant to continuously learn from you, adapt to your preferences, and align with your needs.

Ultimately, we envision this assistant becoming your primary gateway to the world, offering assistance and empowerment in all your endeavors. Central to this vision is the unwavering commitment to your privacy and the maximization of your freedom.

We introduced HALLM, an agent that acts in a Python terminal.


Drone Groundstation 


Groundstation is our AI platform for controlling a fleet of drones that can search for people, follow them, and help with other safety tasks. It is currently in beta testing.

We connected Groundstation to our personal assistant, Charlie, and taught the assistant to control the drones. We believe that the future of software UX is natural language assistants, as two-way interfaces. See more here.

We tested Groundstation and Personal Assistant integration in a series of demos in South Africa in 2023.


LTM Benchmark 


I've always believed that continual learning is a key characteristic of intelligent agents. However, LLMs inherently lack the capability for ongoing learning, as their learning is confined to the limited scope of their context or prompt. To address this, we've turned our focus to developing Long-Term Memory (LTM) systems that enhance LLMs.

But, before embarking on the development of an LTM system, it's crucial to understand the specific tasks it should address, along with methods for testing and benchmarking it. This ensures measurable progress.

To this end, we've created our own set of testing tasks for LTM systems, aiming for more than just retrieval-augmented generation. Our LTM system is designed to learn incrementally and continually, so our tests are tailored to reflect these capabilities.

We're excited to announce that we will soon be open-sourcing the GoodAI LTM Benchmark. This will allow everyone to test and compare their long-term memory solutions, fostering broader collaboration and advancement in this field.

LTM systems


Over the past year, we have dedicated ourselves to developing systems capable of augmenting LLMs by effectively expanding their context to an almost infinite scope.

One of these systems has been seamlessly integrated with our Charlie Mnemonic, enhancing its learning capabilities. Similarly, another system has been incorporated into our AI People game. This integration allows NPCs within the game to remember, learn from, and adapt to both the actions of the player and changes in the environment.


Oranžerie

We have completed the new garden redesign and hosted a 10-year anniversary party for Space Engineers and our community.


Plans for 2024

This year, my primary goals are the following:

Keen Software House

We will continue developing Space Engineers and VRAGE3.

One of the updates will focus on PVE (encounters and exploration), but also on first-player experience.

We will keep growing our team; there are still some open positions.

If you are interested in news about our company, our games, our plans for the future and what comes after VRAGE3, please join our newsletter and get exclusive news and updates you can’t find anywhere else!

GoodAI

We plan to announce our AI People game to the public, and soon go into public alpha. It’s a very experimental concept with many challenges, but we stay strong in our determination. Wish us luck!

We will be releasing our LTM Benchmark as open source.

We will also be releasing Charlie Mnemonic, our personal assistant with long-term memory. The details of this release will be announced later.

We will continue developing our drone Groundstation and delivering it to customers, in order to prevent and fight crime. 

About our AGI development

Someone asked me if we have abandoned our AGI development.

The answer is a resounding no. I started GoodAI in 2015 and invested over $12M because I believed that automating intelligence (making it cheap and powerful) would open doors to a much better and more interesting world. I still believe in this dream. There’s nothing more important than this.

For many years, I didn’t have much belief in deep learning. I thought those models couldn’t generalize or learn continually, so they were not the path to AGI. Therefore, at GoodAI, we worked on our own AGI architectures.

However, in 2021/2022, while working on our AI People game (called AI Game back then), we realized that LLMs have excellent potential - if you use them as a reasoning engine inside a cognitive agent. Of course, it’s still not AGI, but you could start investigating what is missing and then designing around those limitations (e.g. adding a long-term memory learning system, improving the LLM planning and reasoning, adding more modalities, etc). 

The most important thing was this: LLMs were here, they worked, and a pretty strong ecosystem (academic and industrial) was forming around them, whereas our AGI architectures were in their infancy, basically at TRL 2-3. We had to make a critical decision: (1) continue on our original path and risk that it would take another 20 years while LLMs would flourish, or (2) adapt. We chose the second path because I always try to be pragmatic and flexible.

Today, our mission at GoodAI is this:
  • Develop agents using LLMs, enhancing them with capabilities for continual learning, planning, reasoning, and tool utilization, all while prioritizing user experience.
  • Create commercial applications, such as the AI People game, Charlie Mnemonic (our personal assistant featuring long-term memory), and drone Groundstation. It's important to note that GoodAI's primary objective has always been to develop practical products, not solely to pursue scientific inquiry.
  • Build AI tools that not only empower individuals but are also universally accessible, thereby democratizing and distributing AI technology.
  • Continue to refine and advance the fundamentals of AGI.
  • For more info continue to: https://www.goodai.com/blog/

Personal

I have recently resumed training in Brazilian Jiu-Jitsu (BJJ), and I'm finding immense enjoyment in every session. For me, combat sports are particularly appealing because they demand a blend of tactical planning and spatial awareness, alongside physical strength and endurance. Also, I like a good fight.


Looking ahead, my primary focus will be on bringing our existing projects to completion and delivering them to our customers and the public. 

My predictions for 2024

These are my speculative forecasts for the year ahead, with no guarantees of accuracy.

  • GPT-5 released - 10x improvement in reasoning capabilities and in the precision of following longer and more complicated instructions. This will finally get us closer to LLM agents that can create and execute long-term plans reliably.
  • Revolution in Continual Learning - LLMs with long-term memory and the ability to continually learn and adapt will start to get more attention. Think about it as ChatGPT but with an infinite conversation window. You can incrementally teach it new skills, and it learns about you. You don’t have to repeat information. You can teach it active skills (how to do things, what you like, how to answer your questions, etc).
  • AI NPCs in games - I am a bit biased here (hehe) - but I think our AI People game will lead the way. I am not talking about plugging ChatGPT and voice interface into your game. I am talking about AI NPCs that can interact with the world, learn continually, have personalities, and generate a narrative that is engaging to the player.
  • AI Simulated games - This will be a natural next step for our AI People game. Think about GPT not just emulating the behaviors of the agents but also the entire game world - with its logic and state updates. You would not have to program the game. The GPT would determine what happens next in your game.
  • AI-generated video - a significant advancement in AI-generated videos, especially in achieving greater consistency and coherence in longer formats, overcoming previous challenges of maintaining frame-to-frame continuity. We will get consistent AI-generated videos that are 10+ minutes long.
  • New Hardware - I expect commercially available chips focused on cheap and fast inference of transformer models. Think about it like an ASIC board with frozen model weights. Imagine it like GPT-4 running on your phone but generating 1 million tokens per second and not draining your battery.
  • AI Robotics - we will see a moment similar to the ChatGPT release in late 2022 - some major advancement in robotics (100x from today’s possibilities). Tesla Optimus humanoid robot is a good example; the trend of their advancements looks very good.
  • AI safety and alignment - We will see if AI models with more powerful capabilities lead to increased alignment and safety. My bet is that it will actually be easier to align more powerful models to human values than it is to align weaker models. I don’t expect that the Orthogonality thesis will hold.
  • Impact on the job market - the trend of AI replacing some workers will continue. On the other hand, AI-assisted tools will keep opening doors to new people. Most exciting of all, many more one-founder startups will emerge, because one person with a great idea will be able to execute it very effectively using the available AI tools. This will lead to an explosion of new products and services.
  • Politics - it seems the world has been getting crazier in the last few years, with more extreme opinions and new wars. But my hope is that nothing worse than what is already happening will happen in 2024 and in the years leading up to the technological singularity.

Hiring - Keen Software House

If you’re interested in working on awesome games like Space Engineers or our in-house engine VRAGE3, we’d love to hear from you! 

We are currently in the research and development phase, exploring and building new features of the VRAGE3 engine. Our goal is to support extremely large, fully dynamic, and destructible environments, including planets, large spaceships, and any other creations built by the players.

Check out the open positions at Keen Software House and don’t forget to send us your English CV/resume and cover letter.

Our team is global. Remote collaboration is possible!


Hiring - GoodAI

If you want to work on LLM-driven agents, long-term memory (LTM) systems, continual learning, AI NPCs, AI for drones, and think that AI should be available to everyone, consider joining our team.

Check out the open positions at GoodAI. Remote collaboration is possible!

We are especially looking for people with an interest in LLM agents and long-term memory (LTM). 


Remote Work

Our teams are global.

Finding the best candidates to join Keen Software House and GoodAI means exploring every possible solution, including remote work. 

While we strive to provide team members with the best possible work-life balance here in Prague at our incredible Oranzerie offices, we understand that it is not always possible to relocate, so we are very remote-friendly.

Here’s a map of where our teammates live.


New Merchandise

We released new SE 10-year anniversary merchandise to commemorate this special occasion. 

You can check it out here.


For new Displate metal posters, continue here.


Follow our social media to get the latest news!

If you want to let me know your feedback, please get in touch via my personal email address marek.rosa@keenswh.com, or use our Keen Software House support site. I welcome all of the feedback we receive and we will use it to learn and provide better services to our players.


Thank you for reading this blog!

 

Best,
Marek Rosa
CEO, Creative Director, Founder at Keen Software House
CEO, CTO, Founder at GoodAI
