SUMMARY:
- Question: Why are we studying social learning in multi-agent systems?
- Answer: Multi-agent systems are made up of agents, each with its own objective. We believe this leads to learning dynamics that are impossible in centralized systems.
This blog post is one of my personal takeaways from the Badger Seminar 2021.
Definitions:
- Monolithic system (centralized): has one objective shared by all parts of the system. Let’s assume a homogeneous network learning via a backpropagation algorithm. This is what most deep learning is about.
- Multi-agent system (decentralized): each part of the system has its own objective, so its learning resembles social learning in an evolving population of agents. We are not assuming backprop learning.
Example of collective learning in a multi-agent system
An ant colony simulation where ants are stateless policies: they move randomly until they discover food, then start laying down pheromones (memories). When other ants discover a pheromone trail, they start following it and add more pheromones.
Summary: individual ants don’t know where the food is, but the collective of ants knows where the food is and how to gather it.
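The ant example above can be sketched in a few lines of Python. This is a toy 1-D model I made up for illustration (not the simulation from the seminar): the nest sits at position 0, the food at the far end, and returning ants deposit pheromone that grows with their distance from the nest, so the trail forms a gradient that searching ants can climb. Real colonies rely on evaporation and 2-D gradients instead.

```python
import random

def run_colony(steps=3000, ants=20, length=20, evaporation=0.02, seed=0):
    """Toy 1-D stigmergy model: stateless ants on a line, nest at cell 0,
    food at the last cell. Searching ants walk randomly but prefer the
    neighbor cell with more pheromone; an ant that finds food walks
    straight home, laying pheromone as it goes."""
    rng = random.Random(seed)
    food = length - 1
    pheromone = [0.0] * length      # the colony's external, shared memory
    positions = [0] * ants
    carrying = [False] * ants
    delivered = 0

    for _ in range(steps):
        for i in range(ants):
            p = positions[i]
            if carrying[i]:
                # deposit grows with distance from the nest, so the trail
                # forms a gradient that points toward the food
                pheromone[p] += p
                positions[i] = p - 1
                if positions[i] == 0:
                    carrying[i] = False
                    delivered += 1
            else:
                left, right = max(p - 1, 0), min(p + 1, length - 1)
                if pheromone[right] > pheromone[left]:
                    positions[i] = right            # follow the stronger trail
                elif pheromone[left] > pheromone[right]:
                    positions[i] = left
                else:
                    positions[i] = rng.choice([left, right])  # explore
                if positions[i] == food:
                    carrying[i] = True
        # evaporation: old memories fade
        pheromone = [x * (1.0 - evaporation) for x in pheromone]

    return delivered, pheromone
```

No individual ant stores the food location; the route lives entirely in the shared `pheromone` array, which is exactly the "collective knows, the individual doesn't" point above.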
Monolithic vs Multi-agent
The following table breaks down my intuitions. The hypotheses need to be verified experimentally.
| Monolithic (centralized) | Multi-agent (decentralized, social learning) |
|---|---|
| Global objective. Centralized feedback. | Local objectives (they don’t need to be aligned). Diverse objectives. Decentralized, local feedback. |
| | Could be a solution to “reward hacking”: even if an agent hacks its own reward, it can’t hack the rewards of other agents or their opinions on what its objectives should be. |
| | Self-invented objectives are possible. We want objectives (including the global one) to emerge, not to be predefined. |
| | Emergent building blocks. |
| | Progress within the system is induced by selection (competition), which is caused by limited resources (bottlenecks, constraints). |
| Fixed credit mechanism. | Learned credit / feedback mechanism. |
| Serial learning. | Parallel learning. |
| Cost of communication doesn’t scale effectively in all-to-all networks. | Cost of communication doesn’t need to grow dramatically if we introduce modularization and hierarchical / heterarchical networking. |
| Every skill has to be learned individually, even if it’s needed in multiple places within the agent. | A learned skill can replicate within the society. |
| | This one isn’t clear and maybe I am completely wrong, but somehow I feel that a society of agents will consume less storage than one monolithic system, especially if the agents learn to reuse skills in a modular fashion. The opposite is also quite possible 😉, and then the question becomes: what if storage and parallel execution aren’t our bottlenecks, but time is? |
| Fixed learning procedure. | Learned learning procedures. |
| Open-endedness is not possible because learning converges to one solution. | Open-endedness is possible because learning diverges. Interactions between parts of the system; parts of the system are liberated from other parts. We can get diverse solutions, unlike in monolithic systems, where training converges to just one solution. |
| Homogeneous learning policy. | Heterogeneous learning policy. |
| Upper bound: learning in a monolithic system may be faster and/or lead to better task performance only up to a certain threshold (task complexity, number of parameters the task requires, etc.). After this threshold, a transition to social learning becomes necessary; otherwise no further task-performance improvements are possible, learning takes an unreasonable amount of time, or adaptation to new tasks becomes impossible. | No upper bound. |
| | Dynamic topologies. Different topologies for the learning and inference passes. |
| New skills are learned only during the global backprop pass. | Horizontal and vertical transfer of learned skills. Replication is easier. |
| Fully connected systems are slower learners if there’s a cost for connections. | |
| | A collective will generalize better to novel tasks than a monolithic system would. It can sacrifice some parts to specialize on tasks and keep the rest to stay better at adapting. |
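The communication-cost row can be made concrete with a back-of-the-envelope link count. This is only a sketch; the module size of 8 and the single tree over module coordinators are my own illustrative assumptions, not something from the seminar:

```python
def all_to_all_links(n):
    """Every agent talks to every other agent: n(n-1)/2 links, O(n^2)."""
    return n * (n - 1) // 2

def modular_links(n, module_size=8):
    """Agents grouped into modules: full connectivity inside each module,
    plus a spanning tree over module coordinators: roughly O(n) links."""
    modules = -(-n // module_size)                      # ceil(n / module_size)
    intra = modules * module_size * (module_size - 1) // 2
    inter = modules - 1                                 # tree over coordinators
    return intra + inter
```

For 1,000 agents this gives 499,500 all-to-all links versus roughly 3,600 in the modular layout, which is the intuition behind the table row: mostly local communication keeps the cost from exploding quadratically.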
Conclusion
Emergent learning is a type of learning that can happen only at the collective level, where multiple individuals interact and qualitatively new behaviors emerge.
The main difference between monolithic and multi-agent systems is that the latter is made of agents, each with its own local objectives (rather than one global objective), and this leads to interesting evolutionary dynamics.
Questions for reflection
Key points of social learning:
- External information storage - Is it the key for better collective learning? The storage can be cumulative and bigger than the memory of an individual agent.
- Multiple feedback mechanisms - A social system can have many adaptive feedback mechanisms; will they scale better than the single centralized one in a monolithic system?
- Efficiency threshold - Is there a threshold at which social systems become more efficient than monolithic systems?
Identified benefits of social systems:
- Better scaling - does a modular / hierarchical system with mostly local communication scale better than a monolithic system?
- Replication of skills - discovered skills can be replicated to other parts of the society, whereas in a monolithic system it needs to be rediscovered. Are there some counterexamples?
- Open-ended learning - because social systems lack a single fixed feedback mechanism and their learning diverges, they are more suitable for open-ended learning.
General:
- Does a monolithic system learn faster than a multi-agent system?
- Is there a limit where a monolithic system won’t be sufficient anymore and you need to switch to multi-agent learning? Can we get open-ended learning inside a monolithic system?
- Can we simulate a multi-agent system on a monolithic system? For example, a multi-agent system being simulated by a monolithic interpreter.
Thank you for reading this blog!
Best,
Marek Rosa
CEO, Creative Director, Founder at Keen Software House
CEO, CTO, Founder at GoodAI
For more news:
Space Engineers: www.SpaceEngineersGame.com
Keen Software House: www.keenswh.com
Medieval Engineers: www.MedievalEngineers.com
GoodAI: www.GoodAI.com
General AI Challenge: www.General-AI-Challenge.org
Personal bio:
Marek Rosa is the CEO and CTO of GoodAI, a general artificial intelligence R&D company, and the CEO and founder of Keen Software House, an independent game development studio best known for its best-seller Space Engineers (4 million copies sold). Both companies are based in Prague, Czech Republic.
Marek has been interested in artificial intelligence since childhood. He started his career as a programmer but later transitioned to a leadership role. After the success of the Keen Software House titles, Marek was able to personally fund GoodAI, his new general AI research company building human-level artificial intelligence.
GoodAI was founded in January 2014 with a $10 million investment from Marek; it now has over 30 research scientists, engineers, and consultants working across its divisions.
At this time, Marek is developing Space Engineers, as well as leading daily research and development on Badger, a general AI architecture based on recursive self-improvement.