We present Megaverse, a new 3D simulation platform for reinforcement learning and embodied AI research. The efficient design of our engine enables physics-based simulation with high-dimensional egocentric observations at more than 1,000,000 actions per second on a single 8-GPU node. Megaverse is up to 70x faster than DeepMind Lab in fully-shaded 3D scenes with interactive objects. We achieve this high simulation performance by leveraging batched simulation, thereby taking full advantage of the massive parallelism of modern GPUs. We use Megaverse to build a new benchmark that consists of several single-agent and multi-agent tasks covering a variety of cognitive challenges. We evaluate model-free RL on this benchmark to provide baselines and facilitate future research.
The source code is available at https://github.com/alex-petrenko/megaverse
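The batched-simulation idea behind the engine can be illustrated with a minimal sketch: instead of stepping one environment per call, a single `step()` advances many simulated worlds at once and returns a stacked batch of observations, which keeps the GPU saturated. The `BatchedEnv` class and its dummy dynamics below are hypothetical stand-ins for illustration only, not the actual Megaverse API.

```python
import numpy as np

class BatchedEnv:
    """Toy stand-in for a batched simulator: one step() call advances
    num_envs worlds and returns stacked observations.
    (Hypothetical sketch, not the real Megaverse API.)"""

    def __init__(self, num_envs, obs_shape=(64, 64, 3), num_actions=8):
        self.num_envs = num_envs
        self.obs_shape = obs_shape
        self.num_actions = num_actions
        self.rng = np.random.default_rng(0)

    def reset(self):
        # One stacked observation tensor covering all environments.
        return np.zeros((self.num_envs, *self.obs_shape), dtype=np.uint8)

    def step(self, actions):
        # Dummy dynamics: random pixels instead of rendered frames.
        assert actions.shape == (self.num_envs,)
        obs = self.rng.integers(0, 256, (self.num_envs, *self.obs_shape), dtype=np.uint8)
        rewards = np.zeros(self.num_envs, dtype=np.float32)
        dones = np.zeros(self.num_envs, dtype=bool)
        return obs, rewards, dones

env = BatchedEnv(num_envs=64)
obs = env.reset()
for _ in range(10):
    actions = np.random.randint(0, env.num_actions, size=env.num_envs)
    obs, rewards, dones = env.step(actions)

print(obs.shape)  # (64, 64, 64, 3): batch of 64 egocentric observations
```

The key design point is that the per-call overhead (Python dispatch, kernel launches) is amortized over the whole batch, which is how throughput in the millions of actions per second becomes possible.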
RL Agents in Megaverse-8 TowerBuilding
In this video, our APPO+CPC|A agent constructs a 10-level structure from interactive blocks in the TowerBuilding environment. The agent is rewarded for placing blocks high within the building zone, which is marked with a black rectangle. The TowerBuilding agent was trained on 10 billion environment transitions, or 21 years of experience at a human pace, on a single GPU.
During training, the maximum number of building blocks available to our APPO agent does not exceed 60, which is enough to build towers up to 10 levels high. The video on the left shows the performance of the agent provided with 110 building blocks. Even though this situation never occurred during training, the agent manages to consistently construct structures up to 14 levels high. This performance demonstrates generalization beyond the training task distribution. A general construction skill emerged through unassisted model-free end-to-end training.
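The "21 years of experience" figure can be sanity-checked with a quick back-of-the-envelope calculation. The assumed rate of roughly 15 actions per second is our hypothetical choice of "human pace" and is not stated on this page:

```python
# Back-of-the-envelope check of "10 billion transitions ~= 21 years of experience".
# The 15 actions-per-second "human pace" is an assumed rate, not taken from the page.
transitions = 10_000_000_000
actions_per_second = 15
seconds_per_year = 365 * 24 * 3600

years = transitions / actions_per_second / seconds_per_year
print(round(years, 1))  # -> 21.1
```

Under that assumption, 10 billion transitions indeed work out to about 21 years of continuous interaction.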
RL Agents in other scenarios
Our agents demonstrate non-trivial performance in other Megaverse-8 scenarios, such as Collect, ObstaclesEasy, and HexExplore. These environments mostly require proficiency in 3D navigation and episodic exploration.
In the Collect scenario, our agent succeeds approximately 70% of the time, although in ~1/3 of the episodes it fails to find hidden green diamonds. In ObstaclesEasy the agents reach the end of the obstacle course up to 80% of the time. They learn to cross lava lakes by jumping off blocks, although they never manage to traverse walls taller than two units. In HexExplore our agents succeed in finding the target object ~70% of the time. In many failed episodes the agents appear to forget which areas they have already visited and walk in circles instead of exploring new parts of the maze.
The videos below demonstrate selected scenarios from the Megaverse-8 benchmark from both first-person and overview perspectives. The overview videos additionally showcase the capabilities of the procedural layout generator.