Voyager: Unleashing the Power of Lifelong Learning in Minecraft

Meet Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft. It explores, learns, and makes discoveries in a self-driven manner. Powered by a Large Language Model (GPT-4), Voyager continuously acquires skills and makes novel discoveries while traversing diverse terrains, and it grows steadily more proficient at playing Minecraft.

Voyager is an approach to building generally capable embodied agents that continuously explore, plan, and develop new skills in open-ended worlds. Unlike classical approaches based on reinforcement learning (RL) and imitation learning, Voyager uses a Large Language Model (LLM) to generate action plans and executable policies. It is designed to draw on the world knowledge already encoded in pre-trained LLMs, which makes the LLM a natural teacher for lifelong learning. Minecraft is central to this work: unlike most other games studied in AI, it imposes no predefined end goal or fixed storyline and instead offers an open playground with endless possibilities.

Because Voyager only queries the LLM, it needs no access to model parameters and no gradient-based training or fine-tuning. The agent consists of three key components: an automatic curriculum for open-ended exploration, a skill library for storing increasingly complex behaviors, and an iterative prompting mechanism that uses code as the action space.
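To make the interplay of the three components concrete, here is a minimal, self-contained sketch of the loop in Python. It is a toy under stated assumptions, not Voyager's implementation: there is no LLM and no Minecraft, and every name in it (AgentState, TECH_TREE, propose_task, write_program, run_agent) is hypothetical. The curriculum and the code generator are replaced by simple stand-in functions so the control flow can actually run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class AgentState:
    """Toy stand-in for the agent's inventory and exploration progress."""
    inventory: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)


Program = Callable[[AgentState], None]

# Hypothetical toy tech tree: task -> items that must already be in the inventory.
TECH_TREE: Dict[str, List[str]] = {
    "wood": [],
    "crafting_table": ["wood"],
    "wooden_pickaxe": ["wood", "crafting_table"],
    "stone": ["wooden_pickaxe"],
}


def propose_task(state: AgentState) -> Optional[str]:
    """Stand-in for the LLM curriculum: pick the first new task that is feasible."""
    for task, needs in TECH_TREE.items():
        if task not in state.completed and all(n in state.inventory for n in needs):
            return task
    return None


def write_program(task: str, feedback: Optional[str]) -> Program:
    """Stand-in for LLM code generation. The first draft fails on purpose
    so that the retry-with-feedback path is exercised."""
    def buggy(state: AgentState) -> None:
        raise RuntimeError(f"no tool selected for {task}")

    def fixed(state: AgentState) -> None:
        state.inventory.append(task)

    return buggy if feedback is None else fixed


def run_agent(max_tasks: int = 10, max_retries: int = 3) -> Tuple[AgentState, Dict[str, Program]]:
    state = AgentState()
    skills: Dict[str, Program] = {}           # the skill library
    for _ in range(max_tasks):
        task = propose_task(state)            # 1. automatic curriculum
        if task is None:
            break
        feedback: Optional[str] = None
        for _ in range(max_retries):          # 3. iterative prompting
            program = write_program(task, feedback)
            try:
                program(state)
            except RuntimeError as err:
                feedback = str(err)           # execution error goes into the next attempt
                continue
            if task in state.inventory:       # toy success check
                skills[task] = program        # 2. store the working program as a skill
                state.completed.append(task)
                break
            feedback = "task not completed"
    return state, skills
```

Calling run_agent() walks the toy tech tree in order. Each task fails once, is retried with the error message as feedback, and is then stored in the skill library.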

Voyager's automatic curriculum proposes tasks based on the agent's exploration progress and current state, with the aim of maximizing exploration. GPT-4 generates the curriculum from the overarching goal of "discovering as many diverse things as possible", so it can be viewed as an in-context form of novelty search. The skill library is where Voyager stores the increasingly complex behaviors it learns. Complex skills can be synthesized by composing simpler programs, which compounds Voyager's capabilities over time and alleviates catastrophic forgetting. Finally, the iterative prompting mechanism generates executable code for embodied control and refines it over successive rounds until it suits the current context. As part of this loop, Voyager self-verifies: the model is given the agent's state and the task at hand and asked to judge whether the program accomplished the task. Empirical evaluation on exploration performance, tech tree mastery, map coverage, and zero-shot generalization to novel tasks in a new world shows that Voyager outperforms state-of-the-art baselines.
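The idea of composing simple programs into more complex skills can be illustrated with a small sketch. Everything below is an assumption made for illustration: the SkillLibrary class, the skill functions, and self_verify are hypothetical names, retrieval uses plain word overlap as a simple stand-in for whatever lookup the real system uses, and the self-verification step is a hard-coded inventory check where Voyager would ask the model. The recipes are simplified (no crafting table is required).

```python
from typing import Callable, Dict, List

Inventory = Dict[str, int]
Skill = Callable[[Inventory], None]


def mine_log(inv: Inventory) -> None:
    inv["log"] = inv.get("log", 0) + 1


def craft_planks(inv: Inventory) -> None:
    inv["log"] -= 1                                   # 1 log -> 4 planks
    inv["planks"] = inv.get("planks", 0) + 4


def craft_sticks(inv: Inventory) -> None:
    inv["planks"] -= 2                                # 2 planks -> 4 sticks
    inv["stick"] = inv.get("stick", 0) + 4


def craft_wooden_pickaxe(inv: Inventory) -> None:
    """Complex skill built by calling the simpler skills above."""
    for _ in range(2):
        mine_log(inv)
        craft_planks(inv)
    craft_sticks(inv)
    inv["planks"] -= 3                                # 3 planks + 2 sticks -> 1 pickaxe
    inv["stick"] -= 2
    inv["wooden_pickaxe"] = inv.get("wooden_pickaxe", 0) + 1


class SkillLibrary:
    """Stores programs with a short description and retrieves them by query."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}
        self._descriptions: Dict[str, str] = {}

    def add(self, name: str, description: str, program: Skill) -> None:
        self._skills[name] = program
        self._descriptions[name] = description

    def retrieve(self, query: str, top_k: int = 2) -> List[str]:
        """Rank skills by word overlap between query and description."""
        words = set(query.lower().split())
        scored = [(len(words & set(desc.lower().split())), name)
                  for name, desc in self._descriptions.items()]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [name for _, name in ranked[:top_k]]

    def run(self, name: str, inv: Inventory) -> None:
        self._skills[name](inv)


def self_verify(goal: Inventory, inv: Inventory) -> bool:
    """Toy success check: does the inventory contain everything the task asked for?"""
    return all(inv.get(item, 0) >= count for item, count in goal.items())


library = SkillLibrary()
library.add("mine_log", "mine a log from a tree", mine_log)
library.add("craft_planks", "craft planks from a log", craft_planks)
library.add("craft_sticks", "craft sticks from planks", craft_sticks)
library.add("craft_wooden_pickaxe",
            "craft a wooden pickaxe from planks and sticks", craft_wooden_pickaxe)
```

Once craft_wooden_pickaxe is stored, a later task only needs to retrieve and run it. The lower-level steps do not have to be rediscovered, which is the sense in which composition expands the agent's capabilities over time.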

Voyager's advantage shows in how consistently it discovers new items, how much faster it unlocks key tech tree milestones, and how much farther it travels than competing agents. The paper reports 3.3× more unique items, 2.3× longer travel distances, and key tech tree milestones unlocked up to 15.3× faster than the prior state of the art. Voyager is also the only agent in the comparison that generalizes to new tasks in a new world with no fine-tuning. The paper further observes that the skill library built up through lifelong learning improves Voyager's own performance and can even be used by other methods such as AutoGPT to improve theirs. Ultimately, the paper presents Voyager as a starting point for building powerful generalist agents without tuning model parameters, and as a step toward generalist agents with lifelong learning capabilities.