
Segment Anything


Cut out anything from your images with just one click using SAM from Meta AI.


SAM from Meta AI is an AI model that can seamlessly "cut out" any object from an image with just a single click. It is a promptable segmentation system with zero-shot generalization: users can select and segment any object without additional training. The model is efficient and flexible, making it well suited to a wide range of segmentation tasks.

SAM's promptable design also allows flexible integration with other systems: it can take input prompts from AR/VR headsets, or bounding-box prompts from an object detector to enable text-to-object segmentation, among other possibilities.

SAM accepts a variety of input prompts specifying what to segment in an image, which allows a wide range of segmentation tasks without additional training. Interactive point and box prompts let users select objects, and SAM can generate multiple valid masks when a prompt is ambiguous. Because prompting is the core interface, SAM can also take input prompts from other systems, making it even more versatile.
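To make the point-prompt idea concrete, here is a deliberately tiny sketch: a flood-fill "segmenter" that, given a clicked pixel, returns a binary mask for the region around it. This is only an illustration of the prompt-in, mask-out workflow; SAM itself uses a learned image encoder and mask decoder, not flood fill.

```python
import numpy as np

def segment_at_point(image, seed, tol=10):
    """Toy point-prompt segmenter: flood-fill the connected region of
    pixels whose intensity is within `tol` of the clicked seed pixel.
    Illustrates a point prompt selecting one object; SAM uses a learned
    mask decoder rather than flood fill."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = int(image[seed])
    stack = [seed]
    while stack:
        y, x = stack.pop()
        if not (0 <= y < h and 0 <= x < w) or mask[y, x]:
            continue
        if abs(int(image[y, x]) - seed_val) > tol:
            continue
        mask[y, x] = True
        stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return mask

# A 6x6 "image" with one bright 3x3 object.
img = np.zeros((6, 6), dtype=np.uint8)
img[1:4, 1:4] = 200

mask = segment_at_point(img, (2, 2))  # "click" inside the object
print(mask.sum())                     # → 9 pixels selected
```

A real SAM prompt works the same way at the interface level: coordinates in, one or more candidate masks (with quality scores) out.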

SAM's advanced capabilities are the result of training on millions of images and masks collected with a model-in-the-loop "data engine": researchers used SAM to interactively annotate images, then used the new annotations to improve the model, repeating the cycle many times. The final dataset contains more than 1.1 billion segmentation masks collected on ~11 million licensed and privacy-preserving images. In its fully automatic stage, the data engine presents SAM with a grid of points over an image and asks it to segment everything at each point.
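The "grid of points" step can be sketched in a few lines: build a regular lattice of point prompts covering the image, then feed each one to the model. The function name and parameters here are illustrative; the real automatic-mask pipeline also de-duplicates and filters the resulting masks.

```python
import numpy as np

def point_grid(height, width, points_per_side=32):
    """Build a regular grid of (x, y) point prompts covering an image.
    Fully automatic mask generation sweeps such a grid, asking the model
    to segment whatever object lies under each point."""
    ys = np.linspace(0.5, height - 0.5, points_per_side)
    xs = np.linspace(0.5, width - 0.5, points_per_side)
    # meshgrid -> stack -> flatten to a (points_per_side**2, 2) array
    grid = np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 2)
    return grid

grid = point_grid(1024, 1024, points_per_side=32)
print(grid.shape)  # → (1024, 2): one (x, y) prompt per grid cell
```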

SAM's design is efficient enough to power its data engine. The model is decoupled into a one-time image encoder and a lightweight mask decoder that can run in a web browser, and it runs efficiently on CPU or GPU across a variety of platforms that support the ONNX runtime. The prompt encoder and mask decoder can run directly in PyTorch or be converted to ONNX.
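The encoder/decoder split is the key to interactivity: the expensive encoding happens once per image, and each click only pays for the cheap decoder. A minimal sketch of that caching pattern, with stand-in numpy functions instead of real networks:

```python
import numpy as np

class TwoStagePredictor:
    """Sketch of SAM's split: a heavy image encoder runs once per image;
    a lightweight mask decoder runs once per prompt against the cached
    embedding. Both stages here are toy numpy stand-ins."""

    def __init__(self):
        self.embedding = None

    def set_image(self, image):
        # Expensive one-time step: "encode" the image.
        self.embedding = image.astype(np.float32) / 255.0

    def predict(self, point):
        # Cheap per-prompt step: "decode" a mask from the cached embedding.
        assert self.embedding is not None, "call set_image first"
        y, x = point
        return self.embedding > self.embedding[y, x] * 0.5

predictor = TwoStagePredictor()
img = np.zeros((4, 4), dtype=np.uint8)
img[1:3, 1:3] = 200
predictor.set_image(img)           # once per image
mask = predictor.predict((1, 1))   # once per click, very fast
print(mask.sum())                  # → 4 bright pixels selected
```

In the released model this same split is what lets the mask decoder run in a browser while the image embedding is computed once on a server or locally.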

In the online demo, a user can, for example, select a dog in an image with a single click and see the resulting mask; the demo shows how the model's data engine powers its capabilities. Foreground/background points, bounding boxes, masks, and text prompts are explored in the paper, but SAM's functionality is not limited to these prompts alone.

SAM's prompt design allows its outputs to be used by other AI systems. Object masks can be tracked in videos, used in image-editing applications, lifted to 3D, or used for creative tasks like collaging. Because SAM has learned a general notion of what objects are, it generalizes zero-shot to unfamiliar objects and images without requiring additional training.
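The downstream "cut out" step itself is simple once a mask exists: keep the masked pixels and blank the rest. A minimal sketch, assuming an RGB image and a boolean mask of the same height and width:

```python
import numpy as np

def cut_out(image, mask, background=0):
    """Apply a binary mask to an RGB image: keep pixels where mask is
    True, replace the rest with `background` — the cut-out step that
    feeds editing or collaging pipelines."""
    # mask[..., None] broadcasts (H, W) against the (H, W, 3) image.
    return np.where(mask[..., None], image, background)

img = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[True, False],
                 [False, True]])
out = cut_out(img, mask)
print(out[0, 0], out[0, 1])  # → [100 100 100] [0 0 0]
```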

SAM was trained on our SA-1B dataset for 3-5 days on 256 A100 GPUs, and it currently supports only images or individual frames from videos. The code is available on GitHub.

Sign up on the website to receive our newsletter and keep up-to-date on the latest SAM updates, research breakthroughs, and events.