
CASIA-IVA-Lab/FastSAM

Fast Segment Anything

Top Related Projects

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Segment Anything in High Quality [NeurIPS 2023]

This is the official code for MobileSAM project that makes SAM lightweight for mobile applications and beyond!

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Quick Overview

FastSAM (Fast Segment Anything Model) is an efficient implementation of the Segment Anything Model (SAM) for real-time segmentation tasks. It aims to achieve similar performance to the original SAM while significantly reducing computational requirements and inference time.

Pros

  • Faster inference speed compared to the original SAM
  • Reduced computational requirements, making it suitable for resource-constrained environments
  • Maintains comparable segmentation quality to the original SAM
  • Supports various prompts for segmentation, including points, boxes, and text

Cons

  • May have slightly lower accuracy in some cases compared to the original SAM
  • Limited documentation and examples for advanced use cases
  • Requires specific dependencies and setup, which may be challenging for beginners
  • Still in active development, so some features may be unstable or subject to change

Code Examples

  1. Installing FastSAM:
pip install git+https://github.com/CASIA-IVA-Lab/FastSAM.git
  2. Performing segmentation with a point prompt:
from fastsam import FastSAM, FastSAMPrompt

model = FastSAM('FastSAM-x.pt')
image = 'path/to/image.jpg'
everything_results = model(image, device='cuda')

prompt = FastSAMPrompt(image, everything_results)
ann = prompt.point_prompt(points=[[500, 375]], pointlabel=[1])  # 1 = foreground
prompt.plot(annotations=ann, output_path='output.jpg')
  3. Segmenting with a box prompt:
box_prompt = [0, 0, 100, 100]  # [x1, y1, x2, y2]
ann = prompt.box_prompt(bboxes=[box_prompt])
prompt.plot(annotations=ann, output_path='box_output.jpg')
  4. Text-prompted segmentation (requires CLIP):
text_prompt = "a cat"
ann = prompt.text_prompt(text=text_prompt)
prompt.plot(annotations=ann, output_path='text_output.jpg')
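FastSAM first segments everything in the image and then picks the mask that matches the prompt. For box prompts, our reading of the repository is that each candidate mask is scored by its IoU with the box and the best one is kept. The pure-Python sketch below illustrates that selection step on tiny binary masks; `mask_box_iou` and `select_mask_by_box` are hypothetical names, not part of the FastSAM API.

```python
from typing import List, Sequence

Mask = List[List[int]]  # binary mask indexed as mask[y][x]


def mask_box_iou(mask: Mask, box: Sequence[int]) -> float:
    """IoU between a binary mask and an [x1, y1, x2, y2] box (x2 and y2 exclusive)."""
    x1, y1, x2, y2 = box
    box_area = max(0, x2 - x1) * max(0, y2 - y1)
    mask_area = sum(sum(row) for row in mask)
    # Count mask pixels that fall inside the box, clipped to the image bounds.
    inside = sum(
        mask[y][x]
        for y in range(max(0, y1), min(len(mask), y2))
        for x in range(max(0, x1), min(len(mask[0]), x2))
    )
    union = box_area + mask_area - inside
    return inside / union if union else 0.0


def select_mask_by_box(masks: List[Mask], box: Sequence[int]) -> int:
    """Index of the candidate mask whose IoU with the box is highest."""
    scores = [mask_box_iou(m, box) for m in masks]
    return max(range(len(masks)), key=scores.__getitem__)


small = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
large = [[1] * 4 for _ in range(4)]
print(select_mask_by_box([small, large], [0, 0, 2, 2]))  # 0: the box covers only the small mask
print(select_mask_by_box([small, large], [0, 0, 4, 4]))  # 1: the box covers the whole image
```

Scoring by IoU rather than by raw overlap keeps a large mask from winning just because it also covers the box.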

Getting Started

To get started with FastSAM:

  1. Install the required dependencies:

    pip install torch torchvision opencv-python matplotlib
    pip install git+https://github.com/CASIA-IVA-Lab/FastSAM.git
    
  2. Download the pre-trained model:

    wget https://github.com/CASIA-IVA-Lab/FastSAM/releases/download/v1.0/FastSAM.pt
    
  3. Run a simple segmentation:

    from fastsam import FastSAM, FastSAMPrompt
    
    model = FastSAM('FastSAM.pt')
    image = 'path/to/your/image.jpg'
    everything_results = model(image, device='cuda')
    prompt = FastSAMPrompt(image, everything_results)
    ann = prompt.everything_prompt()
    prompt.plot(annotations=ann, output_path='output.jpg')
    

This will perform segmentation on the input image and save the result as 'output.jpg'.
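The snippets above hard-code `device='cuda'`, which is likely to fail on a machine without a CUDA GPU (the README's own example further down uses `'cpu'`). A minimal sketch of choosing the device at run time, assuming PyTorch is the only thing consulted; `pick_device` is a hypothetical helper, not part of FastSAM.

```python
def pick_device() -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch  # FastSAM already depends on PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


DEVICE = pick_device()
print(f"Running FastSAM on {DEVICE}")
```

The result can then be passed as `model(image, device=DEVICE)` and `FastSAMPrompt(image, everything_results, device=DEVICE)`.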

Competitor Comparisons

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Pros of segment-anything

  • More comprehensive and versatile for various segmentation tasks
  • Backed by Meta AI, potentially offering more resources and support
  • Larger model with higher accuracy in complex scenarios

Cons of segment-anything

  • Slower inference time compared to FastSAM
  • Requires more computational resources
  • Larger model size, which may be challenging for deployment in resource-constrained environments

Code Comparison

segment-anything:

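import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["default"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
image = cv2.cvtColor(cv2.imread("path/to/image.jpg"), cv2.COLOR_BGR2RGB)  # SAM expects RGB
predictor.set_image(image)
input_point = np.array([[500, 375]])  # one (x, y) click
input_label = np.array([1])           # 1 = foreground
masks, _, _ = predictor.predict(point_coords=input_point, point_labels=input_label)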

FastSAM:

from fastsam import FastSAM, FastSAMPrompt

model = FastSAM('FastSAM-x.pt')
image = 'path/to/image.jpg'
everything_results = model(image, device='cuda')
prompt = FastSAMPrompt(image, everything_results)
ann = prompt.everything_prompt()

Both repositories offer powerful segmentation capabilities, but FastSAM focuses on speed and efficiency, while segment-anything provides a more comprehensive solution at the cost of increased computational requirements.

Segment Anything in High Quality [NeurIPS 2023]

Pros of sam-hq

  • Higher quality segmentation masks with more precise boundaries
  • Better performance on high-resolution images
  • Improved handling of fine details and complex object structures

Cons of sam-hq

  • Slower inference time compared to FastSAM
  • Requires more computational resources
  • May struggle with real-time applications due to increased processing time

Code Comparison

FastSAM:

from fastsam import FastSAM, FastSAMPrompt

model = FastSAM('FastSAM-x.pt')
image = 'path/to/image.jpg'
everything_results = model(image, device='cuda')
prompt_process = FastSAMPrompt(image, everything_results, device='cuda')

sam-hq:

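import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor  # sam-hq ships its own segment_anything package

sam = sam_model_registry["vit_h"](checkpoint="sam_hq_vit_h.pth")
predictor = SamPredictor(sam)
image = cv2.cvtColor(cv2.imread("path/to/image.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)
point_coords, point_labels = np.array([[500, 375]]), np.array([1])
masks, _, _ = predictor.predict(point_coords, point_labels)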

Both repositories offer powerful segmentation capabilities, but they cater to different use cases. FastSAM prioritizes speed and efficiency, making it suitable for real-time applications. On the other hand, sam-hq focuses on high-quality segmentation results, particularly for high-resolution images and complex scenes, at the cost of increased computational requirements and processing time.

This is the official code for MobileSAM project that makes SAM lightweight for mobile applications and beyond!

Pros of MobileSAM

  • Optimized for mobile devices, offering faster inference on resource-constrained platforms
  • Smaller model size, making it more suitable for edge computing and mobile applications
  • Maintains comparable accuracy to the original SAM model despite its reduced size

Cons of MobileSAM

  • May have slightly lower performance on complex scenes compared to FastSAM
  • Limited to image segmentation tasks, while FastSAM offers additional features like object detection

Code Comparison

MobileSAM:

import cv2
from mobile_sam import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_t"](checkpoint="mobile_sam.pt")
mask_generator = SamAutomaticMaskGenerator(sam)
image = cv2.cvtColor(cv2.imread("path/to/image.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)

FastSAM:

from fastsam import FastSAM, FastSAMPrompt

model = FastSAM('FastSAM-x.pt')
image = 'path/to/image.jpg'
everything_results = model(image, device='cuda')
prompt = FastSAMPrompt(image, everything_results)
masks = prompt.everything_prompt()

Both repositories provide efficient variants of the Segment Anything Model (SAM), but they target different use cases. MobileSAM focuses on mobile and edge deployment, while FastSAM aims for faster inference on more powerful hardware. The code snippets show similar usage patterns, with minor differences in initialization and mask generation.

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Pros of EfficientSAM

  • Faster inference speed, especially on edge devices
  • Smaller model size, making it more suitable for deployment in resource-constrained environments
  • Improved efficiency without significant loss in accuracy

Cons of EfficientSAM

  • May have slightly lower accuracy compared to FastSAM in some scenarios
  • Less extensive documentation and community support
  • Fewer pre-trained models available for different use cases

Code Comparison

FastSAM:

from fastsam import FastSAM, FastSAMPrompt

model = FastSAM('FastSAM-x.pt')
image = 'path/to/image.jpg'
everything_results = model(image, device='cuda')
prompt_process = FastSAMPrompt(image, everything_results, device='cuda')

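EfficientSAM:

import torch
from PIL import Image
from torchvision import transforms
from efficient_sam.build_efficient_sam import build_efficient_sam_vitt

model = build_efficient_sam_vitt().eval()  # loads the ViT-Tiny checkpoint from the repo's weights/ folder
image = transforms.ToTensor()(Image.open('path/to/image.jpg').convert('RGB'))
input_points = torch.tensor([[[[500, 375]]]])  # [batch, queries, points, (x, y)]
input_labels = torch.tensor([[[1]]])           # 1 = foreground
predicted_logits, predicted_iou = model(image[None, ...], input_points, input_labels)

The two APIs differ in workflow: FastSAM segments everything first and then filters masks with a prompt, while EfficientSAM follows SAM's promptable design, taking points and labels directly in the model's forward call and returning mask logits with predicted IoU scores.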

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Pros of Track-Anything

  • Focuses on video object tracking and segmentation
  • Provides interactive tools for manual corrections
  • Supports multiple object tracking simultaneously

Cons of Track-Anything

  • May require more computational resources for video processing
  • Potentially slower processing speed due to complex tracking algorithms
  • Limited to video-based applications

Code Comparison

Track-Anything:

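from track_anything import TrackingAnything

# Checkpoint paths and `args` (runtime options such as the device) are prepared by the repo's own scripts.
model = TrackingAnything(sam_checkpoint, xmem_checkpoint, e2fgvi_checkpoint, args)
masks, logits, painted_images = model.generator(images=frames, template_mask=initial_mask)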

FastSAM:

from fastsam import FastSAM, FastSAMPrompt

model = FastSAM('FastSAM-x.pt')
image = 'path/to/image.jpg'
everything_results = model(image, device='cuda')
prompt = FastSAMPrompt(image, everything_results)

Track-Anything is designed for video object tracking and segmentation, offering interactive tools and multi-object tracking capabilities. However, it may require more computational resources and have slower processing speeds compared to FastSAM. FastSAM, on the other hand, focuses on fast segmentation of images, potentially offering quicker results but lacking video-specific features. The code snippets demonstrate the different approaches, with Track-Anything emphasizing video tracking and FastSAM focusing on image segmentation.

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Pros of Grounded-Segment-Anything

  • Offers more precise object segmentation with text prompts
  • Integrates multiple models for enhanced performance
  • Supports a wider range of applications, including visual question answering

Cons of Grounded-Segment-Anything

  • Requires more computational resources due to multiple models
  • Longer processing time compared to FastSAM
  • More complex setup and usage

Code Comparison

FastSAM:

import cv2
from fastsam import FastSAM, FastSAMPrompt

model = FastSAM('FastSAM-x.pt')
image = cv2.imread('path/to/image.jpg')
everything_results = model(image, device='cuda')
prompt_process = FastSAMPrompt(image, everything_results)

Grounded-Segment-Anything:

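from segment_anything import sam_model_registry, SamPredictor
from groundingdino.util.inference import load_model, load_image, predict

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
grounding_dino_model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
image_source, image = load_image("path/to/image.jpg")  # (RGB numpy array, normalized tensor)
boxes, logits, phrases = predict(model=grounding_dino_model, image=image, caption="a dog", box_threshold=0.35, text_threshold=0.25)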

Both repositories offer powerful segmentation capabilities, but Grounded-Segment-Anything provides more flexibility and precision at the cost of increased complexity and resource requirements. FastSAM focuses on speed and efficiency, making it more suitable for real-time applications with limited computational resources.

README

Fast Segment Anything

[📕Paper] [🤗HuggingFace Demo] [Colab demo] [Replicate demo & API] [OpenXLab Demo] [Model Zoo] [BibTeX] [Video Demo]

The Fast Segment Anything Model (FastSAM) is a CNN-based Segment Anything Model trained on only 2% of the SA-1B dataset released by the SAM authors. FastSAM achieves performance comparable to SAM at 50× higher run-time speed.

🍇 Updates

Installation

Clone the repository locally:

git clone https://github.com/CASIA-IVA-Lab/FastSAM.git

Create the conda env. The code requires python>=3.7, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

conda create -n FastSAM python=3.9
conda activate FastSAM

Install the packages:

cd FastSAM
pip install -r requirements.txt

Install CLIP (required only for text prompts):

pip install git+https://github.com/openai/CLIP.git
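CLIP is needed because, as far as we can tell from the repository, `text_prompt` crops the image around each candidate mask, embeds the crops and the text with CLIP, and keeps the mask whose crop is most similar to the text. The sketch below shows only that final selection step, with toy 2-D vectors standing in for CLIP's `encode_image`/`encode_text` outputs; `select_mask_by_text` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_mask_by_text(crop_embeddings: List[Sequence[float]], text_embedding: Sequence[float]) -> int:
    """Index of the mask crop whose embedding is most similar to the text embedding."""
    scores = [cosine_similarity(e, text_embedding) for e in crop_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vectors standing in for CLIP image embeddings of three mask crops and one text embedding.
crops = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
text = [0.0, 1.0]
print(select_mask_by_text(crops, text))  # 2
```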

Getting Started

First download a model checkpoint.

Then, you can run the scripts to try the everything mode and three prompt modes.

# Everything mode
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg
# Text prompt
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg  --text_prompt "the yellow dog"
# Box prompt (xywh)
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg --box_prompt "[[570,200,230,400]]"
# Points prompt
python Inference.py --model_path ./weights/FastSAM.pt --img_path ./images/dogs.jpg  --point_prompt "[[520,360],[620,300]]" --point_label "[1,0]"
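The CLI takes prompts as Python-literal strings, and its `--box_prompt` is documented as xywh, whereas the Python `box_prompt` example later in this section uses [x1, y1, x2, y2]. A minimal sketch of parsing the CLI strings and converting boxes between the two layouts; `parse_prompt` and `xywh_to_xyxy` are hypothetical helpers, not functions from Inference.py.

```python
import ast
from typing import List


def parse_prompt(arg: str):
    """Safely turn a CLI string such as "[[520,360],[620,300]]" into Python lists."""
    return ast.literal_eval(arg)


def xywh_to_xyxy(box: List[int]) -> List[int]:
    """Convert [x, y, width, height] (top-left corner) to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


boxes = [xywh_to_xyxy(b) for b in parse_prompt("[[570,200,230,400]]")]
points = parse_prompt("[[520,360],[620,300]]")
labels = parse_prompt("[1,0]")
assert len(points) == len(labels), "each point needs exactly one label"
print(boxes)  # [[570, 200, 800, 600]]
```

`ast.literal_eval` only accepts literals, so a malformed or malicious prompt string raises an error instead of being executed.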

You can use the following code to generate all masks and visualize the results.

from fastsam import FastSAM, FastSAMPrompt

model = FastSAM('./weights/FastSAM.pt')
IMAGE_PATH = './images/dogs.jpg'
DEVICE = 'cpu'
everything_results = model(IMAGE_PATH, device=DEVICE, retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)
prompt_process = FastSAMPrompt(IMAGE_PATH, everything_results, device=DEVICE)

# everything prompt
ann = prompt_process.everything_prompt()

prompt_process.plot(annotations=ann, output_path='./output/dog.jpg')
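The `conf` and `iou` arguments appear to be forwarded to the Ultralytics YOLOv8 predictor that FastSAM builds on: `conf` discards low-confidence detections and `iou` is the overlap threshold for non-maximum suppression (NMS). The sketch below is a simplified greedy NMS on xyxy boxes that shows what the two thresholds do; it is an illustration, not the library's implementation, and `filter_detections` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes: Sequence[Box], scores: Sequence[float],
                      conf: float = 0.4, iou: float = 0.9) -> List[int]:
    """Indices kept after a confidence cut followed by greedy NMS."""
    # 1. Drop low-confidence candidates, then visit the rest from best to worst score.
    order = sorted((i for i, s in enumerate(scores) if s > conf), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    # 2. Keep a box only if it overlaps no already-kept box by more than `iou`.
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (0, 0, 10, 9.5), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.3]
print(filter_detections(boxes, scores))            # [0, 2]
print(filter_detections(boxes, scores, iou=0.99))  # [0, 1, 2]
```

A high `iou` such as 0.9 only suppresses near-duplicates, which suits segment-everything mode, where many legitimately overlapping masks should survive.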

For point/box/text mode prompts, use:

# bbox default shape [0,0,0,0] -> [x1,y1,x2,y2]
ann = prompt_process.box_prompt(bboxes=[[200, 200, 300, 300]])

# text prompt
ann = prompt_process.text_prompt(text='a photo of a dog')

# point prompt
# points default [[0,0]] [[x1,y1],[x2,y2]]
# point_label default [0] [1,0] 0:background, 1:foreground
ann = prompt_process.point_prompt(points=[[620, 360]], pointlabel=[1])

prompt_process.plot(annotations=ann, output_path='./output/dog.jpg')
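Point labels mark foreground (1) and background (0) clicks. One simple way such labels can combine candidate masks, and roughly how we read FastSAM's `point_prompt`, is to visit masks from largest to smallest, paint a mask in when a foreground point lands on it, and paint it out when a background point does. The sketch below uses tiny binary masks; `combine_by_points` is a hypothetical name, and the library's exact ordering and details may differ.

```python
from typing import List, Sequence

Mask = List[List[int]]  # mask[y][x] in {0, 1}


def combine_by_points(masks: List[Mask], points: Sequence[Sequence[int]],
                      labels: Sequence[int]) -> Mask:
    """Union of masks hit by foreground points, minus masks hit by background points."""
    if len(points) != len(labels):
        raise ValueError("each point needs exactly one label")
    h, w = len(masks[0]), len(masks[0][0])
    result = [[0] * w for _ in range(h)]
    # Largest masks first, so smaller masks visited later can refine them (assumed ordering).
    for mask in sorted(masks, key=lambda m: sum(map(sum, m)), reverse=True):
        for (x, y), label in zip(points, labels):  # points are [x, y]
            if mask[y][x]:
                value = int(label == 1)
                for yy in range(h):
                    for xx in range(w):
                        if mask[yy][xx]:
                            result[yy][xx] = value
    return result


big = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
small = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
# Foreground click inside the small object, background click elsewhere.
print(combine_by_points([big, small], points=[[0, 0], [3, 2]], labels=[1, 0]))
```

Here the background click erases the large mask, and the foreground click then restores only the small object.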

You are also welcome to try our Colab demo: FastSAM_example.ipynb.

Different Inference Options

We provide various options for different purposes; details are in MORE_USAGES.md.

Training or Validation

Training from scratch or validation: Training and Validation Code.

Web demo

Gradio demo

  • We also provide a UI for testing our method that is built with gradio. You can upload a custom image, select the mode and set the parameters, click the segment button, and get a satisfactory segmentation result. Currently, the UI supports interaction with the 'Everything mode' and 'points mode'. We plan to add support for additional modes in the future. Running the following command in a terminal will launch the demo:
# Download the pre-trained model in "./weights/FastSAM.pt"
python app_gradio.py

Replicate demo

  • The Replicate demo supports all modes, including points, box, and text prompts.

Model Checkpoints

Two versions of the model are available, with different sizes. Click the links below to download the checkpoint for the corresponding model type.

Results

All results were measured on a single NVIDIA GeForce RTX 3090.

1. Inference time

Running speed (ms) for different numbers of point prompts. E(n×n) denotes everything mode with an n×n grid of point prompts.

method    params  1     10    100   E(16x16)  E(32x32*)  E(64x64)
SAM-H     0.6G    446   464   627   852       2099       6972
SAM-B     136M    110   125   230   432       1383       5417
FastSAM   68M     40    40    40    40        40         40
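The 50× figure in the introduction lines up with the E(32x32*) column (32×32 is the default point grid of SAM's automatic mask generator): 2099 ms / 40 ms ≈ 52×. With a single point prompt the gap is much smaller (446 / 40 ≈ 11× for SAM-H). The sketch below computes the ratio for every column using only numbers from the table.

```python
# Latency in milliseconds, copied from the table above (single RTX 3090).
LATENCY_MS = {
    "SAM-H": {"1": 446, "10": 464, "100": 627, "E(16x16)": 852, "E(32x32*)": 2099, "E(64x64)": 6972},
    "SAM-B": {"1": 110, "10": 125, "100": 230, "E(16x16)": 432, "E(32x32*)": 1383, "E(64x64)": 5417},
}
FASTSAM_MS = 40  # FastSAM's latency is 40 ms in every column


def speedups(model: str) -> dict:
    """How many times faster FastSAM is than `model` in each column."""
    return {col: ms / FASTSAM_MS for col, ms in LATENCY_MS[model].items()}


for name in LATENCY_MS:
    print(name, {col: round(x, 1) for col, x in speedups(name).items()})
```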

2. Memory usage

Dataset     Method    GPU Memory (MB)
COCO 2017   FastSAM   2608
COCO 2017   SAM-H     7060
COCO 2017   SAM-B     4670

3. Zero-shot Transfer Experiments

Edge Detection

Tested on the BSDS500 dataset.

method    year  ODS   OIS   AP    R50
HED       2015  .788  .808  .840  .923
SAM       2023  .768  .786  .794  .928
FastSAM   2023  .750  .790  .793  .903
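ODS and OIS are the standard BSDS boundary F-measures: ODS uses one threshold for the whole dataset, while OIS lets each image use its own best threshold, so OIS is never lower than ODS. A minimal sketch following the usual BSDS benchmark convention as we understand it, assuming per-image, per-threshold boundary-matching counts are already available (the pixel-matching step itself is omitted, and the counts below are made up for illustration).

```python
from typing import List, Tuple

# (matched ground-truth pixels, total ground-truth pixels, matched predicted pixels, total predicted pixels)
Counts = Tuple[int, int, int, int]


def f_measure(c: Counts) -> float:
    """Harmonic mean of boundary precision and recall."""
    matched_gt, total_gt, matched_pred, total_pred = c
    recall = matched_gt / total_gt if total_gt else 0.0
    precision = matched_pred / total_pred if total_pred else 0.0
    s = precision + recall
    return 2 * precision * recall / s if s else 0.0


def add_counts(a: Counts, b: Counts) -> Counts:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3])


def ods(counts: List[List[Counts]]) -> float:
    """Optimal Dataset Scale: one threshold shared by all images; counts[image][threshold]."""
    best = 0.0
    for t in range(len(counts[0])):
        total = (0, 0, 0, 0)
        for per_image in counts:
            total = add_counts(total, per_image[t])
        best = max(best, f_measure(total))
    return best


def ois(counts: List[List[Counts]]) -> float:
    """Optimal Image Scale: each image uses its own best threshold before pooling."""
    total = (0, 0, 0, 0)
    for per_image in counts:
        total = add_counts(total, max(per_image, key=f_measure))
    return f_measure(total)


# Two images evaluated at two thresholds (illustrative counts only).
data = [
    [(8, 10, 8, 10), (6, 10, 6, 6)],
    [(2, 10, 2, 10), (9, 10, 9, 10)],
]
print(round(ods(data), 3), round(ois(data), 3))
```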

Object Proposals

COCO

method      AR10  AR100  AR1000  AUC
SAM-H E64   15.5  45.6   67.7    32.1
SAM-H E32   18.5  49.5   62.5    33.7
SAM-B E32   11.4  39.6   59.1    27.3
FastSAM     15.7  47.3   63.7    32.2
LVIS

bbox AR@1000

method      all   small  medium  large
ViTDet-H    65.0  53.2   83.3    91.2
zero-shot transfer methods:
SAM-H E64   52.1  36.6   75.1    88.2
SAM-H E32   50.3  33.1   76.2    89.8
SAM-B E32   45.0  29.3   68.7    80.6
FastSAM     57.1  44.3   77.1    85.3
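AR@k (average recall) keeps the top k proposals per image, computes recall at several IoU thresholds, and averages the results. A minimal sketch, assuming the COCO convention of thresholds 0.50:0.05:0.95 and that each ground-truth object's best IoU against the top-k proposals has already been computed; `average_recall` is a hypothetical helper.

```python
from typing import Sequence


def average_recall(best_ious: Sequence[float]) -> float:
    """Mean recall over IoU thresholds 0.50, 0.55, ..., 0.95.

    best_ious[i] is the highest IoU between ground-truth object i and any kept proposal.
    """
    if not best_ious:
        return 0.0
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    recalls = [sum(iou >= t for iou in best_ious) / len(best_ious) for t in thresholds]
    return sum(recalls) / len(recalls)


# Three objects: one found precisely, one found loosely, one missed.
print(average_recall([0.97, 0.72, 0.3]))
```

Averaging over thresholds rewards proposals that localize objects tightly, not just ones that clear IoU 0.5.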

Instance Segmentation On COCO 2017

method     AP    AP(small)  AP(medium)  AP(large)
ViTDet-H   .510  .320       .543        .689
SAM        .465  .308       .510        .617
FastSAM    .379  .239       .434        .500

4. Performance Visualization

Several segmentation results:

Natural Images

Text to Mask

5. Downstream tasks

Results on several downstream tasks illustrate FastSAM's effectiveness.

Anomaly Detection

Salient Object Detection

Building Extraction

License

The model is licensed under the Apache 2.0 license.

Acknowledgement

Contributors

Our project wouldn't be possible without the contributions of these amazing people! Thank you all for making this project better.

Citing FastSAM

If you find this project useful for your research, please consider citing the following BibTeX entry.

@misc{zhao2023fast,
      title={Fast Segment Anything},
      author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang},
      year={2023},
      eprint={2306.12156},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
