gaomingqi / Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Top Related Projects

  • Mask2Former: Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
  • PaddleDetection: Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
  • mmtracking: OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), and Video Instance Segmentation (VIS) within a unified framework.
  • Detectron2: A platform for object detection, segmentation and other visual recognition tasks.

Quick Overview

Track-Anything is an open-source interactive tool for video object tracking and segmentation. It combines Segment Anything Model (SAM) with XMem to enable efficient and accurate tracking of any object in a video, with the ability to make corrections through user interactions.

Pros

  • Flexible and interactive: Allows users to select and track any object in a video
  • High-quality results: Combines state-of-the-art models for accurate segmentation and tracking
  • User-friendly interface: Provides a web-based GUI for easy interaction and visualization
  • Extensible: Can be integrated into other applications or workflows

Cons

  • Resource-intensive: Requires significant computational power, especially for high-resolution videos
  • Limited to 2D tracking: Does not support 3D object tracking or complex scene understanding
  • Dependency on pre-trained models: Performance may vary depending on the quality and diversity of training data
  • Potential for drift: May require manual corrections for long videos or complex scenes

Code Examples
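The snippets below sketch a hypothetical high-level Python interface to illustrate the workflow; the names TrackingAnything, track_object, apply_correction, and export_results are illustrative rather than the repository's exact API. The supported entry point is the Gradio app (app.py; see Getting Started).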

# Illustrative usage; class and method names are hypothetical.
from track_anything import TrackingAnything

# Initialize the tracker with a path to the downloaded model weights
tracker = TrackingAnything(model_path='path/to/model')

# Load a video and start tracking from an initial mask
video_path = 'path/to/video.mp4'
initial_mask = get_initial_mask()  # user-provided or automatically generated
tracker.track_object(video_path, initial_mask)

# Perform an interactive correction on a specific frame
frame_number = 50
correction_mask = get_user_correction()  # user-provided correction mask
tracker.apply_correction(frame_number, correction_mask)

# Export the tracking results as a video
output_path = 'path/to/output'
tracker.export_results(output_path, format='video')

Getting Started

  1. Clone the repository:

    git clone https://github.com/gaomingqi/Track-Anything.git
    cd Track-Anything
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download pre-trained models:

    wget https://github.com/gaomingqi/Track-Anything/releases/download/v0.1.0/sam_vit_h_4b8939.pth
    wget https://github.com/gaomingqi/Track-Anything/releases/download/v0.1.0/XMem-s012.pth
    
  4. Run the web interface:

    python app.py
    
  5. Open a web browser and navigate to http://localhost:7860 to start using Track-Anything.
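As an optional sanity check after step 3, you can confirm that the downloaded checkpoints deserialize before launching the app. A minimal sketch, assuming PyTorch is installed via the requirements and the .pth files sit in the current directory:

import torch

# Load each checkpoint on CPU and report its top-level entries;
# a clean load confirms the downloads are not truncated or corrupted.
for ckpt in ('sam_vit_h_4b8939.pth', 'XMem-s012.pth'):
    state = torch.load(ckpt, map_location='cpu')
    print(f'{ckpt}: {len(state)} top-level entries')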

Competitor Comparisons

Mask2Former: Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"

Pros of Mask2Former

  • More versatile, supporting various segmentation tasks (panoptic, instance, semantic)
  • Better performance on standard benchmarks like COCO and Cityscapes
  • Stronger foundation model with extensive research backing

Cons of Mask2Former

  • More complex architecture, potentially harder to implement and fine-tune
  • Requires more computational resources for training and inference
  • Less focused on video tracking tasks compared to Track-Anything

Code Comparison

Mask2Former:

outputs = model(images)
pred_masks = outputs["pred_masks"].sigmoid()
pred_classes = outputs["pred_logits"].argmax(-1)

Track-Anything:

masks, logits, iou_predictions = model(image, points, labels)
masks = masks > model.mask_threshold
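For context, a hedged sketch of how the point prompts above might be constructed, following the Segment Anything convention of pixel coordinates paired with foreground/background labels; the click coordinates are placeholders, and the arrays feed the model call shown above:

import numpy as np

# Two hypothetical user clicks: one on the object, one on the background.
points = np.array([[320, 240], [400, 260]])  # (x, y) pixel coordinates
labels = np.array([1, 0])                    # 1 = foreground, 0 = background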

Summary

Mask2Former is a more comprehensive segmentation model suitable for various tasks, while Track-Anything is specifically designed for video object tracking. Mask2Former offers better performance on standard benchmarks but requires more resources. Track-Anything provides a simpler interface for video tracking tasks, making it more accessible for specific use cases.

PaddleDetection: Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.

Pros of PaddleDetection

  • Comprehensive object detection toolkit with multiple algorithms and models
  • Extensive documentation and tutorials for easy adoption
  • Supports various deployment options (mobile, server, edge devices)

Cons of PaddleDetection

  • Steeper learning curve due to its extensive features
  • Primarily focused on object detection, less versatile for other tasks

Code Comparison

PaddleDetection:

from ppdet.core.workspace import create
from ppdet.engine import Trainer

model = create('YOLOv3')
trainer = Trainer(model=model, train_dataset=train_dataset)
trainer.train()

Track-Anything:

from track_anything import TrackingAnything

model = TrackingAnything(checkpoint='sam_vit_h_4b8939.pth')
model.track_anything(video_path='input.mp4', output_path='output.mp4')

PaddleDetection offers a more structured approach for training and deploying object detection models, while Track-Anything provides a simpler interface for video object tracking and segmentation. PaddleDetection is better suited for large-scale projects requiring customizable object detection, whereas Track-Anything excels in quick video analysis and tracking tasks.

mmtracking: OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), and Video Instance Segmentation (VIS) within a unified framework.

Pros of mmtracking

  • More comprehensive tracking framework with multiple algorithms
  • Better documentation and community support
  • Modular design allows for easier customization and extension

Cons of mmtracking

  • Steeper learning curve due to its complexity
  • May be overkill for simple tracking tasks
  • Requires more setup and configuration

Code Comparison

mmtracking:

from mmtrack.apis import init_model, inference_mot

config_file = 'configs/mot/deepsort/deepsort_faster-rcnn_fpn_4e_mot17-private-half.py'
checkpoint_file = 'checkpoints/deepsort_faster-rcnn_fpn_4e_mot17-private-half_20210517_001210-d94bac73.pth'

model = init_model(config_file, checkpoint_file, device='cuda:0')
result = inference_mot(model, video_path, frame_rate=30)

Track-Anything:

from track_anything import TrackingAnything

model = TrackingAnything(device='cuda')
model.track(video_path, output_path)

The code comparison shows that Track-Anything offers a simpler API for basic tracking tasks, while mmtracking provides more flexibility and control over the tracking process. mmtracking requires more configuration but allows for fine-tuning of various parameters and algorithms.

Detectron2: A platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

  • More comprehensive and versatile object detection framework
  • Backed by Facebook AI Research, ensuring regular updates and support
  • Extensive documentation and community resources

Cons of Detectron2

  • Steeper learning curve due to its complexity
  • Requires more computational resources for training and inference

Code Comparison

Track-Anything:

from track_anything import TrackAnything

tracker = TrackAnything()
tracker.track(video_path, output_path)

Detectron2:

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file("path/to/config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(image)

Summary

Detectron2 is a more robust and feature-rich object detection framework, while Track-Anything focuses specifically on video object tracking. Detectron2 offers greater flexibility and customization options but requires more expertise to use effectively. Track-Anything provides a simpler interface for quick video tracking tasks but may lack advanced features for complex scenarios.

README


Track-Anything is a flexible and interactive tool for video object tracking and segmentation. Built upon Segment Anything, it can track and segment anything specified via user clicks alone. During tracking, users can flexibly change the objects they want to track or correct the region of interest whenever ambiguities arise. These characteristics make Track-Anything suitable for:

  • Video object tracking and segmentation with shot changes.
  • Visualized development and data annotation for video object tracking and segmentation.
  • Object-centric downstream video tasks, such as video inpainting and editing.

:rocket: Updates

  • 2023/05/02: We uploaded step-by-step tutorials :world_map:. Check HERE for more details.

  • 2023/04/29: We improved inpainting by decoupling GPU memory usage from video length. Now Track-Anything can inpaint videos of any length! :smiley_cat: Check HERE for our GPU memory requirements.

  • 2023/04/25: We are delighted to introduce Caption-Anything :writing_hand:, an inventive project from our lab that combines the capabilities of Segment Anything, Visual Captioning, and ChatGPT.

  • 2023/04/20: We deployed DEMO on Hugging Face :hugs:!

  • 2023/04/14: We made Track-Anything public!

:world_map: Video Tutorials (Track-Anything Tutorials in Steps)

https://user-images.githubusercontent.com/30309970/234902447-a4c59718-fcfe-443a-bd18-2f3f775cfc13.mp4


:joystick: Example - Multiple Object Tracking and Segmentation (with XMem)

https://user-images.githubusercontent.com/39208339/233035206-0a151004-6461-4deb-b782-d1dbfe691493.mp4


:joystick: Example - Video Object Tracking and Segmentation with Shot Changes (with XMem)

https://user-images.githubusercontent.com/30309970/232848349-f5e29e71-2ea4-4529-ac9a-94b9ca1e7055.mp4


:joystick: Example - Video Inpainting (with E2FGVI)

https://user-images.githubusercontent.com/28050374/232959816-07f2826f-d267-4dda-8ae5-a5132173b8f4.mp4

:computer: Get Started

Linux & Windows

# Clone the repository:
git clone https://github.com/gaomingqi/Track-Anything.git
cd Track-Anything

# Install dependencies: 
pip install -r requirements.txt

# Run the Track-Anything gradio demo.
python app.py --device cuda:0
# python app.py --device cuda:0 --sam_model_type vit_b # for lower memory usage
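Once the demo is running, open http://localhost:7860 (the default Gradio port) in a browser to access the interactive interface.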

:book: Citation

If you find this work useful for your research or applications, please cite using this BibTeX:

@misc{yang2023track,
      title={Track Anything: Segment Anything Meets Videos}, 
      author={Jinyu Yang and Mingqi Gao and Zhe Li and Shang Gao and Fangjing Wang and Feng Zheng},
      year={2023},
      eprint={2304.11968},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

:clap: Acknowledgements

The project is based on Segment Anything, XMem, and E2FGVI. Thanks to the authors for their efforts.