gaomingqi/Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Top Related Projects

  • Mask2Former: Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
  • PaddleDetection: Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
  • mmtracking: OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
  • Detectron2: Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Quick Overview

Track-Anything is an open-source interactive tool for video object tracking and segmentation. It combines the Segment Anything Model (SAM), which turns user clicks into an object mask, with XMem, which propagates that mask through the video. Users can track any object and correct the result interactively when the tracking goes wrong.

Pros

  • Flexible and interactive: Allows users to select and track any object in a video
  • High-quality results: Combines state-of-the-art models for accurate segmentation and tracking
  • User-friendly interface: Provides a web-based GUI for easy interaction and visualization
  • Extensible: Can be integrated into other applications or workflows

Cons

  • Resource-intensive: Requires significant computational power, especially for high-resolution videos
  • Limited to 2D tracking: Does not support 3D object tracking or complex scene understanding
  • Dependency on pre-trained models: Performance may vary depending on the quality and diversity of training data
  • Potential for drift: May require manual corrections for long videos or complex scenes
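The workflow implied by these points (segment once, propagate the mask, and correct it when drift appears) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Track-Anything's implementation: `track_with_corrections`, `propagate`, and `corrections` are hypothetical names, and `propagate` stands in for a real tracker such as XMem, which keeps a richer memory than the previous mask alone.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Mask = TypeVar("Mask")


def track_with_corrections(
    frames: Sequence[Frame],
    first_mask: Mask,
    propagate: Callable[[Frame, Mask], Mask],
    corrections: Dict[int, Mask],
) -> List[Mask]:
    """Propagate a mask frame by frame, restarting from user corrections.

    `propagate` is a hypothetical stand-in for a tracker: it receives the
    current frame and the previous mask and returns the new mask.
    `corrections` maps a frame index to a mask supplied by the user.
    """
    masks: List[Mask] = [first_mask]
    for idx in range(1, len(frames)):
        if idx in corrections:
            # A user correction overrides the prediction for this frame,
            # and later frames are propagated from the corrected mask.
            masks.append(corrections[idx])
        else:
            masks.append(propagate(frames[idx], masks[-1]))
    return masks


# Toy usage: the "mask" is an integer error that grows by 1 per frame (drift),
# and the user resets it to 0 at frame 3.
toy_masks = track_with_corrections(
    frames=[0, 1, 2, 3, 4],
    first_mask=0,
    propagate=lambda frame, prev: prev + 1,
    corrections={3: 0},
)
print(toy_masks)  # [0, 1, 2, 0, 1]
```

In the toy run, the error accumulates until the correction at frame 3 and then starts growing again from zero, which is why long videos may need several corrections.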

Code Examples

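# NOTE: illustrative sketch only. The constructor arguments and the methods
# track_object, apply_correction, and export_results are not confirmed against
# the Track-Anything source; get_initial_mask and get_user_correction are
# placeholders that you would supply.

# Initialize the tracker
tracker = TrackingAnything(model_path='path/to/model')

# Load a video and start tracking from an initial mask
video_path = 'path/to/video.mp4'
initial_mask = get_initial_mask()  # user-provided or automatically generated
tracker.track_object(video_path, initial_mask)

# Apply an interactive correction at a given frame
frame_number = 50
correction_mask = get_user_correction()  # user-provided correction mask
tracker.apply_correction(frame_number, correction_mask)

# Export the tracking results
output_path = 'path/to/output'
tracker.export_results(output_path, format='video')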

Getting Started

  1. Clone the repository:

    git clone https://github.com/gaomingqi/Track-Anything.git
    cd Track-Anything
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download pre-trained models:

    wget https://github.com/gaomingqi/Track-Anything/releases/download/v0.1.0/sam_vit_h_4b8939.pth
    wget https://github.com/gaomingqi/Track-Anything/releases/download/v0.1.0/XMem-s012.pth
    
  4. Run the web interface:

    python app.py
    
  5. Open a web browser and navigate to the local URL that the app prints on startup (Gradio's default is http://localhost:7860) to start using Track-Anything.
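Before launching the app, it can help to confirm that the checkpoint files from step 3 are actually on disk. The following is a small sketch, not part of the repository: `missing_checkpoints` is a hypothetical helper, and it assumes the files were downloaded into the current directory, as the `wget` commands above would do.

```python
from pathlib import Path
from typing import List, Sequence

# File names taken from the download step above.
EXPECTED_CHECKPOINTS = ("sam_vit_h_4b8939.pth", "XMem-s012.pth")


def missing_checkpoints(
    directory: str = ".",
    names: Sequence[str] = EXPECTED_CHECKPOINTS,
) -> List[str]:
    """Return the checkpoint file names that are not present in `directory`."""
    root = Path(directory)
    return [name for name in names if not (root / name).is_file()]


# Report what is missing (the directory "." is an assumption).
missing = missing_checkpoints(".")
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All expected checkpoints found.")
```

If the app stores its checkpoints elsewhere, pass that directory instead of `"."`.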

Competitor Comparisons

Mask2Former: Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"

Pros of Mask2Former

  • More versatile, supporting various segmentation tasks (panoptic, instance, semantic)
  • Better performance on standard benchmarks like COCO and Cityscapes
  • Stronger foundation model with extensive research backing

Cons of Mask2Former

  • More complex architecture, potentially harder to implement and fine-tune
  • Requires more computational resources for training and inference
  • Less focused on video tracking tasks compared to Track-Anything

Code Comparison

Mask2Former:

outputs = model(images)
pred_masks = outputs["pred_masks"].sigmoid()
pred_classes = outputs["pred_logits"].argmax(-1)

Track-Anything:

masks, logits, iou_predictions = model(image, points, labels)
masks = masks > model.mask_threshold

Summary

Mask2Former is a more comprehensive segmentation model suitable for various tasks, while Track-Anything is specifically designed for video object tracking. Mask2Former offers better performance on standard benchmarks but requires more resources. Track-Anything provides a simpler interface for video tracking tasks, making it more accessible for specific use cases.

PaddleDetection: Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.

Pros of PaddleDetection

  • Comprehensive object detection toolkit with multiple algorithms and models
  • Extensive documentation and tutorials for easy adoption
  • Supports various deployment options (mobile, server, edge devices)

Cons of PaddleDetection

  • Steeper learning curve due to its extensive features
  • Primarily focused on object detection, less versatile for other tasks

Code Comparison

PaddleDetection:

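from ppdet.core.workspace import load_config
from ppdet.engine import Trainer

# Load a YAML config (example path; pick one from the repository's configs/ directory)
cfg = load_config('configs/yolov3/yolov3_darknet53_270e_coco.yml')
trainer = Trainer(cfg, mode='train')
trainer.train()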

Track-Anything:

from track_anything import TrackingAnything

model = TrackingAnything(checkpoint='sam_vit_h_4b8939.pth')
model.track_anything(video_path='input.mp4', output_path='output.mp4')

Summary

PaddleDetection offers a more structured approach for training and deploying object detection models, while Track-Anything provides a simpler interface for video object tracking and segmentation. PaddleDetection is better suited for large-scale projects requiring customizable object detection, whereas Track-Anything excels in quick video analysis and tracking tasks.

mmtracking: OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.

Pros of mmtracking

  • More comprehensive tracking framework with multiple algorithms
  • Better documentation and community support
  • Modular design allows for easier customization and extension

Cons of mmtracking

  • Steeper learning curve due to its complexity
  • May be overkill for simple tracking tasks
  • Requires more setup and configuration

Code Comparison

mmtracking:

import mmcv
from mmtrack.apis import init_model, inference_mot

config_file = 'configs/mot/deepsort/deepsort_faster-rcnn_fpn_4e_mot17-private-half.py'
checkpoint_file = 'checkpoints/deepsort_faster-rcnn_fpn_4e_mot17-private-half_20210517_001210-d94bac73.pth'

model = init_model(config_file, checkpoint_file, device='cuda:0')

# inference_mot processes one frame at a time, so iterate over the video
video = mmcv.VideoReader('input.mp4')
for frame_id, frame in enumerate(video):
    result = inference_mot(model, frame, frame_id=frame_id)

Track-Anything:

from track_anything import TrackingAnything

model = TrackingAnything(device='cuda')
model.track(video_path, output_path)

Summary

The code comparison shows that Track-Anything offers a simpler API for basic tracking tasks, while mmtracking provides more flexibility and control over the tracking process. mmtracking requires more configuration but allows for fine-tuning of various parameters and algorithms.

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

  • More comprehensive and versatile object detection framework
  • Backed by Facebook AI Research, ensuring regular updates and support
  • Extensive documentation and community resources

Cons of Detectron2

  • Steeper learning curve due to its complexity
  • Requires more computational resources for training and inference

Code Comparison

Track-Anything:

from track_anything import TrackingAnything

tracker = TrackingAnything()
tracker.track(video_path, output_path)

Detectron2:

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file("path/to/config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(image)

Summary

Detectron2 is a more robust and feature-rich object detection framework, while Track-Anything focuses specifically on video object tracking. Detectron2 offers greater flexibility and customization options but requires more expertise to use effectively. Track-Anything provides a simpler interface for quick video tracking tasks but may lack advanced features for complex scenarios.

README


Track-Anything is a flexible and interactive tool for video object tracking and segmentation. Built on Segment Anything, it lets users specify anything to track and segment with clicks alone. During tracking, users can change the objects they want to track or correct the region of interest if there are any ambiguities. These characteristics make Track-Anything suitable for:

  • Video object tracking and segmentation with shot changes.
  • Visualized development and data annotation for video object tracking and segmentation.
  • Object-centric downstream video tasks, such as video inpainting and editing.

:rocket: Updates

  • 2023/05/02: We uploaded tutorials in steps :world_map:. Check HERE for more details.

  • 2023/04/29: We improved inpainting by decoupling GPU memory usage from video length. Track-Anything can now inpaint videos of any length! :smiley_cat: Check HERE for our GPU memory requirements.

  • 2023/04/25: We are delighted to introduce Caption-Anything :writing_hand:, an inventive project from our lab that combines the capabilities of Segment Anything, Visual Captioning, and ChatGPT.

  • 2023/04/20: We deployed DEMO on Hugging Face :hugs:!

  • 2023/04/14: We made Track-Anything public!
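The 2023/04/29 update mentions decoupling GPU memory usage from video length. One common way to get that property is to process the video in fixed-size chunks, so peak memory depends on the chunk size rather than on the number of frames. The sketch below illustrates that general idea only; it is not the repository's inpainting code, and `iter_chunks`, `process_in_chunks`, and `process_chunk` are hypothetical names. A real inpainting model may also need context from neighboring frames across chunk boundaries, which this sketch ignores.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_chunks(frames: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `frames` with at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]


def process_in_chunks(
    frames: Sequence[T],
    chunk_size: int,
    process_chunk: Callable[[Sequence[T]], List[R]],
) -> List[R]:
    """Apply `process_chunk` to each chunk and concatenate the results.

    Only one chunk is handed to `process_chunk` at a time, so the working set
    is bounded by `chunk_size` regardless of how long the video is.
    """
    results: List[R] = []
    for chunk in iter_chunks(frames, chunk_size):
        results.extend(process_chunk(chunk))
    return results


# Toy usage: "frames" are integers and the per-chunk work doubles each value.
doubled = process_in_chunks(list(range(7)), 3, lambda chunk: [x * 2 for x in chunk])
print(doubled)  # [0, 2, 4, 6, 8, 10, 12]
```

The chunk size then becomes the knob that trades throughput for memory: a smaller value lowers peak usage at the cost of more calls.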

:world_map: Video Tutorials (Track-Anything Tutorials in Steps)

https://user-images.githubusercontent.com/30309970/234902447-a4c59718-fcfe-443a-bd18-2f3f775cfc13.mp4


:joystick: Example - Multiple Object Tracking and Segmentation (with XMem)

https://user-images.githubusercontent.com/39208339/233035206-0a151004-6461-4deb-b782-d1dbfe691493.mp4


:joystick: Example - Video Object Tracking and Segmentation with Shot Changes (with XMem)

https://user-images.githubusercontent.com/30309970/232848349-f5e29e71-2ea4-4529-ac9a-94b9ca1e7055.mp4


:joystick: Example - Video Inpainting (with E2FGVI)

https://user-images.githubusercontent.com/28050374/232959816-07f2826f-d267-4dda-8ae5-a5132173b8f4.mp4

:computer: Get Started

Linux & Windows

# Clone the repository:
git clone https://github.com/gaomingqi/Track-Anything.git
cd Track-Anything

# Install dependencies: 
pip install -r requirements.txt

# Run the Track-Anything gradio demo.
python app.py --device cuda:0
# python app.py --device cuda:0 --sam_model_type vit_b # for lower memory usage
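If you launch the demo from a script rather than by hand, the two invocations above can be assembled programmatically. This is a minimal sketch: `build_launch_command` is a hypothetical helper, and it uses only the `--device` and `--sam_model_type` flags shown above.

```python
import shlex
from typing import List, Optional


def build_launch_command(
    device: str = "cuda:0",
    sam_model_type: Optional[str] = None,
) -> List[str]:
    """Build the argument list for launching the Gradio demo.

    `sam_model_type` is optional; passing "vit_b" selects the smaller SAM
    variant that the README suggests for lower memory usage.
    """
    cmd = ["python", "app.py", "--device", device]
    if sam_model_type is not None:
        cmd += ["--sam_model_type", sam_model_type]
    return cmd


# Print the low-memory variant of the command as a shell-ready string.
print(shlex.join(build_launch_command("cuda:0", "vit_b")))
```

The returned list can be passed directly to `subprocess.run(...)` from inside the cloned repository directory.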

:book: Citation

If you find this work useful for your research or applications, please cite using this BibTeX:

@misc{yang2023track,
      title={Track Anything: Segment Anything Meets Videos}, 
      author={Jinyu Yang and Mingqi Gao and Zhe Li and Shang Gao and Fangjing Wang and Feng Zheng},
      year={2023},
      eprint={2304.11968},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

:clap: Acknowledgements

The project is based on Segment Anything, XMem, and E2FGVI. Thanks to the authors for their efforts.