microsoft/computervision-recipes

Best Practices, code samples, and documentation for Computer Vision.

Top Related Projects

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Models and examples built with TensorFlow

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

OpenMMLab Detection Toolbox and Benchmark

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Quick Overview

The microsoft/computervision-recipes repository is a collection of best practices, code samples, and utility functions for computer vision tasks. It provides Jupyter notebooks and common utilities to help developers and researchers quickly prototype and deploy computer vision solutions, with PyTorch as the underlying deep learning library.

Pros

  • Extensive collection of pre-built models and utilities for various computer vision tasks
  • Well-documented Jupyter notebooks with step-by-step explanations
  • Built on PyTorch, which all examples use as the underlying deep learning library
  • Regularly updated with new features and improvements

Cons

  • Large repository size may be overwhelming for beginners
  • Some advanced topics may require prior knowledge of computer vision concepts
  • Dependency on specific versions of libraries may cause compatibility issues
  • Limited support for edge devices or mobile platforms

Code Examples

  1. Image classification using a pre-trained model:
from cv_utils import load_image, predict_image

# Load an image
image = load_image("path/to/image.jpg")

# Predict the class of the image
class_name, confidence = predict_image(image, model="resnet50")

print(f"Predicted class: {class_name}, Confidence: {confidence:.2f}")
  2. Object detection with YOLO:
from cv_utils import load_image, detect_objects

# Load an image
image = load_image("path/to/image.jpg")

# Detect objects in the image
detections = detect_objects(image, model="yolov5")

for det in detections:
    print(f"Object: {det['class']}, Confidence: {det['confidence']:.2f}, Bbox: {det['bbox']}")
  3. Image segmentation using DeepLab:
from cv_utils import load_image, segment_image, visualize_segmentation

# Load an image
image = load_image("path/to/image.jpg")

# Perform semantic segmentation
segmentation_map = segment_image(image, model="deeplabv3")

# Visualize the segmentation map
visualize_segmentation(segmentation_map)
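
A detector usually returns some low-confidence boxes, so its output is commonly filtered before use. The following is a minimal sketch of that step. It assumes the dict layout printed in example 2 (`class`, `confidence`, `bbox`). The `filter_detections` helper and the sample values are hypothetical and not part of the repository.

```python
# Hypothetical detections in the dict layout used by example 2 above;
# the values are made up for illustration.
detections = [
    {"class": "dog", "confidence": 0.92, "bbox": [34, 50, 210, 300]},
    {"class": "cat", "confidence": 0.41, "bbox": [220, 60, 380, 290]},
    {"class": "dog", "confidence": 0.77, "bbox": [400, 80, 520, 260]},
]


def filter_detections(detections, min_confidence=0.5):
    """Keep detections at or above min_confidence, highest confidence first."""
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)


confident = filter_detections(detections, min_confidence=0.5)
for det in confident:
    print(f"{det['class']}: {det['confidence']:.2f} at {det['bbox']}")
```

The 0.5 threshold is an arbitrary starting point. Raising it trades recall for precision.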

Getting Started

  1. Clone the repository:

    git clone https://github.com/microsoft/computervision-recipes.git
    cd computervision-recipes
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Open and run Jupyter notebooks:

    jupyter notebook
    
  4. Navigate to the desired notebook (e.g., scenarios/classification/01_training_introduction.ipynb) and start exploring the computer vision recipes.

Competitor Comparisons

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

  • More comprehensive and advanced computer vision library, focusing on object detection and segmentation
  • Faster training and inference times due to optimized implementation
  • Extensive model zoo with pre-trained models for various tasks

Cons of Detectron2

  • Steeper learning curve, requiring more in-depth knowledge of computer vision concepts
  • Less beginner-friendly documentation compared to ComputerVision-Recipes
  • Primarily focused on PyTorch, limiting flexibility for users preferring other frameworks

Code Comparison

Detectron2 (object detection):

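from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Start from the default config and load a Faster R-CNN baseline from the model zoo
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # discard detections scoring below 0.5
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)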

ComputerVision-Recipes (image classification):

from azureml.core import Workspace
from azureml.train.dnn import TensorFlow

ws = Workspace.from_config()
estimator = TensorFlow(source_directory=project_folder,
                       script_params=script_params,
                       compute_target=compute_target,
                       entry_script='train.py')
Models and examples built with TensorFlow

Pros of models

  • Broader scope covering various ML domains beyond computer vision
  • Larger community and more frequent updates
  • Official TensorFlow implementation, ensuring compatibility and optimization

Cons of models

  • Steeper learning curve due to its extensive codebase
  • Less focused on end-to-end solutions for specific computer vision tasks
  • May require more setup and configuration for specific use cases

Code Comparison

models:

import tensorflow as tf
from official.vision.image_classification import resnet_model

model = resnet_model.resnet50(num_classes=1000)

computervision-recipes:

from utils_cv.classification.model import get_model

model = get_model('resnet50', num_classes=1000, pretrained=True)

Summary

While models offers a comprehensive collection of TensorFlow implementations for various ML tasks, computervision-recipes provides more focused, end-to-end solutions for computer vision problems. The former benefits from a larger community and frequent updates, but may have a steeper learning curve. The latter offers simpler implementations for specific CV tasks but has a narrower scope. Choose based on your project requirements and familiarity with the frameworks.

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Pros of YOLOv5

  • Faster training and inference times
  • More up-to-date with recent advancements in object detection
  • Simpler architecture, making it easier to understand and modify

Cons of YOLOv5

  • More focused on a single task (object detection) compared to the broader scope of Computer Vision Recipes
  • Less comprehensive documentation and tutorials for beginners
  • Fewer pre-trained models for diverse computer vision tasks

Code Comparison

YOLOv5:

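import torch

# Load a pretrained YOLOv5s model from the ultralytics/yolov5 repository via PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
results = model('image.jpg')
results.show()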

Computer Vision Recipes:

from azureml.contrib.services.aml_request import rawhttp
from azureml.contrib.services.aml_response import AMLResponse

@rawhttp
def run(request):
    image = request.files["image"]
    # Process image using Computer Vision Recipes
    return AMLResponse(result, 200)

The YOLOv5 code is more straightforward for object detection tasks, while Computer Vision Recipes offers a more flexible framework for various computer vision applications.

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Pros of Mask_RCNN

  • Focused specifically on instance segmentation, providing a deep implementation
  • Includes pre-trained models on the COCO dataset
  • Offers detailed documentation and examples for training and inference

Cons of Mask_RCNN

  • Limited to instance segmentation tasks, less versatile than computervision-recipes
  • May require more domain-specific knowledge to use effectively
  • Less actively maintained, with fewer recent updates

Code Comparison

Mask_RCNN:

import mrcnn.model as modellib
from mrcnn import utils

class InferenceConfig(coco.CocoConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir=MODEL_DIR)

computervision-recipes:

import os

from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
model = Model(ws, 'my_model')
model.download(target_dir=os.getcwd(), exist_ok=True)

OpenMMLab Detection Toolbox and Benchmark

Pros of mmdetection

  • More comprehensive and specialized for object detection tasks
  • Regularly updated with state-of-the-art algorithms and models
  • Extensive documentation and community support

Cons of mmdetection

  • Steeper learning curve for beginners
  • More focused on object detection, less versatile for other computer vision tasks

Code Comparison

mmdetection:

from mmdet.apis import init_detector, inference_detector

config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'test.jpg')

computervision-recipes:

from PIL import Image

from utils_cv.detection.plot import plot_detections
from utils_cv.detection.model import faster_rcnn

model = faster_rcnn(pretrained=True)
img = Image.open("test.jpg")
outputs = model(img)
plot_detections(img, outputs[0])

Both repositories offer valuable tools for computer vision tasks, with mmdetection focusing more on object detection and providing a wider range of models and algorithms. computervision-recipes offers a broader scope of computer vision applications but may not be as specialized or up-to-date for object detection specifically.

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Pros of Detectron

  • More focused on object detection and segmentation tasks
  • Provides pre-trained models for immediate use
  • Extensive documentation and examples for various use cases

Cons of Detectron

  • Built on Caffe2 and deprecated in favor of Detectron2
  • Steeper learning curve for beginners
  • Less comprehensive in covering other computer vision tasks

Code Comparison

Detectron2 (Detectron's successor) code snippet:

import detectron2
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file("path/to/config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(image)

Computer Vision Recipes code snippet:

from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
model = Model(ws, 'my_model')
model.download(target_dir='.', exist_ok=True)

Detectron focuses on object detection configurations and predictions, while Computer Vision Recipes emphasizes Azure integration and model management. Detectron provides a more specialized toolkit for object detection tasks, whereas Computer Vision Recipes offers a broader range of computer vision solutions integrated with Azure services.

README

+ Update July: Added support for action recognition and tracking
+              in the new release v1.2.

Computer Vision

In recent years, we've seen extraordinary growth in Computer Vision, with applications in face recognition, image understanding, search, drones, mapping, and semi-autonomous and autonomous vehicles. A key part of many of these applications is visual recognition tasks such as image classification, object detection and image similarity.

This repository provides examples and best practice guidelines for building computer vision systems. The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in Computer Vision algorithms, neural architectures, and operationalizing such systems. Rather than creating implementations from scratch, we draw from existing state-of-the-art libraries and build additional utility around loading image data, optimizing and evaluating models, and scaling up to the cloud. In addition, having worked in this space for many years, we aim to answer common questions, point out frequently observed pitfalls, and show how to use the cloud for training and deployment.

We hope that these examples and utilities can significantly reduce the “time to market” by simplifying the experience from defining the business problem to developing the solution by orders of magnitude. In addition, the example notebooks serve as guidelines and showcase best practices and usage of the tools.

These examples are provided as Jupyter notebooks and common utility functions. All examples use PyTorch as the underlying deep learning library.

Examples

This repository supports various Computer Vision scenarios which either operate on a single image:

Some supported CV scenarios

As well as scenarios such as action recognition which take a video sequence as input:

Target Audience

Our target audience for this repository includes data scientists and machine learning engineers with varying levels of Computer Vision knowledge as our content is source-only and targets custom machine learning modelling. The utilities and examples provided are intended to be solution accelerators for real-world vision problems.

Getting Started

To get started, navigate to the Setup Guide, which lists instructions on how to set up the compute environment and dependencies needed to run the notebooks in this repo. Once your environment is set up, navigate to the Scenarios folder and start exploring the notebooks. We recommend starting with the image classification notebooks, since these introduce concepts which are also used by the other scenarios (e.g. pre-training on ImageNet).

Alternatively, we support Binder, which makes it easy to try one of our notebooks in a web browser simply by following this link. However, Binder is free, and as a result only comes with limited CPU compute power and without GPU support. Expect the notebook to run very slowly (this is somewhat improved by reducing image resolution to e.g. 60 pixels, but at the cost of low accuracy).

Scenarios

The following is a summary of commonly used Computer Vision scenarios that are covered in this repository. For each of the main scenarios ("base"), we provide the tools to effectively build your own model. This ranges from simple tasks such as fine-tuning your own model on your own data to more complex tasks such as hard-negative mining and even model deployment.

Scenario | Support | Description
Classification | Base | Image Classification is a supervised machine learning technique to learn and predict the category of a given image.
Similarity | Base | Image Similarity is a way to compute a similarity score given a pair of images. Given an image, it allows you to identify the most similar image in a given dataset.
Detection | Base | Object Detection is a technique that allows you to detect the bounding box of an object within an image.
Keypoints | Base | Keypoint detection can be used to detect specific points on an object. A pre-trained model is provided to detect body joints for human pose estimation.
Segmentation | Base | Image Segmentation assigns a category to each pixel in an image.
Action recognition | Base | Action recognition identifies which actions are performed in video/webcam footage (e.g. "running", "opening a bottle") and at what respective start/end times. An I3D implementation of action recognition is also available under contrib.
Tracking | Base | Tracking allows you to detect and track multiple objects in a video sequence over time.
Crowd counting | Contrib | Counting the number of people in low-crowd-density (e.g. fewer than 10 people) and high-crowd-density (e.g. thousands of people) scenarios.
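
To make the Similarity scenario concrete, here is a minimal sketch of ranking a dataset by similarity to a query image. It assumes each image has already been reduced to a feature vector (an embedding) by some trained network, and it uses cosine similarity as the score. Both are illustrative assumptions, not necessarily what the repository's notebooks use. The function names, file names, and three-dimensional vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query, dataset):
    """Return (name, score) of the dataset entry closest to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in dataset.items()]
    return max(scored, key=lambda item: item[1])


# Made-up embeddings; real ones would come from a trained network
dataset = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

name, score = most_similar(query, dataset)
print(name, round(score, 3))
```

Real embeddings typically have hundreds or thousands of dimensions, and for large datasets the linear scan is often replaced by an approximate nearest-neighbor index.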

We separate the supported CV scenarios into two locations: (i) base: code and notebooks within the "utils_cv" and "scenarios" folders which follow strict coding guidelines, are well tested and maintained; (ii) contrib: code and other assets within the "contrib" folder, mainly covering less common CV scenarios using bleeding edge state-of-the-art approaches. Code in "contrib" is not regularly tested or maintained.
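
Hard-negative mining, mentioned above as one of the more complex tasks, selects the negative examples a model gets most confidently wrong so that they can be added to the next training round. The sketch below illustrates only that selection step under simplifying assumptions: predictions are plain tuples, and the `mine_hard_negatives` helper and sample data are hypothetical rather than taken from the repository.

```python
def mine_hard_negatives(predictions, top_k=2):
    """Return ids of the top_k negative samples the model scored highest.

    predictions: list of (sample_id, is_positive, score) tuples, where
    is_positive is the ground-truth label and score is the model's
    confidence that the sample is positive.
    """
    negatives = [(sid, score) for sid, is_pos, score in predictions if not is_pos]
    negatives.sort(key=lambda item: item[1], reverse=True)
    return [sid for sid, _ in negatives[:top_k]]


# Made-up predictions for illustration
predictions = [
    ("img_01", True, 0.95),
    ("img_02", False, 0.88),  # negative scored high: a hard negative
    ("img_03", False, 0.10),  # negative scored low: an easy negative
    ("img_04", False, 0.67),
    ("img_05", True, 0.40),
]

hard_negatives = mine_hard_negatives(predictions, top_k=2)
print(hard_negatives)
```

In a full pipeline the selected samples would be added to the training set and the model retrained, repeating until the number of confident false positives stops shrinking.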

Computer Vision on Azure

Note that for certain computer vision problems, you may not need to build your own models. Instead, pre-built or easily customizable solutions exist on Azure which do not require any custom coding or machine learning expertise. We strongly recommend evaluating if these can sufficiently solve your problem. If these solutions are not applicable, or the accuracy of these solutions is not sufficient, then resorting to more complex and time-consuming custom approaches may be necessary.

The following Microsoft services offer simple solutions to address common computer vision tasks:

  • Vision Services are a set of pre-trained REST APIs which can be called for image tagging, face recognition, OCR, video analytics, and more. These APIs work out of the box and require minimal expertise in machine learning, but have limited customization capabilities. See the various demos available to get a feel for the functionality (e.g. Computer Vision). The service can be used through API calls or through SDKs (available in .NET, Python, Java, Node and Go languages).

  • Custom Vision is a SaaS service to train and deploy a model as a REST API given a user-provided training set. All steps including image upload, annotation, and model deployment can be performed using an intuitive UI or through SDKs (available in .NET, Python, Java, Node and Go languages). Training image classification or object detection models can be achieved with minimal machine learning expertise. Custom Vision offers more flexibility than using the pre-trained cognitive services APIs, but requires the user to bring and annotate their own data.
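
As a rough illustration of calling a pre-trained Vision REST API directly, the sketch below builds, but does not send, an image-tagging request using only the Python standard library. The resource name and key are placeholders. The v3.2 Analyze Image route and the `Ocp-Apim-Subscription-Key` header are given as one example of how the API has been exposed, so check the current service documentation before relying on them. The `build_tag_request` helper is hypothetical.

```python
import json
import urllib.request

# Placeholders: substitute the endpoint and key of your own Azure resource
ENDPOINT = "https://YOUR-RESOURCE.cognitiveservices.azure.com"
SUBSCRIPTION_KEY = "YOUR-KEY"


def build_tag_request(image_url, endpoint=ENDPOINT, key=SUBSCRIPTION_KEY):
    """Build (but do not send) a POST request asking the service to tag an image."""
    url = f"{endpoint}/vision/v3.2/analyze?visualFeatures=Tags"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_tag_request("https://example.com/image.jpg")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then parse the JSON response body
```

For production use, the official SDKs mentioned above handle authentication, retries, and response parsing.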

If you need to train your own model, the following services and links provide additional information that is likely useful.

  • Azure Machine Learning service (AzureML) is a service that helps users accelerate the training and deployment of machine learning models. While not specific to computer vision workloads, the AzureML Python SDK can be used for scalable and reliable training and deployment of machine learning solutions to the cloud. We leverage Azure Machine Learning in several of the notebooks within this repository (e.g. deployment to Azure Kubernetes Service).

  • Azure AI Reference architectures provide a set of examples (backed by code) of how to build common AI-oriented workloads that leverage multiple cloud components. While not computer vision specific, these reference architectures cover several machine learning workloads such as model deployment or batch scoring.

Build Status

AzureML Testing

Build Type | Branch | Status | Branch | Status
Linux GPU | master | Build Status | staging | Build Status
Linux CPU | master | Build Status | staging | Build Status
Notebook unit GPU | master | Build Status | staging | Build Status

Contributing

This project welcomes contributions and suggestions. Please see our contribution guidelines.