
google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.


Top Related Projects

  • google-ai-edge/mediapipe (26,715 stars): Cross-platform, customizable ML solutions for live and streaming media.
  • TensorFlow.js (18,355 stars): A WebGL accelerated JavaScript library for training and deploying ML models.
  • OpenCV (77,862 stars): Open Source Computer Vision Library.
  • OpenPose (30,786 stars): Real-time multi-person keypoint detection library for body, face, hands, and foot estimation.
  • Detectron2: A platform for object detection, segmentation and other visual recognition tasks.
  • YOLOv5 (49,537 stars): YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite.

Quick Overview

MediaPipe is an open-source framework developed by Google for building multimodal (video, audio, and sensor data) applied machine learning pipelines. It provides a set of pre-built solutions for various tasks such as face detection, pose estimation, and hand tracking, which can be easily integrated into mobile, web, and IoT applications.

Pros

  • Cross-platform compatibility (Android, iOS, web, desktop)
  • Optimized for real-time performance on edge devices
  • Extensive collection of pre-built ML solutions
  • Easy integration with popular ML frameworks like TensorFlow

Cons

  • Steep learning curve for beginners
  • Limited customization options for some pre-built solutions
  • Documentation can be inconsistent or outdated in some areas
  • Dependency management can be complex for certain platforms

Code Examples

  1. Face detection using MediaPipe:
import cv2
import mediapipe as mp

mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils

with mp_face_detection.FaceDetection(min_detection_confidence=0.5) as face_detection:
    # MediaPipe expects RGB input; OpenCV loads images as BGR.
    image = cv2.imread('image.jpg')
    results = face_detection.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    # Draw a bounding box and key points for each detected face.
    if results.detections:
        for detection in results.detections:
            mp_drawing.draw_detection(image, detection)

    cv2.imshow('Face Detection', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
  2. Hand tracking using MediaPipe:
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(min_detection_confidence=0.5, min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            break

        # Convert the BGR frame to RGB before passing it to MediaPipe.
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = hands.process(image)

        # Draw the 21 hand landmarks and their connections for each detected hand.
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(image, hand_landmarks, mp_hands.HAND_CONNECTIONS)

        cv2.imshow('Hand Tracking', cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(5) & 0xFF == 27:  # Press Esc to quit.
            break

cap.release()
cv2.destroyAllWindows()
  3. Pose estimation using MediaPipe:
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    image = cv2.imread('pose.jpg')
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    # Draw the 33 body landmarks and their connections on the original image.
    if results.pose_landmarks:
        mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)

    cv2.imshow('Pose Estimation', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Getting Started

To get started with MediaPipe, follow these steps:

  1. Install MediaPipe:
pip install mediapipe
  2. Import the necessary modules:
import mediapipe as mp
import cv2
  3. Choose a solution (e.g., face detection, hand tracking, pose estimation) and initialize it:
mp_face_detection = mp.solutions.face_detection
face_detection = mp_face_detection.FaceDetection(min_detection_confidence=0.5)
  4. Process images or video frames using the chosen solution:
image = cv2.imread('image.jpg')
results = face_detection.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
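
The results object exposes coordinates normalized to [0, 1], which you can map back to pixels. The following is a minimal sketch, assuming the image and results variables from the previous step and the face_detection instance from step 3:

if results.detections:
    height, width, _ = image.shape
    for detection in results.detections:
        # The legacy solution returns bounding boxes normalized to [0, 1].
        box = detection.location_data.relative_bounding_box
        x, y = int(box.xmin * width), int(box.ymin * height)
        w, h = int(box.width * width), int(box.height * height)
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Release the detector when you are done.
face_detection.close()

The solution objects also work as context managers (with mp_face_detection.FaceDetection(...) as face_detection:), which handles the cleanup automatically.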

Competitor Comparisons


MediaPipe is a powerful and versatile framework for building machine learning pipelines, particularly suited for real-time applications on mobile and edge devices. While it offers a wide range of pre-built components and optimized performance, it may require more effort to learn and integrate compared to simpler, more specialized libraries. The trade-off between its comprehensive features and complexity makes it an excellent choice for projects that require advanced multimodal ML capabilities and cross-platform support.

TensorFlow.js (18,355 stars)

A WebGL accelerated JavaScript library for training and deploying ML models.

Pros of TensorFlow.js

  • Runs machine learning models directly in the browser or Node.js
  • Supports a wide range of pre-trained models and custom model creation
  • Extensive documentation and community support

Cons of TensorFlow.js

  • Performance may be slower compared to native implementations
  • Limited support for mobile devices and edge computing
  • Larger file size and longer initial load times for web applications

Code Comparison

MediaPipe (C++):

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"

class HandLandmarkCalculator : public CalculatorBase {
 public:
  static ::mediapipe::Status GetContract(CalculatorContract* cc);
};

TensorFlow.js (JavaScript):

import * as tf from '@tensorflow/tfjs';

const model = await tf.loadLayersModel('path/to/model.json');
const prediction = model.predict(tf.tensor2d([[1, 2, 3, 4]]));
console.log(prediction.dataSync());

MediaPipe focuses on efficient, cross-platform ML solutions for mobile and edge devices, while TensorFlow.js prioritizes running ML models in web browsers and JavaScript environments. MediaPipe offers optimized C++ implementations, whereas TensorFlow.js provides a more accessible JavaScript API for web developers.

OpenCV (77,862 stars)

Open Source Computer Vision Library

Pros of OpenCV

  • Broader scope and functionality, covering a wide range of computer vision tasks
  • Larger community and more extensive documentation
  • Better performance for traditional computer vision algorithms

Cons of OpenCV

  • Steeper learning curve for beginners
  • Less focus on mobile and edge devices
  • Heavier resource requirements for some applications

Code Comparison

MediaPipe example (Python):

import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands()
results = hands.process(image)  # `image` is an RGB frame
if results.multi_hand_landmarks:
    pass  # process the detected hand landmarks here

OpenCV example (Python):

import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.1, 4)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x+w, y+h), (255, 0, 0), 2)

MediaPipe focuses on specific tasks like hand tracking, while OpenCV provides a more general-purpose toolkit for computer vision. MediaPipe offers easier implementation for certain AI-powered tasks, while OpenCV excels in traditional computer vision algorithms and offers more flexibility for custom solutions.
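
In practice the two libraries are frequently combined: MediaPipe supplies the ML inference while OpenCV handles image I/O and pixel-level processing. Below is a minimal, illustrative sketch (the file names image.jpg and blurred.jpg are placeholders) that uses MediaPipe's legacy face detection to locate faces and OpenCV to blur them:

import cv2
import mediapipe as mp

# Illustrative sketch: MediaPipe finds faces, OpenCV blurs the detected regions.
image = cv2.imread('image.jpg')
height, width, _ = image.shape

with mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5) as detector:
    results = detector.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

for detection in results.detections or []:
    box = detection.location_data.relative_bounding_box
    x, y = max(int(box.xmin * width), 0), max(int(box.ymin * height), 0)
    w, h = int(box.width * width), int(box.height * height)
    if w > 0 and h > 0:
        roi = image[y:y + h, x:x + w]
        image[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)

cv2.imwrite('blurred.jpg', image)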

OpenPose (30,786 stars)

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

Pros of OpenPose

  • Higher accuracy in multi-person pose estimation
  • More robust in complex scenarios with occlusions
  • Supports 3D pose estimation

Cons of OpenPose

  • Slower processing speed, especially on CPU
  • Higher computational requirements
  • Less suitable for real-time applications on mobile devices

Code Comparison

OpenPose:

#include <openpose/pose/poseExtractor.hpp>

auto poseExtractor = op::PoseExtractorCaffe::getInstance(poseModel, netInputSize, outputSize, keypointScaleMode, num_gpu_start);
poseExtractor->forwardPass(netInputArray, imageSize, scaleInputToNetInputs);

MediaPipe:

import mediapipe as mp

mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=False, min_detection_confidence=0.5)
results = pose.process(image)

OpenPose offers more detailed configuration options and is implemented in C++, while MediaPipe provides a simpler Python API. OpenPose is better suited for high-accuracy research applications, whereas MediaPipe is more appropriate for real-time, mobile, and web-based applications due to its lighter computational footprint and easier integration.
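
To illustrate the difference in API surface, here is a minimal sketch of reading an individual landmark from the MediaPipe result in the snippet above (coordinates are normalized to [0, 1], so multiply by the frame size to get pixels):

# Minimal sketch: read one landmark from the `results` object shown above.
if results.pose_landmarks:
    nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
    print(f"nose: x={nose.x:.3f}, y={nose.y:.3f}, visibility={nose.visibility:.2f}")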

Detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

  • More comprehensive and flexible object detection framework
  • Better suited for advanced research and custom model development
  • Extensive documentation and community support

Cons of Detectron2

  • Steeper learning curve and more complex setup
  • Higher computational requirements for training and inference
  • Less optimized for mobile and edge devices

Code Comparison

MediaPipe example:

import mediapipe as mp

mp_face_detection = mp.solutions.face_detection
face_detection = mp_face_detection.FaceDetection()
results = face_detection.process(image)

Detectron2 example:

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
predictor = DefaultPredictor(cfg)
outputs = predictor(image)

MediaPipe is more straightforward for simple tasks, while Detectron2 offers more flexibility and customization options. MediaPipe is better suited for real-time applications on mobile devices, whereas Detectron2 excels in research and complex computer vision tasks on more powerful hardware.

YOLOv5 (49,537 stars)

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Pros of YOLOv5

  • Faster inference speed and real-time object detection capabilities
  • Extensive documentation and community support
  • Easier to fine-tune and adapt for custom object detection tasks

Cons of YOLOv5

  • Limited to object detection tasks, while MediaPipe offers a broader range of computer vision solutions
  • May require more computational resources for training compared to some MediaPipe models

Code Comparison

YOLOv5:

import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
results = model('image.jpg')
results.print()

MediaPipe:

import mediapipe as mp

mp_face_detection = mp.solutions.face_detection
with mp_face_detection.FaceDetection(min_detection_confidence=0.5) as face_detection:
    results = face_detection.process(image)

YOLOv5 focuses on object detection with a simple API, while MediaPipe offers a variety of pre-built solutions for different computer vision tasks. YOLOv5 is more suitable for custom object detection projects, while MediaPipe provides ready-to-use models for specific applications like face detection, pose estimation, and hand tracking.


README




Attention: We have moved to https://developers.google.com/mediapipe as the primary developer documentation site for MediaPipe as of April 3, 2023.

MediaPipe

Attention: MediaPipe Solutions Preview is an early release. Learn more.

On-device machine learning for everyone

Delight your customers with innovative machine learning features. MediaPipe contains everything that you need to customize and deploy to mobile (Android, iOS), web, desktop, edge devices, and IoT, effortlessly.

Get started

You can get started with MediaPipe Solutions by checking out any of the developer guides for vision, text, and audio tasks. If you need help setting up a development environment for use with MediaPipe Tasks, check out the setup guides for Android, web apps, and Python.
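
As a quick taste of the MediaPipe Tasks API in Python, the sketch below runs object detection on a single image. It assumes you have already downloaded a task model file (the name detector.tflite here is a placeholder) from the developer site:

# Minimal sketch of the MediaPipe Tasks Python API; the model path is a placeholder.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

options = vision.ObjectDetectorOptions(
    base_options=python.BaseOptions(model_asset_path='detector.tflite'),
    score_threshold=0.5)
detector = vision.ObjectDetector.create_from_options(options)

image = mp.Image.create_from_file('image.jpg')
result = detector.detect(image)
for detection in result.detections:
    print(detection.categories[0].category_name, detection.bounding_box)

detector.close()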

Solutions

MediaPipe Solutions provides a suite of libraries and tools for you to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications. You can plug these solutions into your applications immediately, customize them to your needs, and use them across multiple development platforms. MediaPipe Solutions is part of the MediaPipe open source project, so you can further customize the solutions code to meet your application needs.

These libraries and resources provide the core functionality for each MediaPipe Solution:

  • MediaPipe Tasks: Cross-platform APIs and libraries for deploying solutions. Learn more.
  • MediaPipe models: Pre-trained, ready-to-run models for use with each solution.

These tools let you customize and evaluate solutions:

  • MediaPipe Model Maker: Customize models for solutions with your data. Learn more.
  • MediaPipe Studio: Visualize, evaluate, and benchmark solutions in your browser. Learn more.

Legacy solutions

Support for several MediaPipe Legacy Solutions ended on March 1, 2023; the remaining Legacy Solutions will be upgraded to new MediaPipe Solutions. See the Solutions guide for details. The code repository and prebuilt binaries for all MediaPipe Legacy Solutions will continue to be provided on an as-is basis.

For more on the legacy solutions, see the documentation.

Framework

To start using MediaPipe Framework, install MediaPipe Framework and start building example applications in C++, Android, and iOS.

MediaPipe Framework is the low-level component used to build efficient on-device machine learning pipelines, similar to the premade MediaPipe Solutions.

Before using MediaPipe Framework, familiarize yourself with its key concepts, such as packets, graphs, and calculators.

Community

  • Slack community for MediaPipe users.
  • Discuss - General community discussion around MediaPipe.
  • Awesome MediaPipe - A curated list of awesome MediaPipe related frameworks, libraries and software.

Contributing

We welcome contributions. Please follow these guidelines.

We use GitHub issues for tracking requests and bugs. Please post questions to the MediaPipe Stack Overflow with a mediapipe tag.

Resources

Publications

Videos