mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

30,292

5,407

30,292

584

View on GitHub

Top Related Projects

mediapipe

30,868

Cross-platform, customizable ML solutions for live and streaming media.

tfjs

18,913

A WebGL accelerated JavaScript library for training and deploying ML models.

opencv

83,339

Open Source Computer Vision Library

openpose

32,828

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

detectron2

32,239

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

yolov5

54,362

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Quick Overview

MediaPipe is an open-source framework developed by Google for building multimodal (video, audio, and sensor data) applied machine learning pipelines. It provides a set of pre-built solutions for various tasks such as face detection, pose estimation, and hand tracking, which can be easily integrated into mobile, web, and IoT applications.

Pros

Cross-platform compatibility (Android, iOS, web, desktop)
Optimized for real-time performance on edge devices
Extensive collection of pre-built ML solutions
Easy integration with popular ML frameworks like TensorFlow

Cons

Steep learning curve for beginners
Limited customization options for some pre-built solutions
Documentation can be inconsistent or outdated in some areas
Dependency management can be complex for certain platforms

Code Examples

Face detection using MediaPipe:

import cv2
import mediapipe as mp

mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils

with mp_face_detection.FaceDetection(min_detection_confidence=0.5) as face_detection:
    image = cv2.imread('image.jpg')
    results = face_detection.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    
    if results.detections:
        for detection in results.detections:
            mp_drawing.draw_detection(image, detection)
    
    cv2.imshow('Face Detection', image)
    cv2.waitKey(0)

Hand tracking using MediaPipe:

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(min_detection_confidence=0.5, min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            break
        
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = hands.process(image)
        
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
        
        cv2.imshow('Hand Tracking', cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(5) & 0xFF == 27:
            break

cap.release()

Pose estimation using MediaPipe:

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

with mp_pose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5) as pose:
    image = cv2.imread('pose.jpg')
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    
    if results.pose_landmarks:
        mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    
    cv2.imshow('Pose Estimation', image)
    cv2.waitKey(0)

Getting Started

To get started with MediaPipe, follow these steps:

Install MediaPipe:

pip install mediapipe

Import the necessary modules:

import mediapipe as mp
import cv2

Choose a solution (e.g., face detection, hand tracking, pose estimation) and initialize it:

mp_face_detection = mp.solutions.face_detection
face_detection = mp_face_detection.FaceDetection(min_detection_confidence=0.5)

Process images or video frames using the chosen solution:

image = cv2.imread('image.jpg')
results = face_detection.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

Competitor Comparisons

mediapipe

30,868

Cross-platform, customizable ML solutions for live and streaming media.

Pros of MediaPipe

Comprehensive cross-platform solution for building multimodal ML pipelines
Extensive library of pre-built components for various ML tasks
Optimized for real-time performance on mobile and edge devices

Cons of MediaPipe

Steeper learning curve due to complex architecture
Limited customization options for some pre-built components
Larger binary size compared to more specialized libraries

Code Comparison

MediaPipe example:

import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands()
results = hands.process(image)
if results.multi_hand_landmarks:
    # Process hand landmarks

Note: As the comparison is between the same repository (google-ai-edge/mediapipe), there isn't a meaningful code comparison to be made. The code example provided is a typical usage of MediaPipe for hand detection.

Summary

MediaPipe is a powerful and versatile framework for building machine learning pipelines, particularly suited for real-time applications on mobile and edge devices. While it offers a wide range of pre-built components and optimized performance, it may require more effort to learn and integrate compared to simpler, more specialized libraries. The trade-off between its comprehensive features and complexity makes it an excellent choice for projects that require advanced multimodal ML capabilities and cross-platform support.

tfjs

18,913

A WebGL accelerated JavaScript library for training and deploying ML models.

Pros of TensorFlow.js

Runs machine learning models directly in the browser or Node.js
Supports a wide range of pre-trained models and custom model creation
Extensive documentation and community support

Cons of TensorFlow.js

Performance may be slower compared to native implementations
Limited support for mobile devices and edge computing
Larger file size and longer initial load times for web applications

Code Comparison

MediaPipe (C++):

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"

class HandLandmarkCalculator : public CalculatorBase {
 public:
  static ::mediapipe::Status GetContract(CalculatorContract* cc);
};

TensorFlow.js (JavaScript):

import * as tf from '@tensorflow/tfjs';

const model = await tf.loadLayersModel('path/to/model.json');
const prediction = model.predict(tf.tensor2d([[1, 2, 3, 4]]));
console.log(prediction.dataSync());

MediaPipe focuses on efficient, cross-platform ML solutions for mobile and edge devices, while TensorFlow.js prioritizes running ML models in web browsers and JavaScript environments. MediaPipe offers optimized C++ implementations, whereas TensorFlow.js provides a more accessible JavaScript API for web developers.

opencv

83,339

Open Source Computer Vision Library

Pros of OpenCV

Broader scope and functionality, covering a wide range of computer vision tasks
Larger community and more extensive documentation
Better performance for traditional computer vision algorithms

Cons of OpenCV

Steeper learning curve for beginners
Less focus on mobile and edge devices
Heavier resource requirements for some applications

Code Comparison

MediaPipe example (Python):

import mediapipe as mp

mp_hands = mp.solutions.hands
hands = mp_hands.Hands()
results = hands.process(image)
if results.multi_hand_landmarks:
    # Process hand landmarks

OpenCV example (Python):

import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.1, 4)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x+w, y+h), (255, 0, 0), 2)

MediaPipe focuses on specific tasks like hand tracking, while OpenCV provides a more general-purpose toolkit for computer vision. MediaPipe offers easier implementation for certain AI-powered tasks, while OpenCV excels in traditional computer vision algorithms and offers more flexibility for custom solutions.

openpose

32,828

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

Pros of OpenPose

Higher accuracy in multi-person pose estimation
More robust in complex scenarios with occlusions
Supports 3D pose estimation

Cons of OpenPose

Slower processing speed, especially on CPU
Higher computational requirements
Less suitable for real-time applications on mobile devices

Code Comparison

OpenPose:

#include <openpose/pose/poseExtractor.hpp>

auto poseExtractor = op::PoseExtractorCaffe::getInstance(poseModel, netInputSize, outputSize, keypointScaleMode, num_gpu_start);
poseExtractor->forwardPass(netInputArray, imageSize, scaleInputToNetInputs);

MediaPipe:

import mediapipe as mp

mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=False, min_detection_confidence=0.5)
results = pose.process(image)

OpenPose offers more detailed configuration options and is implemented in C++, while MediaPipe provides a simpler Python API. OpenPose is better suited for high-accuracy research applications, whereas MediaPipe is more appropriate for real-time, mobile, and web-based applications due to its lighter computational footprint and easier integration.

detectron2

32,239

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

More comprehensive and flexible object detection framework
Better suited for advanced research and custom model development
Extensive documentation and community support

Cons of Detectron2

Steeper learning curve and more complex setup
Higher computational requirements for training and inference
Less optimized for mobile and edge devices

Code Comparison

MediaPipe example:

import mediapipe as mp

mp_face_detection = mp.solutions.face_detection
face_detection = mp_face_detection.FaceDetection()
results = face_detection.process(image)

Detectron2 example:

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
predictor = DefaultPredictor(cfg)
outputs = predictor(image)

MediaPipe is more straightforward for simple tasks, while Detectron2 offers more flexibility and customization options. MediaPipe is better suited for real-time applications on mobile devices, whereas Detectron2 excels in research and complex computer vision tasks on more powerful hardware.

yolov5

54,362

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Pros of YOLOv5

Faster inference speed and real-time object detection capabilities
Extensive documentation and community support
Easier to fine-tune and adapt for custom object detection tasks

Cons of YOLOv5

Limited to object detection tasks, while MediaPipe offers a broader range of computer vision solutions
May require more computational resources for training compared to some MediaPipe models

Code Comparison

YOLOv5:

import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
results = model('image.jpg')
results.print()

MediaPipe:

import mediapipe as mp

mp_face_detection = mp.solutions.face_detection
with mp_face_detection.FaceDetection(min_detection_confidence=0.5) as face_detection:
    results = face_detection.process(image)

YOLOv5 focuses on object detection with a simple API, while MediaPipe offers a variety of pre-built solutions for different computer vision tasks. YOLOv5 is more suitable for custom object detection projects, while MediaPipe provides ready-to-use models for specific applications like face detection, pose estimation, and hand tracking.

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

layout: forward target: https://developers.google.com/mediapipe title: Home nav_order: 1

Attention: We have moved to https://developers.google.com/mediapipe as the primary developer documentation site for MediaPipe as of April 3, 2023.

MediaPipe

Attention: MediaPipe Solutions Preview is an early release. Learn more.

On-device machine learning for everyone

Delight your customers with innovative machine learning features. MediaPipe contains everything that you need to customize and deploy to mobile (Android, iOS), web, desktop, edge devices, and IoT, effortlessly.

Get started

You can get started with MediaPipe Solutions by by checking out any of the developer guides for vision, text, and audio tasks. If you need help setting up a development environment for use with MediaPipe Tasks, check out the setup guides for Android, web apps, and Python.

Solutions

MediaPipe Solutions provides a suite of libraries and tools for you to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications. You can plug these solutions into your applications immediately, customize them to your needs, and use them across multiple development platforms. MediaPipe Solutions is part of the MediaPipe open source project, so you can further customize the solutions code to meet your application needs.

These libraries and resources provide the core functionality for each MediaPipe Solution:

MediaPipe Tasks: Cross-platform APIs and libraries for deploying solutions. Learn more.
MediaPipe models: Pre-trained, ready-to-run models for use with each solution.

These tools let you customize and evaluate solutions:

MediaPipe Model Maker: Customize models for solutions with your data. Learn more.
MediaPipe Studio: Visualize, evaluate, and benchmark solutions in your browser. Learn more.

Legacy solutions

We have ended support for these MediaPipe Legacy Solutions as of March 1, 2023. All other MediaPipe Legacy Solutions will be upgraded to a new MediaPipe Solution. See the Solutions guide for details. The code repository and prebuilt binaries for all MediaPipe Legacy Solutions will continue to be provided on an as-is basis.

For more on the legacy solutions, see the documentation.

Framework

To start using MediaPipe Framework, install MediaPipe Framework and start building example applications in C++, Android, and iOS.

MediaPipe Framework is the low-level component used to build efficient on-device machine learning pipelines, similar to the premade MediaPipe Solutions.

Before using MediaPipe Framework, familiarize yourself with the following key Framework concepts:

Community

Slack community for MediaPipe users.
Discuss - General community discussion around MediaPipe.
Awesome MediaPipe - A curated list of awesome MediaPipe related frameworks, libraries and software.

Contributing

We welcome contributions. Please follow these guidelines.

We use GitHub issues for tracking requests and bugs. Please post questions to the MediaPipe Stack Overflow with a mediapipe tag.

Resources

Publications

Bringing artworks to life with AR in Google Developers Blog
Prosthesis control via Mirru App using MediaPipe hand tracking in Google Developers Blog
SignAll SDK: Sign language interface using MediaPipe is now available for developers in Google Developers Blog
MediaPipe Holistic - Simultaneous Face, Hand and Pose Prediction, on Device in Google AI Blog
Background Features in Google Meet, Powered by Web ML in Google AI Blog
MediaPipe 3D Face Transform in Google Developers Blog
Instant Motion Tracking With MediaPipe in Google Developers Blog
BlazePose - On-device Real-time Body Pose Tracking in Google AI Blog
MediaPipe Iris: Real-time Eye Tracking and Depth Estimation in Google AI Blog
MediaPipe KNIFT: Template-based feature matching in Google Developers Blog
Alfred Camera: Smart camera features using MediaPipe in Google Developers Blog
Real-Time 3D Object Detection on Mobile Devices with MediaPipe in Google AI Blog
AutoFlip: An Open Source Framework for Intelligent Video Reframing in Google AI Blog
MediaPipe on the Web in Google Developers Blog
Object Detection and Tracking using MediaPipe in Google Developers Blog
On-Device, Real-Time Hand Tracking with MediaPipe in Google AI Blog
MediaPipe: A Framework for Building Perception Pipelines

Videos

YouTube Channel

Top Related Projects

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot