MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
Top Related Projects
- tensorflow/tensorflow: An Open Source Machine Learning Framework for Everyone
- pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
- onnx/onnx: Open standard for machine learning interoperability
- apple/coremltools: Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
- microsoft/onnxruntime: ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
- Tencent/ncnn: ncnn is a high-performance neural network inference framework optimized for the mobile platform
Quick Overview
MNN (Mobile Neural Network) is a lightweight deep learning framework developed by Alibaba. It's designed for efficient inference on mobile devices and embedded systems, supporting various neural network architectures and optimized for cross-platform performance.
Pros
- High performance and low memory footprint, ideal for mobile and embedded devices
- Cross-platform support (iOS, Android, Linux, Windows, macOS)
- Supports multiple deep learning frameworks (TensorFlow, PyTorch, ONNX, etc.)
- Provides quantization and model compression techniques for further optimization
Cons
- Limited documentation and examples compared to more established frameworks
- Smaller community and ecosystem compared to TensorFlow or PyTorch
- Primarily focused on inference, not training
- May require additional effort to integrate with custom or less common model architectures
Code Examples
- Loading and running a model:
#include <MNN/Interpreter.hpp>
// Load the model and create an inference session with a default schedule config.
auto interpreter = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile("model.mnn"));
MNN::ScheduleConfig config;   // defaults to the CPU backend; set config.type to select a GPU backend
auto session = interpreter->createSession(config);
interpreter->runSession(session);
- Accessing input and output tensors:
auto input = interpreter->getSessionInput(session, nullptr);
auto output = interpreter->getSessionOutput(session, nullptr);
// Fill input tensor with data
float* inputData = input->host<float>();
// ... fill inputData with your input
interpreter->runSession(session);
// Access output data
float* outputData = output->host<float>();
// ... use outputData for further processing
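The snippet above writes directly into the session tensor's host memory, which is fine for the CPU backend. When a GPU backend is configured, input and output data are usually staged through separate host tensors; below is a minimal sketch of that pattern, reusing the interpreter, session, input, and output from above (treat it as illustrative rather than the only correct flow):
// Stage the input through a host tensor so the copy also works for GPU backends.
std::shared_ptr<MNN::Tensor> inputHost(MNN::Tensor::create<float>(input->shape(), nullptr, input->getDimensionType()));
// ... fill inputHost->host<float>() with your input
input->copyFromHostTensor(inputHost.get());
interpreter->runSession(session);
// Copy the result back to a host tensor before reading it.
std::shared_ptr<MNN::Tensor> outputHost(new MNN::Tensor(output, output->getDimensionType()));
output->copyToHostTensor(outputHost.get());
float* result = outputHost->host<float>();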
- Quantizing a model: MNN performs quantization offline with a standalone tool rather than through an in-process C++ API. After building the tools, a typical invocation looks like the following (tool name and arguments may differ between versions; see MNN's quantization documentation):
./quantized.out origin_float.mnn quantized_int8.mnn imageInputConfig.json
Here imageInputConfig.json describes the calibration images and preprocessing used to compute the Int8 quantization parameters.
Getting Started
- Clone the repository:
git clone https://github.com/alibaba/MNN.git
- Build MNN:
cd MNN
./schema/generate.sh
./tools/script/get_model.sh
mkdir build && cd build
cmake .. && make -j4
- Run the example:
./build/express/temp/testModel.out models/mobilenet_v1.mnn
This will run a simple inference using a pre-trained MobileNet model. For more detailed instructions and advanced usage, refer to the official MNN documentation.
Competitor Comparisons
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Larger ecosystem and community support
- More comprehensive documentation and tutorials
- Wider range of pre-trained models and tools
Cons of TensorFlow
- Heavier and more resource-intensive
- Steeper learning curve for beginners
- Slower inference speed on mobile devices
Code Comparison
TensorFlow:
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
MNN:
#include <MNN/Interpreter.hpp>
auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile("model.mnn"));
auto session = net->createSession(config);
net->runSession(session);
The code snippets demonstrate the basic model creation and execution in both frameworks. TensorFlow uses a high-level Python API, while MNN employs a C++ interface for model loading and inference.
TensorFlow offers a more intuitive and flexible approach to building models, especially for researchers and data scientists. MNN, on the other hand, focuses on efficient model deployment and execution, particularly on mobile and embedded devices.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Larger community and ecosystem, with more resources and third-party libraries
- More flexible and dynamic computational graph, allowing for easier debugging
- Better support for research and prototyping in deep learning
Cons of PyTorch
- Generally slower inference speed compared to MNN
- Larger model size and memory footprint
- Less optimized for mobile and edge devices
Code Comparison
PyTorch example:
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.add(x, y)
MNN example:
#include <MNN/Interpreter.hpp>
auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile("model.mnn"));
net->runSession(session);
The PyTorch example demonstrates its Python-based API and dynamic tensor operations, while the MNN example shows its C++ interface and focus on model inference. PyTorch offers a more intuitive and flexible approach for model development, whereas MNN is designed for efficient deployment and inference on various platforms, especially mobile devices.
Open standard for machine learning interoperability
Pros of ONNX
- Wider industry adoption and support from major AI/ML frameworks
- More comprehensive model representation, supporting a broader range of operations
- Better interoperability between different deep learning frameworks
Cons of ONNX
- Can be more complex to use and implement
- Larger file sizes for model representations
- May have slower inference speed compared to MNN's optimized runtime
Code Comparison
MNN example:
auto input  = _Input({1, 3, 224, 224}, NC4HW4);
auto conv   = _Conv(0.0f, 0.0f, input, {3, 16}, {3, 3}, SAME); // 3 -> 16 channels, 3x3 kernel
auto output = _Softmax(conv);
ONNX example:
input = helper.make_tensor_value_info('input', TensorProto.FLOAT, [1, 3, 224, 224])
conv = helper.make_node('Conv', ['input', 'weight', 'bias'], ['conv_output'])
output = helper.make_node('Softmax', ['conv_output'], ['output'])
Both examples show basic model construction, but ONNX requires more verbose code to define nodes and tensors. MNN's API is more concise and intuitive for direct model building. However, ONNX's verbosity allows for more detailed control over model structure and attributes.
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
Pros of Core ML Tools
- Seamless integration with Apple's ecosystem and iOS/macOS devices
- Supports a wide range of popular machine learning frameworks (TensorFlow, PyTorch, scikit-learn)
- Extensive documentation and active community support
Cons of Core ML Tools
- Limited to Apple platforms, reducing cross-platform compatibility
- May require more computational resources for model conversion and optimization
- Less flexibility for custom optimizations compared to MNN
Code Comparison
Core ML Tools:
import coremltools as ct

# Convert a saved Keras model to an ML Program package.
model = ct.convert('model.h5',
                   convert_to='mlprogram',
                   compute_units=ct.ComputeUnit.ALL)
model.save('converted_model.mlpackage')
MNN:
#include <MNN/Interpreter.hpp>
auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile("model.mnn"));
net->setSessionMode(MNN::Interpreter::Session_Release);
auto session = net->createSession(config);
net->runSession(session);
Core ML Tools focuses on converting models to Apple's Core ML format, while MNN provides a lightweight inference engine for direct model execution across multiple platforms. Core ML Tools offers easier integration with Apple devices, but MNN provides more flexibility and cross-platform support.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Pros of ONNX Runtime
- Broader ecosystem support and compatibility with various ML frameworks
- More extensive documentation and community resources
- Better performance optimization for a wider range of hardware platforms
Cons of ONNX Runtime
- Larger binary size and potentially higher memory footprint
- Steeper learning curve for beginners due to more complex API
Code Comparison
MNN example:
auto interpreter = std::shared_ptr<Interpreter>(Interpreter::createFromFile(modelPath));
interpreter->runSession(session);
auto output = interpreter->getSessionOutput(session, nullptr);
ONNX Runtime example:
Ort::Session session(env, model_path, session_options);
std::vector<float> input_tensor_values(input_tensor_size);
Ort::Value input_tensor = Ort::Value::CreateTensor<float>(memory_info, input_tensor_values.data(), input_tensor_size, input_node_dims.data(), 4);
auto output_tensors = session.Run(Ort::RunOptions{nullptr}, input_node_names.data(), &input_tensor, 1, output_node_names.data(), 1);
Both MNN and ONNX Runtime are powerful inference engines for deploying machine learning models. MNN, developed by Alibaba, focuses on mobile and embedded devices, offering a lightweight solution with fast inference speeds. ONNX Runtime, created by Microsoft, provides a more versatile platform supporting a wider range of hardware and frameworks. While MNN excels in mobile scenarios, ONNX Runtime offers broader compatibility and optimization options across various deployment environments.
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Pros of ncnn
- Smaller binary size and lower memory footprint
- Better support for quantization and fixed-point operations
- More extensive documentation and community support
Cons of ncnn
- Less support for newer neural network architectures
- Slower inference speed on some devices compared to MNN
- More limited platform support, especially for mobile devices
Code Comparison
MNN example:
auto input  = _Input({1, 3, 224, 224}, NC4HW4);
auto output = _Conv(0.0f, 0.0f, input, {3, 16}, {3, 3}, VALID); // 3 -> 16 channels, 3x3 kernel
ncnn example:
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("input", in);
ex.extract("output", out);
Both libraries offer concise APIs for model inference, but MNN's API is more declarative and allows for easier model construction, while ncnn focuses on loading pre-trained models and performing inference.
README
Intro
MNN is a highly efficient and lightweight deep learning framework. It supports inference and training of deep learning models and has industry-leading performance for inference and training on-device. At present, MNN has been integrated into more than 30 apps of Alibaba Inc, such as Taobao, Tmall, Youku, DingTalk, Xianyu, etc., covering more than 70 usage scenarios such as live broadcast, short video capture, search recommendation, product searching by image, interactive marketing, equity distribution, security risk control. In addition, MNN is also used on embedded devices, such as IoT.
Inside Alibaba, MNN works as the basic module of the compute container in the Walle System, the first end-to-end, general-purpose, and large-scale production system for device-cloud collaborative machine learning, which was published at the top systems conference OSDI '22. The key design principles of MNN and extensive benchmark results (vs. TensorFlow, TensorFlow Lite, PyTorch, PyTorch Mobile, TVM) can be found in the OSDI paper. The scripts and instructions for benchmarking are in the /benchmark directory. If MNN or the design of Walle helps your research or production use, please cite our OSDI paper as follows:
@inproceedings {proc:osdi22:walle,
author = {Chengfei Lv and Chaoyue Niu and Renjie Gu and Xiaotang Jiang and Zhaode Wang and Bin Liu and Ziqi Wu and Qiulin Yao and Congyu Huang and Panos Huang and Tao Huang and Hui Shu and Jinde Song and Bin Zou and Peng Lan and Guohuan Xu and Fei Wu and Shaojie Tang and Fan Wu and Guihai Chen},
title = {Walle: An {End-to-End}, {General-Purpose}, and {Large-Scale} Production System for {Device-Cloud} Collaborative Machine Learning},
booktitle = {16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)},
year = {2022},
isbn = {978-1-939133-28-1},
address = {Carlsbad, CA},
pages = {249--265},
url = {https://www.usenix.org/conference/osdi22/presentation/lv},
publisher = {USENIX Association},
month = jul,
}
Documentation and Workbench
MNN's documentation is available in the Yuque docs and on Read the Docs.
MNN Workbench can be downloaded from MNN's homepage; it provides pretrained models, visualized training tools, and one-click deployment of models to devices.
Key Features
Lightweight
- Optimized for devices, with no dependencies; easily deployed to mobile phones and a variety of embedded devices.
- iOS platform: static library size with full options for armv7+arm64 is about 12MB; the size increase of linked executables is about 2MB.
- Android platform: the core .so is about 800KB (armv7a - c++_shared).
- Using MNN_BUILD_MINI can reduce the package size by about 25%, at the cost of supporting only fixed model input sizes (see the configure example after this list).
- Supports FP16 / Int8 quantization, which can reduce model size by 50%-70%.
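As a rough illustration of the MNN_BUILD_MINI option mentioned above, it is passed at CMake configure time (a sketch; check MNN's build documentation for the full list of options):
cmake .. -DMNN_BUILD_MINI=ON && make -j4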
Versatility
- Supports TensorFlow, Caffe, ONNX, and TorchScript model formats, and common neural networks such as CNN, RNN, GAN, and Transformer.
- Supports models with multiple inputs or outputs, every kind of dimension format, dynamic input shapes, and control flow (a sketch of runtime input resizing follows this list).
- MNN supports approximately all of the OPs used in common AI models. The converter supports 178 TensorFlow OPs, 52 Caffe OPs, 163 TorchScript OPs, and 158 ONNX OPs.
- Supports iOS 8.0+, Android 4.3+, and embedded devices with a POSIX interface.
- Supports hybrid computing on multiple devices; currently CPU and GPU.
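For the dynamic input shapes mentioned above, the input tensor can be resized at runtime and the session re-shaped before inference. A minimal sketch with the session-based C++ API (it assumes an already created interpreter and session, and a model whose input allows a variable batch dimension):
// Resize the input (e.g. to batch size 4) and let MNN re-allocate buffers for the new shape.
auto input = interpreter->getSessionInput(session, nullptr);
interpreter->resizeTensor(input, {4, 3, 224, 224});
interpreter->resizeSession(session);
interpreter->runSession(session);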
High performance
- Implements core computing with hand-optimized assembly to make full use of ARM / x64 CPUs.
- Uses Metal / OpenCL / Vulkan to support GPU inference on mobile.
- Uses CUDA and Tensor Cores on NVIDIA GPUs for better performance.
- Convolution and transposed-convolution algorithms are efficient and stable. The Winograd convolution algorithm is widely used for symmetric convolutions such as 3x3, 4x4, 5x5, 6x6, and 7x7.
- Roughly doubled speed on ARM v8.2 with FP16 half-precision support, and about 2.5x faster using sdot (ARM v8.2) and VNNI.
Ease of use
- Supports using MNN's OPs for numerical computing, similar to NumPy (see the sketch after this list).
- Provides a lightweight image-processing module similar to OpenCV, which is only about 100 KB.
- Supports building and training models on PC and mobile.
- MNN Python API helps ML engineers to easily use MNN to infer, train, and process images, without dipping their toes in C++ code.
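As an illustration of the NumPy-like numerical computing mentioned above, MNN's C++ Express API can build and evaluate small tensor expressions directly. A minimal sketch (header path and creator names such as _Const, _Multiply, and _Add are taken from MNN's expression API; verify them against your MNN version):
#include <MNN/expr/ExprCreator.hpp>
using namespace MNN::Express;

// Build a small element-wise expression: c = a * b + a
auto a = _Const(1.0f, {2, 2}, NCHW);        // 2x2 tensor filled with 1.0
auto b = _Const(2.0f, {2, 2}, NCHW);        // 2x2 tensor filled with 2.0
auto c = _Add(_Multiply(a, b), a);
const float* result = c->readMap<float>();  // evaluates the expression and reads the values back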
The architectures and precisions MNN supports are shown below:
- S: Supported and works well, deeply optimized, recommended
- A: Supported and works well, usable
- B: Supported but has bugs or is not optimized; not recommended
- C: Not supported
| Architecture / Precision | Backend | Normal | FP16 | BF16 | Int8 |
|---|---|---|---|---|---|
| CPU | Native | B | C | B | B |
| CPU | x86/x64-SSE4.1 | A | B | B | A |
| CPU | x86/x64-AVX2 | S | B | B | A |
| CPU | x86/x64-AVX512 | S | B | B | S |
| CPU | ARMv7a | S | S (ARMv8.2) | S | S |
| CPU | ARMv8 | S | S (ARMv8.2) | S (ARMv8.6) | S |
| GPU | OpenCL | A | S | C | C |
| GPU | Vulkan | A | A | C | C |
| GPU | Metal | A | S | C | C |
| GPU | CUDA | A | S | C | C |
| NPU | CoreML | B | B | C | C |
| NPU | HIAI | B | C | C | B |
| NPU | NNAPI | B | B | C | C |
Tools
Based on MNN (the tensor compute engine), we provide a series of tools for inference, training, and general computation.
- MNN-Converter: Converts other model formats (TensorFlow (Lite), Caffe, ONNX, TorchScript) to MNN models for inference, and performs graph optimizations to reduce computation (see the example after this list).
- MNN-Compress: Compresses models to reduce their size and improve performance.
- MNN-Express: Supports models with control flow; uses MNN's OPs for general-purpose computing.
- MNN-CV: An OpenCV-like library, but based on MNN and much more lightweight.
- MNN-Train: Supports training MNN models.
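As a rough example of the MNN-Converter workflow referenced above, a typical MNNConvert invocation looks like the following (exact flags may differ between versions; see the converter documentation):
./MNNConvert -f ONNX --modelFile model.onnx --MNNModel model.mnn --bizCode biz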
How to Discuss and Get Help From the MNN Community
Group discussions are predominantly in Chinese, but English speakers are welcome and will get help.
Dingtalk discussion groups:
Group #1 (Full): 23329087
Group #2 (Full): 23350225
Group #3: QR code:
Historical Paper
The preliminary version of MNN, as a mobile inference engine with a focus on manual optimization, was published in MLSys 2020. Please cite the paper if MNN previously helped your research:
@inproceedings{alibaba2020mnn,
author = {Jiang, Xiaotang and Wang, Huan and Chen, Yiliu and Wu, Ziqi and Wang, Lichuan and Zou, Bin and Yang, Yafeng and Cui, Zongyang and Cai, Yu and Yu, Tianhang and Lv, Chengfei and Wu, Zhihua},
title = {MNN: A Universal and Efficient Inference Engine},
booktitle = {MLSys},
year = {2020}
}
License
Apache 2.0
Acknowledgement
MNN participants: Taobao Technology Department, Search Engineering Team, DAMO Team, Youku and other Alibaba Group employees.
MNN refers to the following projects: