MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
Top Related Projects
TensorFlow: An Open Source Machine Learning Framework for Everyone
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
ONNX: Open standard for machine learning interoperability
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Quick Overview
MNN (Mobile Neural Network) is a lightweight deep learning framework developed by Alibaba. It's designed for efficient inference on mobile devices and embedded systems, supporting various neural network architectures and optimized for cross-platform performance.
Pros
- High performance and low memory footprint, ideal for mobile and embedded devices
- Cross-platform support (iOS, Android, Linux, Windows, macOS)
- Supports multiple deep learning frameworks (TensorFlow, PyTorch, ONNX, etc.)
- Provides quantization and model compression techniques for further optimization
Cons
- Limited documentation and examples compared to more established frameworks
- Smaller community and ecosystem compared to TensorFlow or PyTorch
- Primarily focused on inference, not training
- May require additional effort to integrate with custom or less common model architectures
Code Examples
- Loading and running a model:
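#include <MNN/Interpreter.hpp>
#include <memory>
// Load the model from disk
auto interpreter = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile("model.mnn"));
// Create a session with the default scheduling options, then run it
MNN::ScheduleConfig config;
auto session = interpreter->createSession(config);
interpreter->runSession(session);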
- Accessing input and output tensors:
auto input = interpreter->getSessionInput(session, nullptr);
auto output = interpreter->getSessionOutput(session, nullptr);
// Fill input tensor with data
float* inputData = input->host<float>();
// ... fill inputData with your input
interpreter->runSession(session);
// Access output data
float* outputData = output->host<float>();
// ... use outputData for further processing
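Filling the input buffer usually means converting decoded image bytes into floats first. The following is a minimal sketch of that step, assuming the model expects planar (channel-first) float data with per-channel mean subtraction and scaling; the function name `hwcToChwNormalized` and its parameters are hypothetical and not part of MNN, and the layout a given model expects may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved 8-bit image (H x W x C, e.g. RGBRGB...) into a
// planar float buffer (C x H x W), applying (pixel - mean[c]) * scale[c].
// Hypothetical helper for illustration; it does not call any MNN API.
std::vector<float> hwcToChwNormalized(const std::vector<std::uint8_t>& pixels,
                                      int height, int width, int channels,
                                      const std::vector<float>& mean,
                                      const std::vector<float>& scale) {
    assert(pixels.size() == static_cast<std::size_t>(height) * width * channels);
    assert(mean.size() == static_cast<std::size_t>(channels));
    assert(scale.size() == static_cast<std::size_t>(channels));

    std::vector<float> planar(pixels.size());
    const std::size_t planeSize = static_cast<std::size_t>(height) * width;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t pixelIndex = static_cast<std::size_t>(y) * width + x;
            for (int c = 0; c < channels; ++c) {
                const float value = static_cast<float>(pixels[pixelIndex * channels + c]);
                planar[c * planeSize + pixelIndex] = (value - mean[c]) * scale[c];
            }
        }
    }
    return planar;
}
```

The returned vector can then be copied into the float buffer obtained from the input tensor, provided the element counts match.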
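- Quantizing a model (performed offline with MNN's command-line tools rather than through a C++ API; the flags below follow the MNN converter and quantization tool documentation and should be checked against your MNN version):
# Weight-only 8-bit quantization while converting a model to the .mnn format
./MNNConvert -f ONNX --modelFile model.onnx --MNNModel quantized_model.mnn --bizCode MNN --weightQuantBits 8
# Post-training quantization of an existing .mnn model, driven by a JSON config file
./quantized.out model.mnn quantized_model.mnn quant_config.json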
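To make the idea behind 8-bit quantization concrete, here is a minimal sketch of symmetric per-tensor weight quantization. It illustrates the general technique only and is not MNN's implementation; the names `QuantizedWeights`, `quantizeSymmetricInt8`, and `dequantize` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct QuantizedWeights {
    std::vector<std::int8_t> values;  // quantized weights in [-127, 127]
    float scale;                      // real value ~= values[i] * scale
};

// Symmetric per-tensor quantization: the largest magnitude maps to 127.
QuantizedWeights quantizeSymmetricInt8(const std::vector<float>& weights) {
    assert(!weights.empty());
    float maxAbs = 0.0f;
    for (float w : weights) {
        maxAbs = std::max(maxAbs, std::fabs(w));
    }
    // Avoid dividing by zero when every weight is zero.
    const float scale = (maxAbs > 0.0f) ? maxAbs / 127.0f : 1.0f;

    QuantizedWeights result;
    result.scale = scale;
    result.values.reserve(weights.size());
    for (float w : weights) {
        const long rounded = std::lround(w / scale);
        const long clamped = std::max(-127L, std::min(127L, rounded));
        result.values.push_back(static_cast<std::int8_t>(clamped));
    }
    return result;
}

// Recover an approximate float weight from its quantized form.
float dequantize(const QuantizedWeights& q, std::size_t index) {
    return static_cast<float>(q.values[index]) * q.scale;
}
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step per weight.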
Getting Started
- Clone the repository:
git clone https://github.com/alibaba/MNN.git
- Build MNN:
cd MNN
./schema/generate.sh
./tools/script/get_model.sh
mkdir build && cd build
cmake .. && make -j4
- Run the example:
./build/express/temp/testModel.out models/mobilenet_v1.mnn
This will run a simple inference using a pre-trained MobileNet model. For more detailed instructions and advanced usage, refer to the official MNN documentation.
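A classification model such as MobileNet produces one score per class, and the application still has to turn those scores into a prediction. The sketch below shows one common way to do that, assuming the output buffer holds raw (pre-softmax) scores; whether a given model already applies softmax is model-specific. The `Prediction` struct and `top1` function are hypothetical names, not part of MNN.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Prediction {
    std::size_t classIndex;  // index of the highest-scoring class
    float probability;       // softmax probability of that class
};

// Numerically stable softmax followed by argmax over raw class scores.
Prediction top1(const std::vector<float>& scores) {
    assert(!scores.empty());
    const auto maxIt = std::max_element(scores.begin(), scores.end());
    const float maxScore = *maxIt;

    // Subtract the maximum before exponentiating to avoid overflow.
    double sum = 0.0;
    for (float s : scores) {
        sum += std::exp(static_cast<double>(s - maxScore));
    }

    // exp(maxScore - maxScore) is 1, so the top probability is 1 / sum.
    Prediction p;
    p.classIndex = static_cast<std::size_t>(maxIt - scores.begin());
    p.probability = static_cast<float>(1.0 / sum);
    return p;
}
```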
Competitor Comparisons
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Larger ecosystem and community support
- More comprehensive documentation and tutorials
- Wider range of pre-trained models and tools
Cons of TensorFlow
- Heavier and more resource-intensive
- Steeper learning curve for beginners
- Slower inference speed on mobile devices
Code Comparison
TensorFlow:
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
MNN:
#include <MNN/Interpreter.hpp>
auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile("model.mnn"));
MNN::ScheduleConfig config;  // default scheduling options
auto session = net->createSession(config);
net->runSession(session);
The code snippets demonstrate the basic model creation and execution in both frameworks. TensorFlow uses a high-level Python API, while MNN employs a C++ interface for model loading and inference.
TensorFlow offers a more intuitive and flexible approach to building models, especially for researchers and data scientists. MNN, on the other hand, focuses on efficient model deployment and execution, particularly on mobile and embedded devices.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Larger community and ecosystem, with more resources and third-party libraries
- More flexible and dynamic computational graph, allowing for easier debugging
- Better support for research and prototyping in deep learning
Cons of PyTorch
- Generally slower inference speed compared to MNN
- Larger model size and memory footprint
- Less optimized for mobile and edge devices
Code Comparison
PyTorch example:
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.add(x, y)
MNN example:
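#include <MNN/Interpreter.hpp>
auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile("model.mnn"));
MNN::ScheduleConfig config;  // default scheduling options
auto session = net->createSession(config);
net->runSession(session);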
The PyTorch example demonstrates its Python-based API and dynamic tensor operations, while the MNN example shows its C++ interface and focus on model inference. PyTorch offers a more intuitive and flexible approach for model development, whereas MNN is designed for efficient deployment and inference on various platforms, especially mobile devices.
Open standard for machine learning interoperability
Pros of ONNX
- Wider industry adoption and support from major AI/ML frameworks
- More comprehensive model representation, supporting a broader range of operations
- Better interoperability between different deep learning frameworks
Cons of ONNX
- Can be more complex to use and implement
- Larger file sizes for model representations
- May have slower inference speed compared to MNN's optimized runtime
Code Comparison
MNN example:
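// Uses MNN's Express API (MNN/expr/ExprCreator.hpp, namespace MNN::Express)
auto input = _Input({1, 3, 224, 224}, NC4HW4);
// 3x3 convolution with constant-initialized weights: 3 input channels, 16 output channels
auto conv = _Conv(0.0f, 0.0f, input, {3, 16}, {3, 3}, SAME);
auto output = _Softmax(conv);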
ONNX example:
input = helper.make_tensor_value_info('input', TensorProto.FLOAT, [1, 3, 224, 224])
conv = helper.make_node('Conv', ['input', 'weight', 'bias'], ['conv_output'])
output = helper.make_node('Softmax', ['conv_output'], ['output'])
Both examples show basic model construction, but ONNX requires more verbose code to define nodes and tensors. MNN's API is more concise and intuitive for direct model building. However, ONNX's verbosity allows for more detailed control over model structure and attributes.
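The `NC4HW4` format passed to `_Input` in the MNN snippet is commonly described as a channel-blocked layout: channels are grouped in blocks of four, and the four values of a block are stored contiguously for each pixel, with zero padding when the channel count is not a multiple of four. The sketch below packs an NCHW buffer under that assumption; it is an illustration of the layout, not code from MNN, and `packNc4hw4` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack an NCHW buffer into a channel-blocked layout
// [N][ceil(C/4)][H][W][4], zero-padding channels beyond C.
std::vector<float> packNc4hw4(const std::vector<float>& nchw,
                              int batch, int channels, int height, int width) {
    const std::size_t area = static_cast<std::size_t>(height) * width;
    assert(nchw.size() == static_cast<std::size_t>(batch) * channels * area);

    const int channelBlocks = (channels + 3) / 4;
    std::vector<float> packed(static_cast<std::size_t>(batch) * channelBlocks * area * 4, 0.0f);

    for (int n = 0; n < batch; ++n) {
        for (int c = 0; c < channels; ++c) {
            const int block = c / 4;  // which group of four channels
            const int lane = c % 4;   // position inside that group
            for (std::size_t i = 0; i < area; ++i) {
                const std::size_t src = (static_cast<std::size_t>(n) * channels + c) * area + i;
                const std::size_t dst =
                    ((static_cast<std::size_t>(n) * channelBlocks + block) * area + i) * 4 + lane;
                packed[dst] = nchw[src];
            }
        }
    }
    return packed;
}
```

Keeping four channel values adjacent lets a 128-bit SIMD register load them in a single instruction, which is the usual motivation for this kind of layout.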
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
Pros of Core ML Tools
- Seamless integration with Apple's ecosystem and iOS/macOS devices
- Supports a wide range of popular machine learning frameworks (TensorFlow, PyTorch, scikit-learn)
- Extensive documentation and active community support
Cons of Core ML Tools
- Limited to Apple platforms, reducing cross-platform compatibility
- May require more computational resources for model conversion and optimization
- Less flexibility for custom optimizations compared to MNN
Code Comparison
Core ML Tools:
import coremltools as ct
model = ct.convert('model.h5',
                   source='tensorflow',
                   convert_to='mlprogram',
                   compute_units=ct.ComputeUnit.ALL)
model.save('converted_model.mlpackage')
MNN:
#include <MNN/Interpreter.hpp>
auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile("model.mnn"));
net->setSessionMode(MNN::Interpreter::Session_Release);
MNN::ScheduleConfig config;  // default scheduling options
auto session = net->createSession(config);
net->runSession(session);
Core ML Tools focuses on converting models to Apple's Core ML format, while MNN provides a lightweight inference engine for direct model execution across multiple platforms. Core ML Tools offers easier integration with Apple devices, but MNN provides more flexibility and cross-platform support.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Pros of ONNX Runtime
- Broader ecosystem support and compatibility with various ML frameworks
- More extensive documentation and community resources
- Better performance optimization for a wider range of hardware platforms
Cons of ONNX Runtime
- Larger binary size and potentially higher memory footprint
- Steeper learning curve for beginners due to more complex API
Code Comparison
MNN example:
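auto interpreter = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile(modelPath));
MNN::ScheduleConfig config;  // default scheduling options
auto session = interpreter->createSession(config);
interpreter->runSession(session);
auto output = interpreter->getSessionOutput(session, nullptr);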
ONNX Runtime example:
Ort::Session session(env, model_path, session_options);
std::vector<float> input_tensor_values(input_tensor_size);
Ort::Value input_tensor = Ort::Value::CreateTensor<float>(memory_info, input_tensor_values.data(), input_tensor_size, input_node_dims.data(), 4);
auto output_tensors = session.Run(Ort::RunOptions{nullptr}, input_node_names.data(), &input_tensor, 1, output_node_names.data(), 1);
Both MNN and ONNX Runtime are powerful inference engines for deploying machine learning models. MNN, developed by Alibaba, focuses on mobile and embedded devices, offering a lightweight solution with fast inference speeds. ONNX Runtime, created by Microsoft, provides a more versatile platform supporting a wider range of hardware and frameworks. While MNN excels in mobile scenarios, ONNX Runtime offers broader compatibility and optimization options across various deployment environments.
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Pros of ncnn
- Smaller binary size and lower memory footprint
- Better support for quantization and fixed-point operations
- More extensive documentation and community support
Cons of ncnn
- Less support for newer neural network architectures
- Slower inference speed on some devices compared to MNN
- More limited platform support, especially for mobile devices
Code Comparison
MNN example:
auto input = _Input({1, 3, 224, 224}, NC4HW4);
// 3x3 convolution with constant-initialized weights: 3 input channels, 16 output channels
auto output = _Conv(0.0f, 0.0f, input, {3, 16}, {3, 3}, VALID);
ncnn example:
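ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
ncnn::Mat out;
// Inference goes through an extractor; "input" and "output" are placeholder blob names
ncnn::Extractor ex = net.create_extractor();
ex.input("input", in);
ex.extract("output", out);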
Both libraries offer concise APIs for model inference, but MNN's API is more declarative and allows for easier model construction, while ncnn focuses on loading pre-trained models and performing inference.
README
Intro
MNN is a highly efficient and lightweight deep learning framework. It supports inference and training of deep learning models and has industry-leading performance for inference and training on-device. At present, MNN has been integrated into more than 30 apps of Alibaba Inc, such as Taobao, Tmall, Youku, DingTalk, Xianyu, etc., covering more than 70 usage scenarios such as live broadcast, short video capture, search recommendation, product searching by image, interactive marketing, equity distribution, security risk control. In addition, MNN is also used on embedded devices, such as IoT.
MNN-LLM is a large language model runtime solution developed based on the MNN engine. The mission of this project is to deploy LLM models locally on everyone's platforms (mobile phone, PC, IoT). It supports popular large language models such as Qianwen, Baichuan, Zhipu, LLAMA, and others. See the MNN-LLM User guide.
MNN-Diffusion is a stable diffusion model runtime solution developed based on the MNN engine. The mission of this project is to deploy stable diffusion models locally on everyone's platforms. See the MNN-Diffusion User guide.
Inside Alibaba, MNN works as the basic module of the compute container in the Walle System, the first end-to-end, general-purpose, and large-scale production system for device-cloud collaborative machine learning, which has been published in the top system conference OSDI'22. The key design principles of MNN and the extensive benchmark testing results (vs. TensorFlow, TensorFlow Lite, PyTorch, PyTorch Mobile, TVM) can be found in the OSDI paper. The scripts and instructions for benchmark testing are put in the path "/benchmark". If MNN or the design of Walle helps your research or production use, please cite our OSDI paper as follows:
@inproceedings {proc:osdi22:walle,
author = {Chengfei Lv and Chaoyue Niu and Renjie Gu and Xiaotang Jiang and Zhaode Wang and Bin Liu and Ziqi Wu and Qiulin Yao and Congyu Huang and Panos Huang and Tao Huang and Hui Shu and Jinde Song and Bin Zou and Peng Lan and Guohuan Xu and Fei Wu and Shaojie Tang and Fan Wu and Guihai Chen},
title = {Walle: An {End-to-End}, {General-Purpose}, and {Large-Scale} Production System for {Device-Cloud} Collaborative Machine Learning},
booktitle = {16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)},
year = {2022},
isbn = {978-1-939133-28-1},
address = {Carlsbad, CA},
pages = {249--265},
url = {https://www.usenix.org/conference/osdi22/presentation/lv},
publisher = {USENIX Association},
month = jul,
}
Documentation and Workbench
MNN's documentation is hosted on Read the Docs.
You can also follow docs/README to build the HTML documentation yourself.
MNN Workbench can be downloaded from MNN's homepage; it provides pretrained models, visualized training tools, and one-click deployment of models to devices.
Key Features
Lightweight
- Optimized for devices, with no dependencies; it can be easily deployed to mobile devices and a variety of embedded devices.
- iOS platform: the static library with all options enabled for armv7+arm64 is about 12MB, and linked executables grow by about 2MB.
- Android platform: core so size is about 800KB (armv7a - c++_shared).
- Using MNN_BUILD_MINI can reduce package size by about 25%, at the cost of requiring a fixed model input size.
- Supports FP16 / Int8 quantization, which can reduce model size by 50%-70%.
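The size reductions quoted for FP16 and Int8 follow from the number of bits stored per weight. A minimal sketch of that arithmetic, assuming the model file is dominated by weights originally stored as 32-bit floats (the helper names and the 4,000,000-parameter figure in the note below are illustrative, not taken from MNN):

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store `parameterCount` weights at `bitsPerWeight` bits each.
std::size_t weightBytes(std::size_t parameterCount, unsigned bitsPerWeight) {
    assert(bitsPerWeight > 0 && bitsPerWeight <= 32);
    return (parameterCount * bitsPerWeight + 7) / 8;
}

// Fractional size reduction relative to 32-bit float storage.
double reductionVsFp32(unsigned bitsPerWeight) {
    assert(bitsPerWeight > 0 && bitsPerWeight <= 32);
    return 1.0 - static_cast<double>(bitsPerWeight) / 32.0;
}
```

For a hypothetical model with 4,000,000 parameters, this gives 16,000,000 bytes at FP32, 8,000,000 at FP16 (a 50% reduction), and 4,000,000 at Int8 (75% for the weights alone). Quantization scales and any data left unquantized would plausibly account for a real file shrinking by less than that upper bound.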
Versatility
- Supports Tensorflow, Caffe, ONNX, Torchscripts, and common neural networks such as CNN, RNN, GAN, Transformer.
- Supports AI models with multiple inputs or outputs, every kind of dimension format, dynamic inputs, and control flow.
- MNN supports nearly all OPs used in AI models. The converter supports 178 Tensorflow OPs, 52 Caffe OPs, 163 Torchscripts OPs, and 158 ONNX OPs.
- Supports iOS 8.0+, Android 4.3+, and embedded devices with a POSIX interface.
- Supports hybrid computing on multiple devices. Currently supports CPU and GPU.
High performance
- Implements core computing with lots of optimized assembly code to make full use of the ARM / x64 CPU.
- Uses Metal / OpenCL / Vulkan to support GPU inference on mobile.
- Uses CUDA and Tensor Cores to support NVIDIA GPUs for better performance.
- Convolution and transposed convolution algorithms are efficient and stable. The Winograd convolution algorithm is widely used to accelerate symmetric convolutions such as 3x3, 4x4, 5x5, 6x6, 7x7.
- About 2x faster on the newer ARM v8.2 architecture with FP16 half-precision support, and about 2.5x faster when using sdot (ARM v8.2) and VNNI.
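To illustrate why Winograd convolution is faster, the sketch below implements the textbook one-dimensional case F(2,3): two outputs of a 3-tap filter computed with 4 multiplications between transformed values instead of the 6 a direct computation needs. This is the standard algorithm in its smallest form, not MNN's optimized implementation, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Direct computation: two outputs from four inputs and a 3-tap filter,
// using 6 multiplications.
std::array<float, 2> directConv(const std::array<float, 4>& d,
                                const std::array<float, 3>& g) {
    return {{d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
             d[1] * g[0] + d[2] * g[1] + d[3] * g[2]}};
}

// Winograd F(2,3): the same two outputs using 4 multiplications between
// transformed input values and transformed filter values.
std::array<float, 2> winogradF23(const std::array<float, 4>& d,
                                 const std::array<float, 3>& g) {
    // Filter transform (can be precomputed once per filter).
    const float u0 = g[0];
    const float u1 = (g[0] + g[1] + g[2]) * 0.5f;
    const float u2 = (g[0] - g[1] + g[2]) * 0.5f;
    const float u3 = g[2];
    // Input transform followed by the four element-wise products.
    const float m1 = (d[0] - d[2]) * u0;
    const float m2 = (d[1] + d[2]) * u1;
    const float m3 = (d[2] - d[1]) * u2;
    const float m4 = (d[1] - d[3]) * u3;
    // Output transform.
    return {{m1 + m2 + m3, m2 - m3 - m4}};
}

// True when both methods agree within `tolerance` on the given data.
bool agreesWithDirect(const std::array<float, 4>& d,
                      const std::array<float, 3>& g, float tolerance) {
    assert(tolerance >= 0.0f);
    const std::array<float, 2> a = directConv(d, g);
    const std::array<float, 2> b = winogradF23(d, g);
    return std::fabs(a[0] - b[0]) <= tolerance && std::fabs(a[1] - b[1]) <= tolerance;
}
```

Because the filter transform is computed once and reused for every input tile, the saving in multiplications applies across the whole feature map; the two-dimensional variants used for 3x3 kernels nest the same idea.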
Ease of use
- Supports using MNN's OPs for numerical computation, like numpy.
- Provides a lightweight image processing module, like OpenCV, that is only about 100k in size.
- Supports building a model and training it on PC / mobile.
- The MNN Python API helps ML engineers use MNN for inference, training, and image processing without dipping their toes in C++ code.
The architectures / precisions MNN supports are shown below:
- S: Supported and works well, deeply optimized, recommended
- A: Supported and works well, can be used
- B: Supported but buggy or not optimized, not recommended
- C: Not supported
Architecture / Precision | | Normal | FP16 | BF16 | Int8
---|---|---|---|---|---
CPU | Native | B | C | B | B
 | x86/x64-SSE4.1 | A | B | B | A
 | x86/x64-AVX2 | S | B | B | A
 | x86/x64-AVX512 | S | B | B | S
 | ARMv7a | S | S (ARMv8.2) | S | S
 | ARMv8 | S | S (ARMv8.2) | S (ARMv8.6) | S
GPU | OpenCL | A | S | C | C
 | Vulkan | A | A | C | C
 | Metal | A | S | C | C
 | CUDA | A | S | C | C
NPU | CoreML | B | B | C | C
 | HIAI | B | C | C | B
 | NNAPI | B | B | C | C
Tools
Based on MNN (the tensor compute engine), we provide a series of tools for inference, training, and general computation.
- MNN-Converter: Converts other models to MNN models for inference, such as Tensorflow(lite), Caffe, ONNX, Torchscripts, and performs graph optimization to reduce computation.
- MNN-Compress: Compresses models to reduce size and increase performance / speed.
- MNN-Express: Supports models with control flow and uses MNN's OPs for general-purpose computing.
- MNN-CV: An OpenCV-like library based on MNN, and therefore much more lightweight.
- MNN-Train: Supports training MNN models.
How to Discuss and Get Help From the MNN Community
The group discussions are predominantly Chinese. But we welcome and will help English speakers.
Dingtalk discussion groups:
Group #1 (Full): 23329087
Group #2 (Full): 23350225
Group #3: QR code:
Historical Paper
The preliminary version of MNN, a mobile inference engine with a focus on manual optimization, was also published in MLSys 2020. Please cite the paper if MNN previously helped your research:
@inproceedings{alibaba2020mnn,
author = {Jiang, Xiaotang and Wang, Huan and Chen, Yiliu and Wu, Ziqi and Wang, Lichuan and Zou, Bin and Yang, Yafeng and Cui, Zongyang and Cai, Yu and Yu, Tianhang and Lv, Chengfei and Wu, Zhihua},
title = {MNN: A Universal and Efficient Inference Engine},
booktitle = {MLSys},
year = {2020}
}
License
Apache 2.0
Acknowledgement
MNN participants: Taobao Technology Department, Search Engineering Team, DAMO Team, Youku and other Alibaba Group employees.
MNN refers to the following projects: