ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Top Related Projects
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
Tengine is a lite, high performance, modular inference engine for embedded device
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Quick Overview
ncnn is a high-performance neural network inference framework optimized for mobile platforms. Developed by Tencent, it is designed to run deep learning models on mobile devices efficiently, with a focus on speed and low memory footprint.
Pros
- Extremely fast inference speed on mobile devices
- Low memory footprint, suitable for resource-constrained environments
- Cross-platform support (Android, iOS, Windows, Linux, macOS)
- Supports a wide range of deep learning models and operations
Cons
- Steeper learning curve compared to some other inference frameworks
- Limited documentation and examples, especially for beginners
- Requires manual model conversion for some popular deep learning frameworks
- Smaller community compared to more mainstream frameworks like TensorFlow Lite
Code Examples
- Loading and running a model:
#include "net.h"
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
// Fill 'in' with input data
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("output", out);
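The comment above leaves input preparation open. One common way to fill 'in' is to convert raw BGR pixels with ncnn::Mat::from_pixels and normalize in place; this is only a sketch, and the mean/scale values below are placeholders that depend on how your model was trained:
// bgr_pixels, width and height are assumed to come from your own image decoder
ncnn::Mat in = ncnn::Mat::from_pixels(bgr_pixels, ncnn::Mat::PIXEL_BGR, width, height);
const float mean_vals[3] = {104.f, 117.f, 123.f};         // example values only
const float norm_vals[3] = {1/255.f, 1/255.f, 1/255.f};
in.substract_mean_normalize(mean_vals, norm_vals);        // note: ncnn's spelling of this API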
- Creating a custom layer:
#include "layer.h"

class MyCustomLayer : public ncnn::Layer
{
public:
MyCustomLayer()
{
one_blob_only = true;
}
virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const
{
// Implement your custom layer logic here
return 0;
}
};
DEFINE_LAYER_CREATOR(MyCustomLayer)
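To make the custom layer available at load time, it must be registered on the Net before load_param is called. A minimal sketch using the creator function generated by DEFINE_LAYER_CREATOR (the type string must match the layer type used in your .param file):
ncnn::Net net;
net.register_custom_layer("MyCustomLayer", MyCustomLayer_layer_creator);
net.load_param("model.param");
net.load_model("model.bin");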
- Using Vulkan compute:
#include "gpu.h"

ncnn::create_gpu_instance();
ncnn::VulkanDevice* vkdev = ncnn::get_gpu_device();
ncnn::Net net;
net.opt.use_vulkan_compute = true;
net.set_vulkan_device(vkdev);
// Load and run model as usual
ncnn::destroy_gpu_instance();
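Vulkan support is optional at runtime, so it is common to probe for a usable GPU first and fall back to the CPU path. A minimal sketch, assuming ncnn was built with Vulkan support enabled:
#include "net.h"
#include "gpu.h"   // ncnn::get_gpu_count()

ncnn::Net net;
if (ncnn::get_gpu_count() > 0)
    net.opt.use_vulkan_compute = true;  // the default GPU device is used when none is set explicitly
// load_param / load_model / create_extractor as usual; inference runs on the CPU otherwise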
Getting Started
- Clone the repository:
  git clone https://github.com/Tencent/ncnn.git
- Build the project:
  cd ncnn
  mkdir build && cd build
  cmake ..
  make
- Include ncnn in your project:
  #include "net.h"
- Link against the built library and include the necessary headers in your project's build configuration.
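Putting the steps together, a minimal sketch of a complete program follows; "model.param", "model.bin" and the blob names "data" and "output" are placeholders that must match your converted model:
#include "net.h"
#include <cstdio>

int main()
{
    ncnn::Net net;
    if (net.load_param("model.param") != 0 || net.load_model("model.bin") != 0)
    {
        fprintf(stderr, "failed to load model\n");
        return -1;
    }

    ncnn::Mat in(224, 224, 3);  // w, h, c
    in.fill(0.5f);              // dummy input; replace with real preprocessed data

    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    ncnn::Mat out;
    ex.extract("output", out);

    printf("output: dims=%d w=%d h=%d c=%d\n", out.dims, out.w, out.h, out.c);
    return 0;
}
Compile it by linking against the built ncnn library (plus OpenMP/Vulkan dependencies when enabled), as described in the build documentation.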
Competitor Comparisons
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
Pros of MNN
- Supports a wider range of platforms, including iOS, Android, Windows, Linux, and macOS
- Offers more comprehensive model conversion tools, supporting various deep learning frameworks
- Provides a higher-level API for easier integration and usage
Cons of MNN
- Generally slower performance compared to ncnn, especially on mobile devices
- Larger binary size, which may impact app size more significantly
- Less focus on minimalism and lightweight design
Code Comparison
MNN example:
auto interpreter = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile(modelPath));
MNN::ScheduleConfig config;
auto session = interpreter->createSession(config);
auto input = interpreter->getSessionInput(session, nullptr);
interpreter->runSession(session);
ncnn example:
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("output", out);
Both libraries aim to provide efficient neural network inference on mobile and embedded devices. ncnn focuses on minimalism and performance, particularly excelling on mobile platforms. MNN offers broader platform support and more comprehensive tools but may sacrifice some performance for flexibility. The choice between them depends on specific project requirements, target platforms, and performance needs.
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
Pros of mace
- Supports a wider range of deep learning frameworks, including TensorFlow, Caffe, and ONNX
- Provides comprehensive performance optimization for various mobile platforms
- Offers a user-friendly command-line interface for model conversion and deployment
Cons of mace
- Larger library size compared to ncnn
- Steeper learning curve for beginners due to more complex architecture
- Less frequent updates and community contributions
Code Comparison
mace:
MaceEngine mace_engine(device_type);
mace_engine.Init(net_def, input_nodes, output_nodes, device_context);
mace_engine.Run(inputs, &outputs);
ncnn:
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Extractor ex = net.create_extractor();
ex.input("input", in);
ex.extract("output", out);
Both libraries provide straightforward APIs for loading and running models, but mace's interface is slightly more verbose. ncnn's approach is more compact and may be easier for quick implementations. However, mace's structure allows for more flexibility in specifying device types and contexts, which can be beneficial for complex deployment scenarios.
Tengine is a lite, high performance, modular inference engine for embedded device
Pros of Tengine
- Supports a wider range of hardware platforms, including ARM, RISC-V, and x86
- Offers a more comprehensive set of operators and network models
- Provides better support for quantization and model compression techniques
Cons of Tengine
- Less optimized for mobile devices compared to ncnn
- Smaller community and fewer third-party contributions
- Documentation may be less comprehensive or up-to-date
Code Comparison
ncnn:
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in = ncnn::Mat::from_pixels(image_data, ncnn::Mat::PIXEL_BGR, w, h);
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("output", out);
Tengine:
init_tengine();  // initialize the library once per process
graph_t graph = create_graph(NULL, "tengine", "model.tmfile");
tensor_t input_tensor = get_graph_input_tensor(graph, 0, 0);
set_tensor_shape(input_tensor, dims, 4);
set_tensor_buffer(input_tensor, input_data, img_size);
prerun_graph(graph);  // allocate resources before the first run
run_graph(graph, 1);
tensor_t output_tensor = get_graph_output_tensor(graph, 0, 0);
Both repositories focus on efficient neural network inference on various platforms, with ncnn being more specialized for mobile devices and Tengine offering broader hardware support. The code examples demonstrate the different approaches to loading and running models in each framework.
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
Pros of ComputeLibrary
- Optimized for ARM architectures, providing excellent performance on ARM-based devices
- Comprehensive support for various neural network operations and algorithms
- Extensive documentation and examples for easier integration
Cons of ComputeLibrary
- Limited cross-platform support compared to ncnn's wider compatibility
- Steeper learning curve due to its more complex API and architecture
- Larger codebase and potentially higher resource requirements
Code Comparison
ncnn:
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("input", in);
ex.extract("output", out);
ComputeLibrary:
arm_compute::graph::Graph graph;
arm_compute::graph::frontend::Stream stream(graph);
stream << arm_compute::graph::frontend::InputLayer(input_shape)
<< arm_compute::graph::frontend::ConvolutionLayer(...)
<< arm_compute::graph::frontend::OutputLayer();
Both libraries offer efficient neural network inference on mobile and embedded devices. ncnn focuses on cross-platform compatibility and ease of use, while ComputeLibrary provides optimized performance specifically for ARM architectures with a more comprehensive set of operations.
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Pros of TVM
- More comprehensive and flexible, supporting a wider range of hardware targets and optimization techniques
- Offers automatic optimization and tuning capabilities for better performance across different platforms
- Provides a higher-level API and supports multiple frontend frameworks (e.g., TensorFlow, PyTorch)
Cons of TVM
- Steeper learning curve due to its complexity and extensive features
- Larger codebase and potentially higher resource requirements for compilation and deployment
Code Comparison
TVM example (Python):
import tvm
from tvm import relay
# Define a simple network
data = relay.var("data", relay.TensorType((1, 3, 224, 224), "float32"))
weight = relay.var("weight")
conv2d = relay.nn.conv2d(data, weight)
func = relay.Function([data, weight], conv2d)
# Compile the network
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
lib = relay.build(func, target)
NCNN example (C++):
#include "net.h"
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("output", out);
README
ncnn
ncnn is a high-performance neural network inference computing framework optimized for mobile platforms. ncnn has been designed with deployment and use on mobile phones in mind from the beginning. It has no third-party dependencies, is cross-platform, and runs faster than all known open-source frameworks on mobile phone CPUs. With ncnn, developers can easily deploy deep learning algorithm models to the mobile platform, create intelligent apps, and bring artificial intelligence to your fingertips. ncnn is currently used in many Tencent applications, such as QQ, Qzone, WeChat, Pitu, and so on.
- Technical exchange QQ group: 637093648 (lots of experts; answer: 卷卷卷卷卷) (already full)
- Telegram Group | Discord Channel
- Pocky QQ group (MLIR YES!): 677104663 (lots of experts; answer: multi-level intermediate representation)
- "They don't even know how good pnnx is" QQ group: 818998520 (new group!)
Download & Build status
https://github.com/Tencent/ncnn/releases/latest
how to build ncnn library on Linux / Windows / macOS / Raspberry Pi3, Pi4 / POWER / Android / NVIDIA Jetson / iOS / WebAssembly / AllWinner D1 / Loongson 2K1000
Prebuilt packages and per-target build status are provided for: Source, Android, Android shared, HarmonyOS, HarmonyOS shared, iOS, iOS-Simulator, macOS, Mac-Catalyst, watchOS, watchOS-Simulator, tvOS, tvOS-Simulator, visionOS, visionOS-Simulator, Apple xcframework, Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04, VS2015, VS2017, VS2019, VS2022, WebAssembly, Linux (arm), Linux (aarch64), Linux (mips), Linux (mips64), Linux (ppc64), Linux (riscv64), and Linux (loongarch64).
Supports most commonly used CNN networks
- Classical CNN: VGG AlexNet GoogleNet Inception ...
- Practical CNN: ResNet DenseNet SENet FPN ...
- Light-weight CNN: SqueezeNet MobileNetV1 MobileNetV2/V3 ShuffleNetV1 ShuffleNetV2 MNasNet ...
- Face Detection: MTCNN RetinaFace scrfd ...
- Detection: VGG-SSD MobileNet-SSD SqueezeNet-SSD MobileNetV2-SSDLite MobileNetV3-SSDLite ...
- Detection: Faster-RCNN R-FCN ...
- Detection: YOLOv2 YOLOv3 MobileNet-YOLOv3 YOLOv4 YOLOv5 YOLOv7 YOLOX ...
- Detection: NanoDet
- Segmentation: FCN PSPNet UNet YOLACT ...
- Pose Estimation: SimplePose ...
HowTo
use ncnn with alexnet with detailed steps, recommended for beginners :)
use netron for ncnn model visualization
ncnn param and model file spec
ncnn operation param weight table
how to implement custom layer step by step
FAQ
Features
- Supports convolutional neural networks, multiple inputs and multi-branch structures, and can compute only part of the branches
- No third-party library dependencies; does not rely on BLAS / NNPACK or any other computing framework
- Pure C++ implementation, cross-platform, supports Android, iOS and so on
- Careful ARM NEON assembly-level optimization delivers extremely fast computation
- Sophisticated memory management and data structure design with a very low memory footprint
- Supports multi-core parallel computing acceleration and ARM big.LITTLE CPU scheduling optimization
- Supports GPU acceleration via the next-generation low-overhead Vulkan API
- Extensible model design; supports 8-bit quantization and half-precision floating-point storage; can import caffe/pytorch/mxnet/onnx/darknet/keras/tensorflow(mlir) models
- Supports direct zero-copy loading of network models from memory (see the sketch after this list)
- Custom layer implementations can be registered to extend the framework
- Well, it is strong, not afraid of being stuffed with 卷 QvQ
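A rough sketch of the zero-copy loading path mentioned above, assuming the model has been converted to the binary .param.bin form and both buffers stay alive for the lifetime of the Net (buffer names are placeholders):
#include "net.h"

ncnn::Net net;
net.opt.num_threads = 4;            // multi-core CPU inference
net.opt.use_fp16_storage = true;    // half-precision weight storage, where supported
// param_bin_buffer / model_bin_buffer: const unsigned char* buffers you mapped or read yourself
net.load_param(param_bin_buffer);   // binary param form (e.g. produced by ncnn2mem)
net.load_model(model_bin_buffer);   // weights are referenced in place rather than copied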
supported platform matrix
- ✅ = known work and runs fast with good optimization
- ✔️ = known work, but speed may not be fast enough
- ❔ = shall work, not confirmed
- / = not applied

|            | Windows | Linux | Android | macOS | iOS |
| ---------- | ------- | ----- | ------- | ----- | --- |
| intel-cpu  | ✔️ | ✔️ | ❔ | ✔️ | / |
| intel-gpu  | ✔️ | ✔️ | ❔ | ❔ | / |
| amd-cpu    | ✔️ | ✔️ | ❔ | ✔️ | / |
| amd-gpu    | ✔️ | ✔️ | ❔ | ❔ | / |
| nvidia-gpu | ✔️ | ✔️ | ❔ | ❔ | / |
| qcom-cpu   | ❔ | ✔️ | ✅ | / | / |
| qcom-gpu   | ❔ | ✔️ | ✔️ | / | / |
| arm-cpu    | ❔ | ❔ | ✅ | / | / |
| arm-gpu    | ❔ | ❔ | ✔️ | / | / |
| apple-cpu  | / | / | / | ✔️ | ✅ |
| apple-gpu  | / | / | / | ✔️ | ✔️ |
| ibm-cpu    | / | ✔️ | / | / | / |
Project examples
- https://github.com/nihui/ncnn-android-squeezenet
- https://github.com/nihui/ncnn-android-styletransfer
- https://github.com/nihui/ncnn-android-mobilenetssd
- https://github.com/moli232777144/mtcnn_ncnn
- https://github.com/nihui/ncnn-android-yolov5
- https://github.com/xiang-wuu/ncnn-android-yolov7
- https://github.com/nihui/ncnn-android-scrfd 🤩
- https://github.com/shaoshengsong/qt_android_ncnn_lib_encrypt_example
- https://github.com/mizu-bai/ncnn-fortran Call ncnn from Fortran
- https://github.com/k2-fsa/sherpa Use ncnn for real-time speech recognition (i.e., speech-to-text); also supports embedded devices and provides mobile apps (e.g., an Android app)
License
BSD 3 Clause