ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Top Related Projects
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
Tengine is a lite, high performance, modular inference engine for embedded device
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Quick Overview
ncnn is a high-performance neural network inference framework optimized for mobile platforms. Developed by Tencent, it is designed to run deep learning models on mobile devices efficiently, with a focus on speed and low memory footprint.
Pros
- Extremely fast inference speed on mobile devices
- Low memory footprint, suitable for resource-constrained environments
- Cross-platform support (Android, iOS, Windows, Linux, macOS)
- Supports a wide range of deep learning models and operations
Cons
- Steeper learning curve compared to some other inference frameworks
- Limited documentation and examples, especially for beginners
- Requires manual model conversion for some popular deep learning frameworks
- Smaller community compared to more mainstream frameworks like TensorFlow Lite
Code Examples
- Loading and running a model:
#include "net.h"
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
// Fill 'in' with input data
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("output", out);
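The comment above leaves input preparation open. One common way to fill 'in' is to convert raw BGR pixels with ncnn::Mat::from_pixels and normalize in place; this is only a sketch, and the mean/scale values below are placeholders that depend on how your model was trained:
// bgr_pixels, width and height are assumed to come from your own image decoder
ncnn::Mat in = ncnn::Mat::from_pixels(bgr_pixels, ncnn::Mat::PIXEL_BGR, width, height);
const float mean_vals[3] = {104.f, 117.f, 123.f};         // example values only
const float norm_vals[3] = {1/255.f, 1/255.f, 1/255.f};
in.substract_mean_normalize(mean_vals, norm_vals);        // note: ncnn's spelling of this API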
- Creating a custom layer:
#include "layer.h"

class MyCustomLayer : public ncnn::Layer
{
public:
MyCustomLayer()
{
one_blob_only = true;
}
virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob, const ncnn::Option& opt) const
{
// Implement your custom layer logic here
return 0;
}
};
DEFINE_LAYER_CREATOR(MyCustomLayer)
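To make the custom layer available at load time, it must be registered on the Net before load_param is called. A minimal sketch using the creator function generated by DEFINE_LAYER_CREATOR (the type string must match the layer type used in your .param file):
ncnn::Net net;
net.register_custom_layer("MyCustomLayer", MyCustomLayer_layer_creator);
net.load_param("model.param");
net.load_model("model.bin");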
- Using Vulkan compute:
#include "gpu.h"

ncnn::create_gpu_instance();
ncnn::VulkanDevice* vkdev = ncnn::get_gpu_device();
ncnn::Net net;
net.opt.use_vulkan_compute = true;
net.set_vulkan_device(vkdev);
// Load and run model as usual
ncnn::destroy_gpu_instance();
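Vulkan support is optional at runtime, so it is common to probe for a usable GPU first and fall back to the CPU path. A minimal sketch, assuming ncnn was built with Vulkan support enabled:
#include "net.h"
#include "gpu.h"   // ncnn::get_gpu_count()

ncnn::Net net;
if (ncnn::get_gpu_count() > 0)
    net.opt.use_vulkan_compute = true;  // the default GPU device is used when none is set explicitly
// load_param / load_model / create_extractor as usual; inference runs on the CPU otherwise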
Getting Started
- Clone the repository:
  git clone https://github.com/Tencent/ncnn.git
- Build the project:
  cd ncnn
  mkdir build && cd build
  cmake ..
  make
- Include ncnn in your project:
  #include "net.h"
- Link against the built library and include the necessary headers in your project's build configuration.
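Putting the steps together, a minimal sketch of a complete program follows; "model.param", "model.bin" and the blob names "data" and "output" are placeholders that must match your converted model:
#include "net.h"
#include <cstdio>

int main()
{
    ncnn::Net net;
    if (net.load_param("model.param") != 0 || net.load_model("model.bin") != 0)
    {
        fprintf(stderr, "failed to load model\n");
        return -1;
    }

    ncnn::Mat in(224, 224, 3);  // w, h, c
    in.fill(0.5f);              // dummy input; replace with real preprocessed data

    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    ncnn::Mat out;
    ex.extract("output", out);

    printf("output: dims=%d w=%d h=%d c=%d\n", out.dims, out.w, out.h, out.c);
    return 0;
}
Compile it by linking against the built ncnn library (plus OpenMP/Vulkan dependencies when enabled), as described in the build documentation.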
Competitor Comparisons
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
Pros of MNN
- Supports a wider range of platforms, including iOS, Android, Windows, Linux, and macOS
- Offers more comprehensive model conversion tools, supporting various deep learning frameworks
- Provides a higher-level API for easier integration and usage
Cons of MNN
- Generally slower performance compared to ncnn, especially on mobile devices
- Larger binary size, which may impact app size more significantly
- Less focus on minimalism and lightweight design
Code Comparison
MNN example:
auto interpreter = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile(modelPath));
MNN::ScheduleConfig config;
auto session = interpreter->createSession(config);
auto input = interpreter->getSessionInput(session, nullptr);
interpreter->runSession(session);
ncnn example:
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("output", out);
Both libraries aim to provide efficient neural network inference on mobile and embedded devices. ncnn focuses on minimalism and performance, particularly excelling on mobile platforms. MNN offers broader platform support and more comprehensive tools but may sacrifice some performance for flexibility. The choice between them depends on specific project requirements, target platforms, and performance needs.
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
Pros of mace
- Supports a wider range of deep learning frameworks, including TensorFlow, Caffe, and ONNX
- Provides comprehensive performance optimization for various mobile platforms
- Offers a user-friendly command-line interface for model conversion and deployment
Cons of mace
- Larger library size compared to ncnn
- Steeper learning curve for beginners due to more complex architecture
- Less frequent updates and community contributions
Code Comparison
mace:
MaceEngine mace_engine(device_type);
mace_engine.Init(net_def, input_nodes, output_nodes, device_context);
mace_engine.Run(inputs, &outputs);
ncnn:
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Extractor ex = net.create_extractor();
ex.input("input", in);
ex.extract("output", out);
Both libraries provide straightforward APIs for loading and running models, but mace's interface is slightly more verbose. ncnn's approach is more compact and may be easier for quick implementations. However, mace's structure allows for more flexibility in specifying device types and contexts, which can be beneficial for complex deployment scenarios.
Tengine is a lite, high performance, modular inference engine for embedded device
Pros of Tengine
- Supports a wider range of hardware platforms, including ARM, RISC-V, and x86
- Offers a more comprehensive set of operators and network models
- Provides better support for quantization and model compression techniques
Cons of Tengine
- Less optimized for mobile devices compared to ncnn
- Smaller community and fewer third-party contributions
- Documentation may be less comprehensive or up-to-date
Code Comparison
ncnn:
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in = ncnn::Mat::from_pixels(image_data, ncnn::Mat::PIXEL_BGR, w, h);
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("output", out);
Tengine:
init_tengine();  // initialize the library once per process
graph_t graph = create_graph(NULL, "tengine", "model.tmfile");
tensor_t input_tensor = get_graph_input_tensor(graph, 0, 0);
set_tensor_shape(input_tensor, dims, 4);
set_tensor_buffer(input_tensor, input_data, img_size);
prerun_graph(graph);  // allocate resources before the first run
run_graph(graph, 1);
tensor_t output_tensor = get_graph_output_tensor(graph, 0, 0);
Both repositories focus on efficient neural network inference on various platforms, with ncnn being more specialized for mobile devices and Tengine offering broader hardware support. The code examples demonstrate the different approaches to loading and running models in each framework.
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
Pros of ComputeLibrary
- Optimized for ARM architectures, providing excellent performance on ARM-based devices
- Comprehensive support for various neural network operations and algorithms
- Extensive documentation and examples for easier integration
Cons of ComputeLibrary
- Limited cross-platform support compared to ncnn's wider compatibility
- Steeper learning curve due to its more complex API and architecture
- Larger codebase and potentially higher resource requirements
Code Comparison
ncnn:
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("input", in);
ex.extract("output", out);
ComputeLibrary:
arm_compute::graph::Graph graph;
arm_compute::graph::frontend::Stream stream(graph);
stream << arm_compute::graph::frontend::InputLayer(input_shape)
<< arm_compute::graph::frontend::ConvolutionLayer(...)
<< arm_compute::graph::frontend::OutputLayer();
Both libraries offer efficient neural network inference on mobile and embedded devices. ncnn focuses on cross-platform compatibility and ease of use, while ComputeLibrary provides optimized performance specifically for ARM architectures with a more comprehensive set of operations.
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Pros of TVM
- More comprehensive and flexible, supporting a wider range of hardware targets and optimization techniques
- Offers automatic optimization and tuning capabilities for better performance across different platforms
- Provides a higher-level API and supports multiple frontend frameworks (e.g., TensorFlow, PyTorch)
Cons of TVM
- Steeper learning curve due to its complexity and extensive features
- Larger codebase and potentially higher resource requirements for compilation and deployment
Code Comparison
TVM example (Python):
import tvm
from tvm import relay
# Define a simple network
data = relay.var("data", relay.TensorType((1, 3, 224, 224), "float32"))
weight = relay.var("weight")
conv2d = relay.nn.conv2d(data, weight)
func = relay.Function([data, weight], conv2d)
# Compile the network
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
lib = relay.build(func, target)
NCNN example (C++):
#include "net.h"
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
ncnn::Mat out;
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("output", out);
README
ncnn
ncnn is a high-performance neural network inference computing framework optimized for mobile platforms. ncnn has been designed with deployment and use on mobile phones in mind from the beginning. It has no third-party dependencies, is cross-platform, and runs faster than all known open-source frameworks on mobile phone CPUs. With ncnn, developers can easily deploy deep learning algorithm models to the mobile platform, create intelligent apps, and bring artificial intelligence to your fingertips. ncnn is currently used in many Tencent applications, such as QQ, Qzone, WeChat, Pitu, and so on.
- Technical exchange QQ group: 637093648 (lots of experts; answer: 卷卷卷卷卷) (already full)
- Telegram Group | Discord Channel
- Pocky QQ group (MLIR YES!): 677104663 (lots of experts; answer: multi-level intermediate representation)
- "They don't even know how good pnnx is" QQ group: 818998520 (new group!)
Download & Build status
https://github.com/Tencent/ncnn/releases/latest
how to build ncnn library on Linux / Windows / macOS / Raspberry Pi3, Pi4 / POWER / Android / NVIDIA Jetson / iOS / WebAssembly / AllWinner D1 / Loongson 2K1000
Prebuilt packages and per-target build status are provided for: Source, Android, Android shared, HarmonyOS, HarmonyOS shared, iOS, iOS-Simulator, macOS, Mac-Catalyst, watchOS, watchOS-Simulator, tvOS, tvOS-Simulator, visionOS, visionOS-Simulator, Apple xcframework, Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04, VS2015, VS2017, VS2019, VS2022, WebAssembly, Linux (arm), Linux (aarch64), Linux (mips), Linux (mips64), Linux (ppc64), Linux (riscv64), and Linux (loongarch64).
Supports most commonly used CNN networks
- Classical CNN: VGG AlexNet GoogleNet Inception ...
- Practical CNN: ResNet DenseNet SENet FPN ...
- Light-weight CNN: SqueezeNet MobileNetV1 MobileNetV2/V3 ShuffleNetV1 ShuffleNetV2 MNasNet ...
- Face Detection: MTCNN RetinaFace scrfd ...
- Detection: VGG-SSD MobileNet-SSD SqueezeNet-SSD MobileNetV2-SSDLite MobileNetV3-SSDLite ...
- Detection: Faster-RCNN R-FCN ...
- Detection: YOLOv2 YOLOv3 MobileNet-YOLOv3 YOLOv4 YOLOv5 YOLOv7 YOLOX ...
- Detection: NanoDet
- Segmentation: FCN PSPNet UNet YOLACT ...
- Pose Estimation: SimplePose ...
HowTo
use ncnn with alexnet with detailed steps, recommended for beginners :)
use netron for ncnn model visualization
ncnn param and model file spec
ncnn operation param weight table
how to implement custom layer step by step
FAQ
Features
- Supports convolutional neural networks, multiple inputs and multi-branch structures, and can compute only part of the branches
- No third-party library dependencies; does not rely on BLAS / NNPACK or any other computing framework
- Pure C++ implementation, cross-platform, supports Android, iOS and so on
- Careful ARM NEON assembly-level optimization delivers extremely fast computation
- Sophisticated memory management and data structure design with a very low memory footprint
- Supports multi-core parallel computing acceleration and ARM big.LITTLE CPU scheduling optimization
- Supports GPU acceleration via the next-generation low-overhead Vulkan API
- Extensible model design; supports 8-bit quantization and half-precision floating-point storage; can import caffe/pytorch/mxnet/onnx/darknet/keras/tensorflow(mlir) models
- Supports direct zero-copy loading of network models from memory (see the sketch after this list)
- Custom layer implementations can be registered to extend the framework
- Well, it is strong, not afraid of being stuffed with 卷 QvQ
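A rough sketch of the zero-copy loading path mentioned above, assuming the model has been converted to the binary .param.bin form and both buffers stay alive for the lifetime of the Net (buffer names are placeholders):
#include "net.h"

ncnn::Net net;
net.opt.num_threads = 4;            // multi-core CPU inference
net.opt.use_fp16_storage = true;    // half-precision weight storage, where supported
// param_bin_buffer / model_bin_buffer: const unsigned char* buffers you mapped or read yourself
net.load_param(param_bin_buffer);   // binary param form (e.g. produced by ncnn2mem)
net.load_model(model_bin_buffer);   // weights are referenced in place rather than copied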
supported platform matrix
- ✅ = known work and runs fast with good optimization
- ✔️ = known work, but speed may not be fast enough
- ❔ = shall work, not confirmed
- / = not applied

|            | Windows | Linux | Android | macOS | iOS |
| ---------- | ------- | ----- | ------- | ----- | --- |
| intel-cpu  | ✔️ | ✔️ | ❔ | ✔️ | / |
| intel-gpu  | ✔️ | ✔️ | ❔ | ❔ | / |
| amd-cpu    | ✔️ | ✔️ | ❔ | ✔️ | / |
| amd-gpu    | ✔️ | ✔️ | ❔ | ❔ | / |
| nvidia-gpu | ✔️ | ✔️ | ❔ | ❔ | / |
| qcom-cpu   | ❔ | ✔️ | ✅ | / | / |
| qcom-gpu   | ❔ | ✔️ | ✔️ | / | / |
| arm-cpu    | ❔ | ❔ | ✅ | / | / |
| arm-gpu    | ❔ | ❔ | ✔️ | / | / |
| apple-cpu  | / | / | / | ✔️ | ✅ |
| apple-gpu  | / | / | / | ✔️ | ✔️ |
| ibm-cpu    | / | ✔️ | / | / | / |
Project examples
- https://github.com/nihui/ncnn-android-squeezenet
- https://github.com/nihui/ncnn-android-styletransfer
- https://github.com/nihui/ncnn-android-mobilenetssd
- https://github.com/moli232777144/mtcnn_ncnn
- https://github.com/nihui/ncnn-android-yolov5
- https://github.com/xiang-wuu/ncnn-android-yolov7
- https://github.com/nihui/ncnn-android-scrfd 🤩
- https://github.com/shaoshengsong/qt_android_ncnn_lib_encrypt_example
- https://github.com/mizu-bai/ncnn-fortran Call ncnn from Fortran
- https://github.com/k2-fsa/sherpa Use ncnn for real-time speech recognition (i.e., speech-to-text); also supports embedded devices and provides mobile apps (e.g., an Android app)
License
BSD 3 Clause