lifting-bits/remill

Library for lifting machine code to LLVM bitcode

Top Related Projects

  • McSema (2,648 stars): Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode
  • RetDec (7,992 stars): RetDec is a retargetable machine-code decompiler based on LLVM.
  • angr (7,537 stars): A powerful and user-friendly binary analysis platform!
  • radare2 (20,547 stars): UNIX-like reverse engineering framework and command-line toolset
  • Capstone: Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.

Quick Overview

Remill is an open-source library for lifting machine code to LLVM bitcode. It supports x86 and amd64 (including AVX and AVX512), AArch64, SPARC32, and SPARC64, and is meant to be used as a library by other tools for binary analysis, reverse engineering, and program transformation.

Pros

  • Supports multiple architectures (x86, amd64, AArch64, SPARC32, SPARC64)
  • Integrates well with LLVM ecosystem
  • Actively maintained and regularly updated
  • Provides a flexible API for custom use cases

Cons

  • Steep learning curve for beginners
  • Limited documentation for advanced features
  • May require significant computational resources for large binaries
  • Dependency on LLVM version can cause compatibility issues

Code Examples

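  1. Lifting x86 machine code to LLVM bitcode:
#include <memory>
#include <string>

#include <llvm/IR/LLVMContext.h>
#include <remill/Arch/Arch.h>
#include <remill/Arch/X86/Runtime/State.h>
#include <remill/BC/Lifter.h>

int main() {
    // Sketch only: the lifter calls below are illustrative and should be
    // checked against the headers of the Remill version in use.
    auto arch = remill::Arch::GetArchitecture(remill::kArchX86);
    auto lifter = std::make_unique<remill::InstructionLifter>(arch.get());

    // Remill consumes raw machine-code bytes, not assembly text.
    // B8 2A 00 00 00 is the encoding of `mov eax, 42`.
    const std::string bytes("\xb8\x2a\x00\x00\x00", 5);
    llvm::LLVMContext context;
    auto module = lifter->LiftInstructionToModule(bytes, context);
}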
  2. Analyzing lifted bitcode:
#include <remill/BC/Util.h>

void analyze_bitcode(llvm::Module *module) {
    for (auto &function : module->functions()) {
        if (remill::IsLiftedFunction(function)) {
            // Analyze lifted function
            for (auto &block : function) {
                // Analyze basic block
            }
        }
    }
}
  3. Transforming lifted code:
#include <remill/BC/IntrinsicTable.h>

void transform_lifted_code(llvm::Module *module) {
    remill::IntrinsicTable intrinsics(module);
    
    for (auto &function : module->functions()) {
        if (remill::IsLiftedFunction(function)) {
            // Apply custom transformations
            // e.g., replace memory intrinsics, optimize branches, etc.
        }
    }
}
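
Remill works on machine-code bytes, which is also the form the README's command-line examples take through the `--bytes` flag. As a small illustration of what those bytes look like, the instruction `mov eax, 42` from the first example encodes as opcode `B8` followed by a 32-bit little-endian immediate. The helper name `encode_mov_eax_imm32` below is hypothetical and not part of Remill; this is only a sketch of that one encoding.

```python
import struct


def encode_mov_eax_imm32(value: int) -> bytes:
    """Encode `mov eax, imm32`: opcode 0xB8 plus a little-endian 32-bit immediate."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("immediate must fit in 32 bits")
    return b"\xb8" + struct.pack("<I", value)


if __name__ == "__main__":
    print(encode_mov_eax_imm32(42).hex())  # prints b82a000000
```

The resulting hex string has the same shape as the values passed to `--bytes` in the README's Docker examples.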

Getting Started

To get started with Remill:

  1. Clone the repository:

    git clone https://github.com/lifting-bits/remill.git
    
  2. Install dependencies (on Ubuntu):

    sudo apt-get install build-essential cmake python3-pip
    pip3 install --user --upgrade pip
    pip3 install --user --upgrade setuptools wheel
    
  3. Build Remill:

    cd remill
    mkdir build && cd build
    cmake ..
    make -j$(nproc)
    
  4. Include Remill in your project's CMakeLists.txt:

    find_package(remill REQUIRED)
    target_link_libraries(your_target remill)
    

Competitor Comparisons

McSema (2,648 stars)

Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode

Pros of McSema

  • Lifts whole program binaries, not just individual instructions, for x86, amd64, aarch64, sparc32, and sparc64
  • Provides more comprehensive binary analysis capabilities
  • Offers better integration with other binary analysis tools

Cons of McSema

  • More complex setup and usage compared to Remill
  • Slower lifting process due to its comprehensive nature
  • Requires more system resources for operation

Code Comparison

McSema:

#include <remill/Arch/Arch.h>
#include <remill/BC/Util.h>
#include <mcsema/Arch/Arch.h>
#include <mcsema/BC/Util.h>

void LiftFunction(const mcsema::Arch *arch, llvm::Function *func) {
    // McSema-specific lifting code
}

Remill:

#include <remill/Arch/Arch.h>
#include <remill/BC/Util.h>

void LiftInstruction(const remill::Arch *arch, llvm::BasicBlock *block) {
    // Remill-specific lifting code
}

Both McSema and Remill are part of the lifting-bits project. McSema uses Remill as its instruction-lifting library and adds program-level lifting on top, so it covers the same architectures at the cost of increased complexity. Remill focuses on accurately lifting individual instructions, which makes it easier to embed in other tools but leaves whole-binary handling to its callers.

RetDec (7,992 stars)

RetDec is a retargetable machine-code decompiler based on LLVM.

Pros of RetDec

  • More comprehensive decompilation capabilities, supporting multiple architectures and file formats
  • Produces C-like output that is easier to read than LLVM IR
  • Actively maintained with regular updates and community support

Cons of RetDec

  • Slower decompilation process compared to Remill's lifting approach
  • Larger codebase and more complex setup, potentially making it harder to integrate into other projects
  • May produce less accurate results for certain specific use cases

Code Comparison

RetDec (C decompilation output):

int32_t function_401000(int32_t a1) {
    int32_t v1 = a1 * 2;
    return v1 + 5;
}

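Remill (LLVM IR lifting output, simplified for comparison; real lifted functions operate on an explicit machine-state structure and memory pointer rather than recovered arguments):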

define i32 @sub_401000(i32 %a1) {
    %v1 = mul i32 %a1, 2
    %result = add i32 %v1, 5
    ret i32 %result
}

Both projects aim to analyze binary code, but RetDec focuses on full decompilation to high-level languages, while Remill specializes in lifting machine code to LLVM IR. RetDec offers a more user-friendly approach for general reverse engineering tasks, whereas Remill provides a powerful foundation for advanced binary analysis and transformation tools.

angr (7,537 stars)

A powerful and user-friendly binary analysis platform!

Pros of angr

  • More comprehensive analysis framework with symbolic execution capabilities
  • Larger community and ecosystem of plugins/extensions
  • Supports a wider range of architectures and binary formats

Cons of angr

  • Steeper learning curve due to complexity
  • Can be slower for certain types of analysis
  • Requires more system resources, especially for large binaries

Code Comparison

angr example:

import angr

proj = angr.Project('binary')
state = proj.factory.entry_state()
simgr = proj.factory.simulation_manager(state)
simgr.explore(find=0x400000)

Remill example:

#include <remill/Arch/X86/Runtime/State.h>
#include <remill/BC/Lifter.h>

auto module = remill::LoadModuleFromFile(arch, bc_file);
auto func = remill::LiftCodeIntoModule(module, addr);

The angr code demonstrates setting up a project and running symbolic execution, while the Remill code shows lifting binary code to LLVM IR. angr provides higher-level abstractions for program analysis, whereas Remill focuses on instruction lifting and translation to LLVM IR.

radare2 (20,547 stars)

UNIX-like reverse engineering framework and command-line toolset

Pros of radare2

  • Comprehensive reverse engineering framework with a wide range of features
  • Large and active community, extensive documentation, and plugins ecosystem
  • Supports a vast array of architectures and file formats

Cons of radare2

  • Steeper learning curve due to its extensive feature set
  • Can be resource-intensive for large binaries or complex analysis tasks
  • Command-line interface may be less intuitive for some users

Code Comparison

radare2:

r_core_cmd(core, "aaa", 0);
r_core_cmd(core, "pdf @ main", 0);

Remill:

auto module = LoadModuleFromFile(argv[1], &context);
auto program = GenerateProgram(*module);

Key differences

Radare2 is a full-featured reverse engineering framework, while Remill focuses on lifting binary code to LLVM IR. Radare2 offers a broader set of tools for various reverse engineering tasks, whereas Remill specializes in binary-to-IR translation for further analysis or recompilation.

Radare2 is more suitable for interactive analysis and scripting, while Remill is designed to be integrated into larger binary analysis systems. Radare2 has a larger user base and more extensive documentation, but Remill's specialized focus may make it more efficient for certain binary lifting tasks.

Capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.

Pros of Capstone

  • Wider architecture support (x86, ARM, MIPS, PowerPC, etc.)
  • More mature and established project with extensive documentation
  • Lightweight and easy to integrate into existing projects

Cons of Capstone

  • Primarily focused on disassembly, not lifting to intermediate representation
  • Less suitable for advanced program analysis tasks
  • May require additional tools for more complex reverse engineering workflows

Code Comparison

Capstone (disassembly example):

cs_insn *insn;
size_t count = cs_disasm(handle, code, code_size, address, 0, &insn);
for (size_t j = 0; j < count; j++) {
    printf("0x%"PRIx64":\t%s\t\t%s\n", insn[j].address, insn[j].mnemonic, insn[j].op_str);
}

Remill (lifting example):

auto lifted_block = remill::LiftCodeBlock(arch, memory, block_address);
for (const auto &inst : lifted_block->instructions) {
    std::cout << inst.Serialize() << std::endl;
}

Remill focuses on lifting machine code to an intermediate representation, which is more suitable for advanced program analysis and transformation tasks. Capstone, on the other hand, excels at disassembly and provides a simpler API for basic instruction decoding across multiple architectures.

README

Remill

Remill is a static binary translator that translates machine code instructions into LLVM bitcode. It translates AArch64 (64-bit ARMv8), SPARC32 (SPARCv8), SPARC64 (SPARCv9), x86 and amd64 machine code (including AVX and AVX512) into LLVM bitcode. AArch32 (32-bit ARMv8 / ARMv7) support is underway.

Remill focuses on accurately lifting instructions. It is meant to be used as a library for other tools, e.g. McSema.

Documentation

To understand how Remill works you can take a look at the following resources:

If you would like to contribute you can check out: How to contribute

Getting Help

If you are experiencing undocumented problems with Remill then ask for help in the #binary-lifting channel of the Empire Hacking Slack.

Supported Platforms

Remill is supported on Linux platforms and has been tested on Ubuntu 22.04. Remill also works on macOS, and has experimental support for Windows.

Remill's Linux version can also be built via Docker for quicker testing.

Dependencies

Most of Remill's dependencies can be provided by the cxx-common repository. Trail of Bits hosts downloadable, pre-built versions of cxx-common, which makes it substantially easier to get up and running with Remill. Nonetheless, the following table represents most of Remill's dependencies.

Name            Version
Git             Latest
CMake           3.14+
Google Flags    Latest
Google Log      Latest
Google Test     Latest
LLVM            15+
Clang           15
Intel XED       Latest
Python          2.7
Unzip           Latest
ccache          Latest
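
The minimum versions in this table (for example CMake 3.14+ and LLVM 15+) can be checked before starting a build. The following is a minimal Python sketch, assuming each tool reports its version in a string such as `cmake version 3.22.1`; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of Remill.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version number found in `text`."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Return True if `found` is at least `minimum` (use '3.14' for '3.14+')."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that 15 and 15.0.0 compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    print(meets_minimum("cmake version 3.22.1", "3.14"))
```

Comparing integer tuples rather than strings matters here: as text, `3.9` sorts after `3.14`, while as a version it is older.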

Getting and Building the Code

Docker Build

Remill now comes with a Dockerfile for easier testing. This Dockerfile references the cxx-common container to have all pre-requisite libraries available.

The Dockerfile allows for quick builds of multiple supported LLVM and Ubuntu configurations.

Important: Not all LLVM and Ubuntu configurations are supported. Please refer to the CI results to see which configurations are tested and supported. The Docker image should build on both x86_64 and ARM64, but only x86_64 is tested in CI. If the ARM64 build fails, please open an issue.

Quickstart (builds Remill against LLVM 17 on Ubuntu 22.04).

Clone Remill:

git clone https://github.com/lifting-bits/remill.git
cd remill

Build Remill Docker container:

docker build . -t remill \
     -f Dockerfile \
     --build-arg UBUNTU_VERSION=22.04 \
     --build-arg LLVM_VERSION=17

Ensure remill works:

Decode some AMD64 instructions to LLVM:

docker run --rm -it remill \
     --arch amd64 --ir_out /dev/stdout --bytes c704ba01000000

Decode some AArch64 instructions to LLVM:

docker run --rm -it remill \
     --arch aarch64 --address 0x400544 --ir_out /dev/stdout \
     --bytes FD7BBFA90000009000601891FD030091B7FFFF97E0031F2AFD7BC1A8C0035FD6
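
The two `docker run` invocations above differ only in their flags, so they are easy to script. The following is a minimal Python sketch, assuming the image was tagged `remill` as in the build step. The helper names `hex_to_bytes` and `lift_command` are hypothetical, and only the flags shown above (`--arch`, `--address`, `--ir_out`, `--bytes`) are used.

```python
import binascii
from typing import List, Optional


def hex_to_bytes(hex_string: str) -> bytes:
    """Validate a --bytes argument and return the raw machine code."""
    try:
        return binascii.unhexlify(hex_string)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError(f"not a valid hex byte string: {hex_string!r}") from exc


def lift_command(arch: str, hex_string: str,
                 address: Optional[int] = None,
                 image: str = "remill") -> List[str]:
    """Build the `docker run` argument list used in the quickstart."""
    hex_to_bytes(hex_string)  # fail early on odd length or non-hex input
    cmd = ["docker", "run", "--rm", "-it", image, "--arch", arch]
    if address is not None:
        cmd += ["--address", hex(address)]
    cmd += ["--ir_out", "/dev/stdout", "--bytes", hex_string]
    return cmd


if __name__ == "__main__":
    print(" ".join(lift_command("amd64", "c704ba01000000")))
    print(" ".join(lift_command("aarch64", "C0035FD6", address=0x400544)))
```

The list can be passed to `subprocess.run`. When no terminal is attached, `-it` would need to be dropped from the list.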

On Linux

First, update the package index and install the baseline dependencies.

sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get upgrade

sudo apt-get install \
     git \
     python3 \
     wget \
     curl \
     build-essential \
     lsb-release \
     ccache \
     libc6-dev:i386 \
     'libstdc++-*-dev:i386' \
     g++-multilib \
     rpm

Next, clone the repository. This will clone the code into the remill directory.

git clone https://github.com/lifting-bits/remill.git

Next, we build Remill. This script will create another directory, remill-build, in the current working directory. All remaining dependencies needed by Remill will be built in the remill-build directory.

./remill/scripts/build.sh

Next, we can install Remill. Remill itself is a library, and so there is no real way to try it. However, you can head on over to the McSema repository, which uses Remill for lifting instructions.

cd ./remill-build
sudo make install

We can also build and run Remill's test suite.

cd ./remill-build
make test_dependencies
make test

Full Source Builds

Sometimes, you want to build everything from source, including the cxx-common libraries remill depends on. To build against a custom cxx-common location, you can use the following cmake invocation:

mkdir build
cd build
cmake  \
  -DCMAKE_INSTALL_PREFIX="<path where remill will install>" \
  -DCMAKE_TOOLCHAIN_FILE="<path to cxx-common directory>/vcpkg/scripts/buildsystems/vcpkg.cmake"  \
  -G Ninja  \
  ..
cmake --build .
cmake --build . --target install

The output may produce some CMake warnings about policy CMP0003. These warnings are safe to ignore.

Common Build Issues

If you see errors similar to the following:

fatal error: 'bits/c++config.h' file not found

Then you need to install 32-bit libstdc++ headers and libraries. On a Debian/Ubuntu-based distribution, you would want to do something like this:

sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install libc6-dev:i386 libstdc++-10-dev:i386 g++-multilib

This error happens because the SPARC32 runtime semantics (the bitcode library which lives in <install directory>/share/remill/<version>/semantics/sparc32.bc) are built as 32-bit code, but 32-bit development libraries are not installed by default.

A similar situation occurs when building remill on arm64 Linux. In that case, you want to follow a similar workflow, except the architecture used in dpkg and apt-get commands would be armhf instead of i386.
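
The architecture substitution described above can be written down as a small lookup. This is a sketch under stated assumptions: the helper names are hypothetical, the libstdc++ version defaults to the `10` used in the command above, and `g++-multilib` is only added for i386 because the text does not say whether it applies to armhf.

```python
from typing import List

# Mapping taken from the text above: i386 on x86_64 hosts, armhf on arm64 hosts.
FOREIGN_ARCH = {
    "x86_64": "i386",
    "amd64": "i386",
    "aarch64": "armhf",
    "arm64": "armhf",
}


def foreign_arch(machine: str) -> str:
    """Return the 32-bit dpkg architecture for a host machine name."""
    try:
        return FOREIGN_ARCH[machine.lower()]
    except KeyError:
        raise ValueError(f"no 32-bit counterpart known for {machine!r}") from None


def setup_commands(machine: str, libstdcxx_version: int = 10) -> List[List[str]]:
    """Build the dpkg/apt-get commands for installing 32-bit development libraries."""
    arch = foreign_arch(machine)
    packages = [f"libc6-dev:{arch}", f"libstdc++-{libstdcxx_version}-dev:{arch}"]
    if arch == "i386":
        packages.append("g++-multilib")  # listed only in the i386 command above
    return [
        ["sudo", "dpkg", "--add-architecture", arch],
        ["sudo", "apt-get", "update"],
        ["sudo", "apt-get", "install", *packages],
    ]


if __name__ == "__main__":
    import platform
    for command in setup_commands(platform.machine()):
        print(" ".join(command))
```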

Another alternative is to disable SPARC32 runtime semantics. To do that, use the -DREMILL_BUILD_SPARC32_RUNTIME=False option when invoking cmake.
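
The full-source `cmake` invocation and the SPARC32 switch can be combined into one configure step. The following is a minimal Python sketch that only assembles the argument list; the function name `cmake_configure_args` and the example paths are hypothetical, while the options themselves are the ones quoted above.

```python
from typing import List


def cmake_configure_args(install_prefix: str, cxx_common_dir: str,
                         build_sparc32_runtime: bool = True,
                         source_dir: str = "..") -> List[str]:
    """Assemble the cmake configure command for a full source build."""
    toolchain = f"{cxx_common_dir}/vcpkg/scripts/buildsystems/vcpkg.cmake"
    args = [
        "cmake",
        f"-DCMAKE_INSTALL_PREFIX={install_prefix}",
        f"-DCMAKE_TOOLCHAIN_FILE={toolchain}",
        "-G", "Ninja",
    ]
    if not build_sparc32_runtime:
        # Skips the 32-bit SPARC32 semantics, so no 32-bit libraries are needed.
        args.append("-DREMILL_BUILD_SPARC32_RUNTIME=False")
    args.append(source_dir)
    return args


if __name__ == "__main__":
    # Example paths are placeholders, not defaults used by Remill.
    print(" ".join(cmake_configure_args("/opt/remill", "/opt/cxx-common",
                                        build_sparc32_runtime=False)))
```

Run from inside the `build` directory, the printed command corresponds to the configure step; `cmake --build .` then proceeds as before.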