Top Related Projects
Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode
RetDec is a retargetable machine-code decompiler based on LLVM.
A powerful and user-friendly binary analysis platform!
UNIX-like reverse engineering framework and command-line toolset
Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
Quick Overview
Remill is an open-source library for lifting machine code to LLVM bitcode. It supports x86, amd64, AArch64, SPARC32, and SPARC64, and is meant to be used as a building block for binary analysis, reverse engineering, and program transformation tools.
Pros
- Supports multiple architectures (x86, amd64, AArch64, SPARC32, SPARC64)
- Integrates well with LLVM ecosystem
- Actively maintained and regularly updated
- Provides a flexible API for custom use cases
Cons
- Steep learning curve for beginners
- Limited documentation for advanced features
- May require significant computational resources for large binaries
- Dependency on LLVM version can cause compatibility issues
Code Examples
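- Lifting x86 assembly to LLVM bitcode (illustrative only; the README's own examples feed Remill raw instruction bytes rather than assembly text, so treat the calls below as pseudocode):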
#include <remill/Arch/X86/Runtime/State.h>
#include <remill/BC/Lifter.h>
int main() {
auto arch = remill::Arch::GetArchitecture(remill::kArchX86);
auto lifter = std::make_unique<remill::InstructionLifter>(arch.get());
std::string assembly = "mov eax, 42";
llvm::LLVMContext context;
auto module = lifter->LiftInstructionToModule(assembly, context);
}
- Analyzing lifted bitcode:
#include <remill/BC/Util.h>
void analyze_bitcode(llvm::Module *module) {
  for (auto &function : module->functions()) {
    if (remill::IsLiftedFunction(function)) {
      // Analyze lifted function
      for (auto &block : function) {
        // Analyze basic block
      }
    }
  }
}
- Transforming lifted code:
#include <remill/BC/IntrinsicTable.h>
void transform_lifted_code(llvm::Module *module) {
  remill::IntrinsicTable intrinsics(module);
  for (auto &function : module->functions()) {
    if (remill::IsLiftedFunction(function)) {
      // Apply custom transformations
      // e.g., replace memory intrinsics, optimize branches, etc.
    }
  }
}
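The first example above starts from the text `mov eax, 42`, while the Docker commands later in this document pass raw bytes via `--bytes`. Assuming the library likewise consumes bytes, the text would first have to be assembled. The sketch below hand-encodes that one instruction using the standard x86 form (opcode `0xB8` followed by a little-endian 32-bit immediate). `EncodeMovEaxImm32` is a hypothetical helper written for illustration, not part of Remill.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not a Remill API): encodes `mov eax, imm32`.
// The encoding is opcode 0xB8 followed by the immediate, least
// significant byte first.
std::array<std::uint8_t, 5> EncodeMovEaxImm32(std::uint32_t imm) {
  return {{0xB8,
           static_cast<std::uint8_t>(imm & 0xFF),
           static_cast<std::uint8_t>((imm >> 8) & 0xFF),
           static_cast<std::uint8_t>((imm >> 16) & 0xFF),
           static_cast<std::uint8_t>((imm >> 24) & 0xFF)}};
}
```

For `mov eax, 42` this yields the bytes `B8 2A 00 00 00`, which corresponds to the hex string `b82a000000`.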
Getting Started
To get started with Remill:
- Clone the repository:
    git clone https://github.com/lifting-bits/remill.git
- Install dependencies (on Ubuntu):
    sudo apt-get install build-essential cmake python3-pip
    pip3 install --user --upgrade pip
    pip3 install --user --upgrade setuptools wheel
- Build Remill:
    cd remill
    mkdir build && cd build
    cmake ..
    make -j$(nproc)
- Include Remill in your project's CMakeLists.txt:
    find_package(remill REQUIRED)
    target_link_libraries(your_target remill)
Competitor Comparisons
Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode
Pros of McSema
- Lifts whole program binaries (x86, amd64, AArch64, SPARC32, and SPARC64) rather than individual instructions
- Provides more comprehensive binary analysis capabilities
- Offers better integration with other binary analysis tools
Cons of McSema
- More complex setup and usage compared to Remill
- Slower lifting process due to its comprehensive nature
- Requires more system resources for operation
Code Comparison
McSema:
#include <remill/Arch/Arch.h>
#include <remill/BC/Util.h>
#include <mcsema/Arch/Arch.h>
#include <mcsema/BC/Util.h>
void LiftFunction(const mcsema::Arch *arch, llvm::Function *func) {
// McSema-specific lifting code
}
Remill:
#include <remill/Arch/Arch.h>
#include <remill/BC/Util.h>
void LiftInstruction(const remill::Arch *arch, llvm::BasicBlock *block) {
// Remill-specific lifting code
}
Both McSema and Remill are part of the lifting-bits project and share common components. McSema builds on Remill, using it to lift instructions, and adds whole-binary lifting on top at the cost of increased complexity. Remill focuses on accurately lifting individual instructions and is meant to be used as a library by other tools, which makes it simpler to embed but narrower in scope than McSema.
RetDec is a retargetable machine-code decompiler based on LLVM.
Pros of RetDec
- More comprehensive decompilation capabilities, supporting multiple architectures and file formats
- Includes a graphical user interface for easier use by non-technical users
- Actively maintained with regular updates and community support
Cons of RetDec
- Slower decompilation process compared to Remill's lifting approach
- Larger codebase and more complex setup, potentially making it harder to integrate into other projects
- May produce less accurate results for certain specific use cases
Code Comparison
RetDec (C decompilation output):
int32_t function_401000(int32_t a1) {
int32_t v1 = a1 * 2;
return v1 + 5;
}
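Remill (simplified LLVM IR for illustration; actual lifted functions take and update an explicit machine-state structure rather than typed arguments such as %a1):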
define i32 @sub_401000(i32 %a1) {
%v1 = mul i32 %a1, 2
%result = add i32 %v1, 5
ret i32 %result
}
Both projects aim to analyze binary code, but RetDec focuses on full decompilation to high-level languages, while Remill specializes in lifting machine code to LLVM IR. RetDec offers a more user-friendly approach for general reverse engineering tasks, whereas Remill provides a powerful foundation for advanced binary analysis and transformation tools.
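To make the difference from decompiler output more concrete, here is a toy C++ model of the general shape lifted code tends to have: a function that receives an explicit register state, the program counter, and a memory token, and returns the memory token. `ToyState`, `ToyMemory`, and `Lifted_MovEax42` are hypothetical names invented for this sketch. Remill's real state and memory types are far richer, so read this as an analogy rather than as its API.

```cpp
#include <cstdint>

// Toy stand-ins for illustration only; not Remill's actual types.
struct ToyState {
  std::uint32_t eax = 0;  // Modeled general-purpose register.
  std::uint32_t pc = 0;   // Modeled program counter.
};

struct ToyMemory {};  // Opaque token threaded through lifted code.

// Hand-written model of what lifting the 5-byte instruction
// `mov eax, 42` could look like: registers are fields of an explicit
// state object instead of high-level local variables.
ToyMemory *Lifted_MovEax42(ToyState &state, std::uint32_t pc,
                           ToyMemory *memory) {
  state.eax = 42;      // Effect of the instruction on the register file.
  state.pc = pc + 5;   // Advance past the 5-byte encoding.
  return memory;       // No memory access, so the token is unchanged.
}
```

Calling `Lifted_MovEax42(state, 0x401000, &memory)` leaves `state.eax` at 42 and `state.pc` at `0x401005`. A decompiler such as RetDec would instead try to recover variables and expressions from the same bytes.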
A powerful and user-friendly binary analysis platform!
Pros of angr
- More comprehensive analysis framework with symbolic execution capabilities
- Larger community and ecosystem of plugins/extensions
- Supports a wider range of architectures and binary formats
Cons of angr
- Steeper learning curve due to complexity
- Can be slower for certain types of analysis
- Requires more system resources, especially for large binaries
Code Comparison
angr example:
import angr
proj = angr.Project('binary')
state = proj.factory.entry_state()
simgr = proj.factory.simulation_manager(state)
simgr.explore(find=0x400000)
Remill example:
#include <remill/Arch/X86/Runtime/State.h>
#include <remill/BC/Lifter.h>
auto module = remill::LoadModuleFromFile(arch, bc_file);
auto func = remill::LiftCodeIntoModule(module, addr);
The angr code demonstrates setting up a project and running symbolic execution, while the Remill code shows lifting binary code to LLVM IR. angr provides higher-level abstractions for program analysis, whereas Remill focuses on instruction lifting and translation to LLVM IR.
UNIX-like reverse engineering framework and command-line toolset
Pros of radare2
- Comprehensive reverse engineering framework with a wide range of features
- Large and active community, extensive documentation, and plugins ecosystem
- Supports a vast array of architectures and file formats
Cons of radare2
- Steeper learning curve due to its extensive feature set
- Can be resource-intensive for large binaries or complex analysis tasks
- Command-line interface may be less intuitive for some users
Code Comparison
radare2:
r_core_cmd(core, "aaa", 0);
r_core_cmd(core, "pdf @ main", 0);
Remill:
auto module = LoadModuleFromFile(argv[1], &context);
auto program = GenerateProgram(*module);
Key differences
Radare2 is a full-featured reverse engineering framework, while Remill focuses on lifting binary code to LLVM IR. Radare2 offers a broader set of tools for various reverse engineering tasks, whereas Remill specializes in binary-to-IR translation for further analysis or recompilation.
Radare2 is more suitable for interactive analysis and scripting, while Remill is designed to be integrated into larger binary analysis systems. Radare2 has a larger user base and more extensive documentation, but Remill's specialized focus may make it more efficient for certain binary lifting tasks.
Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
Pros of Capstone
- Wider architecture support (x86, ARM, MIPS, PowerPC, etc.)
- More mature and established project with extensive documentation
- Lightweight and easy to integrate into existing projects
Cons of Capstone
- Primarily focused on disassembly, not lifting to intermediate representation
- Less suitable for advanced program analysis tasks
- May require additional tools for more complex reverse engineering workflows
Code Comparison
Capstone (disassembly example):
cs_insn *insn;
size_t count = cs_disasm(handle, code, code_size, address, 0, &insn);
for (size_t j = 0; j < count; j++) {
printf("0x%"PRIx64":\t%s\t\t%s\n", insn[j].address, insn[j].mnemonic, insn[j].op_str);
}
Remill (lifting example):
auto lifted_block = remill::LiftCodeBlock(arch, memory, block_address);
for (const auto &inst : lifted_block->instructions) {
std::cout << inst.Serialize() << std::endl;
}
Remill focuses on lifting machine code to an intermediate representation, which is more suitable for advanced program analysis and transformation tasks. Capstone, on the other hand, excels at disassembly and provides a simpler API for basic instruction decoding across multiple architectures.
README
Remill
Remill is a static binary translator that translates machine code instructions into LLVM bitcode. It translates AArch64 (64-bit ARMv8), SPARC32 (SPARCv8), SPARC64 (SPARCv9), x86 and amd64 machine code (including AVX and AVX512) into LLVM bitcode. AArch32 (32-bit ARMv8 / ARMv7) support is underway.
Remill focuses on accurately lifting instructions. It is meant to be used as a library for other tools, e.g. McSema.
Documentation
To understand how Remill works you can take a look at the following resources:
- Step-by-step guide on how Remill lifts an instruction
- How to implement the semantics of an instruction
- The design and architecture of Remill
If you would like to contribute you can check out: How to contribute
Getting Help
If you are experiencing undocumented problems with Remill, ask for help in the #binary-lifting channel of the Empire Hacking Slack.
Supported Platforms
Remill is supported on Linux platforms and has been tested on Ubuntu 22.04. Remill also works on macOS, and has experimental support for Windows.
Remill's Linux version can also be built via Docker for quicker testing.
Dependencies
Most of Remill's dependencies can be provided by the cxx-common repository. Trail of Bits hosts downloadable, pre-built versions of cxx-common, which makes it substantially easier to get up and running with Remill. Nonetheless, the following table represents most of Remill's dependencies.
Name | Version |
---|---|
Git | Latest |
CMake | 3.14+ |
Google Flags | Latest |
Google Log | Latest |
Google Test | Latest |
LLVM | 15+ |
Clang | 15 |
Intel XED | Latest |
Python | 2.7 |
Unzip | Latest |
ccache | Latest |
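Minimum versions such as "CMake 3.14+" or "LLVM 15+" need a component-wise comparison; comparing the strings or treating "3.9" as a decimal number would rank it above "3.14". The following is a minimal sketch of such a check. `ParseVersion` and `MeetsMinimum` are hypothetical helpers written for this example and are not part of Remill's build scripts.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Splits "3.22.1" into {3, 22, 1}. Parsing stops at the first character
// that is neither a digit nor a separating dot (for example the "+" in
// "3.14+" or a suffix such as "-rc1").
std::vector<int> ParseVersion(const std::string &text) {
  std::vector<int> parts;
  int current = 0;
  bool have_digit = false;
  for (char c : text) {
    if (c >= '0' && c <= '9') {
      current = current * 10 + (c - '0');
      have_digit = true;
    } else if (c == '.' && have_digit) {
      parts.push_back(current);
      current = 0;
      have_digit = false;
    } else {
      break;
    }
  }
  if (have_digit) parts.push_back(current);
  return parts;
}

// Returns true if `found` is at least `minimum`, comparing component by
// component and treating missing components as 0.
bool MeetsMinimum(const std::string &found, const std::string &minimum) {
  const std::vector<int> a = ParseVersion(found);
  const std::vector<int> b = ParseVersion(minimum);
  const std::size_t n = std::max(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) {
    const int x = i < a.size() ? a[i] : 0;
    const int y = i < b.size() ? b[i] : 0;
    if (x != y) return x > y;
  }
  return true;
}
```

With this, `MeetsMinimum("3.22.1", "3.14+")` holds while `MeetsMinimum("3.9.0", "3.14+")` does not, because 9 is compared against 14 as an integer.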
Getting and Building the Code
Docker Build
Remill now comes with a Dockerfile for easier testing. This Dockerfile references the cxx-common container to have all pre-requisite libraries available.
The Dockerfile allows for quick builds of multiple supported LLVM and Ubuntu configurations.
[!IMPORTANT] Not all LLVM and Ubuntu configurations are supported. Please refer to the CI results to see which configurations are tested and supported. The Docker image should build on both x86_64 and ARM64, but only x86_64 is tested in CI; if the ARM64 build fails, please open an issue.
Quickstart (builds Remill against LLVM 17 on Ubuntu 22.04).
Clone Remill:
git clone https://github.com/lifting-bits/remill.git
cd remill
Build Remill Docker container:
docker build . -t remill \
-f Dockerfile \
--build-arg UBUNTU_VERSION=22.04 \
--build-arg LLVM_VERSION=17
Ensure remill works:
Decode some AMD64 instructions to LLVM:
docker run --rm -it remill \
--arch amd64 --ir_out /dev/stdout --bytes c704ba01000000
Decode some AArch64 instructions to LLVM:
docker run --rm -it remill \
--arch aarch64 --address 0x400544 --ir_out /dev/stdout \
--bytes FD7BBFA90000009000601891FD030091B7FFFF97E0031F2AFD7BC1A8C0035FD6
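The `--bytes` arguments above are plain hex strings, two digits per byte. The sketch below shows one way to turn such a string into raw bytes and, for AArch64, to group them into fixed-width 32-bit little-endian instruction words. `HexToBytes` and `LittleEndianWords` are hypothetical helpers written for illustration; they are not taken from Remill.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the value of one hex digit, or -1 if `c` is not a hex digit.
static int HexDigit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Decodes a hex string such as "c704ba01000000" into raw bytes.
// Throws std::invalid_argument on odd length or a non-hex character.
std::vector<std::uint8_t> HexToBytes(const std::string &hex) {
  if (hex.size() % 2 != 0) {
    throw std::invalid_argument("odd number of hex digits");
  }
  std::vector<std::uint8_t> bytes;
  bytes.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    const int hi = HexDigit(hex[i]);
    const int lo = HexDigit(hex[i + 1]);
    if (hi < 0 || lo < 0) throw std::invalid_argument("non-hex character");
    bytes.push_back(static_cast<std::uint8_t>((hi << 4) | lo));
  }
  return bytes;
}

// Groups bytes into 32-bit little-endian words. Trailing bytes that do
// not fill a whole word are dropped.
std::vector<std::uint32_t> LittleEndianWords(
    const std::vector<std::uint8_t> &bytes) {
  std::vector<std::uint32_t> words;
  for (std::size_t i = 0; i + 4 <= bytes.size(); i += 4) {
    words.push_back(static_cast<std::uint32_t>(bytes[i]) |
                    (static_cast<std::uint32_t>(bytes[i + 1]) << 8) |
                    (static_cast<std::uint32_t>(bytes[i + 2]) << 16) |
                    (static_cast<std::uint32_t>(bytes[i + 3]) << 24));
  }
  return words;
}
```

Applied to the two commands above, the AMD64 string decodes to 7 bytes starting with `0xC7`, and the AArch64 string decodes to 32 bytes, that is, eight 4-byte instruction words.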
On Linux
First, update the package index and install the baseline dependencies.
sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install \
git \
python3 \
wget \
curl \
build-essential \
lsb-release \
ccache \
libc6-dev:i386 \
'libstdc++-*-dev:i386' \
g++-multilib \
rpm
Next, clone the repository. This will clone the code into the remill directory.
git clone https://github.com/lifting-bits/remill.git
Next, we build Remill. This script will create another directory, remill-build, in the current working directory. All remaining dependencies needed by Remill will be built in the remill-build directory.
./remill/scripts/build.sh
Next, we can install Remill. Remill itself is a library, and so there is no real way to try it. However, you can head on over to the McSema repository, which uses Remill for lifting instructions.
cd ./remill-build
sudo make install
We can also build and run Remill's test suite.
cd ./remill-build
make test_dependencies
make test
Full Source Builds
Sometimes, you want to build everything from source, including the cxx-common libraries Remill depends on. To build against a custom cxx-common location, you can use the following cmake invocation:
mkdir build
cd build
cmake \
-DCMAKE_INSTALL_PREFIX="<path where remill will install>" \
-DCMAKE_TOOLCHAIN_FILE="<path to cxx-common directory>/vcpkg/scripts/buildsystems/vcpkg.cmake" \
-G Ninja \
..
cmake --build .
cmake --build . --target install
The output may produce some CMake warnings about policy CMP0003. These warnings are safe to ignore.
Common Build Issues
If you see errors similar to the following:
fatal error: 'bits/c++config.h' file not found
Then you need to install the 32-bit libstdc++ headers and libraries. On a Debian/Ubuntu-based distribution, you would want to do something like this:
sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install libc6-dev:i386 libstdc++-10-dev:i386 g++-multilib
This error happens because the SPARC32 runtime semantics (the bitcode library which lives in <install directory>/share/remill/<version>/semantics/sparc32.bc) are built as 32-bit code, but 32-bit development libraries are not installed by default.
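When checking whether the semantics library was actually installed, it can help to compute the expected location from the path pattern quoted above. The helper below is a small sketch under two assumptions: that other architectures follow the same `<arch>.bc` naming as `sparc32.bc`, and that the version directory name is known to the caller. `SemanticsPath` is a hypothetical function, not a Remill API.

```cpp
#include <string>

// Hypothetical helper: builds
// "<install directory>/share/remill/<version>/semantics/<arch>.bc".
// Only the sparc32.bc instance of this pattern appears in the README;
// applying it to other architecture names is an assumption.
std::string SemanticsPath(const std::string &install_dir,
                          const std::string &version,
                          const std::string &arch) {
  std::string path = install_dir;
  if (!path.empty() && path.back() != '/') path += '/';
  return path + "share/remill/" + version + "/semantics/" + arch + ".bc";
}
```

For example, with an install directory of `/usr/local` and a placeholder version directory `17`, the helper returns `/usr/local/share/remill/17/semantics/sparc32.bc` for `sparc32`.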
A similar situation occurs when building Remill on arm64 Linux. In that case, follow a similar workflow, except that the architecture used in the dpkg and apt-get commands would be armhf instead of i386.
Another alternative is to disable the SPARC32 runtime semantics. To do that, use the -DREMILL_BUILD_SPARC32_RUNTIME=False option when invoking cmake.