accumulo

Apache Accumulo

Quick Overview

Apache Accumulo is a sorted, distributed key/value store based on Google's Bigtable design. It is built on top of Apache Hadoop, ZooKeeper, and Thrift, and provides robust, scalable, high-performance data storage and retrieval. Accumulo extends the Bigtable design with cell-based access control and a server-side programming mechanism (iterators).
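
To make "sorted key/value store" more concrete, here is a minimal plain-Java sketch of the ordering idea. It does not use the Accumulo library: CellKey, SortedKeySketch, and scanRow are hypothetical names for illustration, the key omits the column visibility field, and strings stand in for the byte arrays Accumulo compares. The sketch assumes the commonly documented ordering in which row, family, and qualifier ascend and timestamps descend, so the newest version of a cell comes first.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.Comparator;

/** Simplified, hypothetical stand-in for an Accumulo key (no visibility field). */
final class CellKey {
    final String row;
    final String family;
    final String qualifier;
    final long timestamp;

    CellKey(String row, String family, String qualifier, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return row + " " + family + ":" + qualifier + " @" + timestamp;
    }
}

final class SortedKeySketch {
    // Row, family, and qualifier ascend; timestamp descends so the newest version sorts first.
    static final Comparator<CellKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp);
        return c;
    };

    static TreeMap<CellKey, String> sampleTable() {
        TreeMap<CellKey, String> table = new TreeMap<>(ORDER);
        table.put(new CellKey("row2", "cf", "name", 1L), "bob");
        table.put(new CellKey("row1", "cf", "name", 1L), "alice-old");
        table.put(new CellKey("row1", "cf", "name", 2L), "alice-new");
        table.put(new CellKey("row1", "cf", "age", 1L), "30");
        return table;
    }

    /** Because keys are sorted, all cells of one row form a contiguous slice of the map. */
    static SortedMap<CellKey, String> scanRow(TreeMap<CellKey, String> table, String row) {
        CellKey start = new CellKey(row, "", "", Long.MAX_VALUE);      // sorts before every key in the row
        CellKey end = new CellKey(row + "\0", "", "", Long.MAX_VALUE); // smallest key after the row
        return table.subMap(start, end);
    }

    public static void main(String[] args) {
        for (Map.Entry<CellKey, String> e : scanRow(sampleTable(), "row1").entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The point of the sketch is that a row lookup or range scan is a slice of one sorted structure rather than a search, which is why row key design matters so much in Bigtable-style stores.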

Pros

Strong security features with cell-level access control
High performance and scalability for large-scale data processing
Flexible data model with support for sparse data
Server-side programming capabilities with Accumulo Iterators
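
The cell-level access control listed above works by attaching a visibility expression to each cell and comparing it with the authorizations a reader presents. The following is a simplified plain-Java sketch of that check, not Accumulo's own implementation: VisibilitySketch and canSee are hypothetical names, labels are limited to letters, digits, and underscores, and quoted labels are not handled. It assumes the usual rules that & means "all of", | means "any of", and mixing the two requires parentheses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified evaluator for expressions such as "(admin&audit)|public". */
final class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    /** Returns true if a reader holding auths may see a cell labeled with expr. */
    static boolean canSee(String expr, Set<String> auths) {
        if (expr.isEmpty()) {
            return true; // an unlabeled cell is visible to every reader
        }
        VisibilitySketch parser = new VisibilitySketch(expr, auths);
        boolean result = parser.parseExpression();
        if (parser.pos != expr.length()) {
            throw new IllegalArgumentException("Unexpected character at " + parser.pos);
        }
        return result;
    }

    // expression := term ('&' term)* | term ('|' term)*
    private boolean parseExpression() {
        boolean result = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char next = expr.charAt(pos++);
            if (op != 0 && op != next) {
                throw new IllegalArgumentException("Mixing & and | requires parentheses");
            }
            op = next;
            boolean rhs = parseTerm();
            result = (op == '&') ? (result && rhs) : (result || rhs);
        }
        return result;
    }

    // term := '(' expression ')' | label
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpression();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at " + pos);
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length()
                && (Character.isLetterOrDigit(expr.charAt(pos)) || expr.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a label at " + start);
        }
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        Set<String> auths = new HashSet<>(Arrays.asList("public"));
        System.out.println(canSee("(admin&audit)|public", auths));
    }
}
```

In a real deployment the server performs this check during the scan, so cells a reader is not authorized for are never returned to the client.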

Cons

Steep learning curve for newcomers
Complex setup and configuration process
Limited ecosystem compared to some other NoSQL databases
Requires significant resources to run effectively

Code Examples

Connecting to Accumulo:

// Accumulo 2.x client API; close the client when the application is done with it
AccumuloClient client = Accumulo.newClient()
        .to("instance", "zookeepers")
        .as("user", "password")
        .build();

Writing data to Accumulo:

// Closing the writer flushes any buffered mutations
try (BatchWriter writer = client.createBatchWriter("tableName")) {
    Mutation mutation = new Mutation("rowID");
    mutation.put("columnFamily", "columnQualifier", "value");
    writer.addMutation(mutation);
}

Reading data from Accumulo:

try (Scanner scanner = client.createScanner("tableName", Authorizations.EMPTY)) {
    scanner.setRange(new Range("startRow", "endRow"));
    scanner.fetchColumnFamily("columnFamily");
    for (Map.Entry<Key, Value> entry : scanner) {
        System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
    }
}

Getting Started

To get started with Apache Accumulo:

Set up Apache Hadoop (HDFS) and ZooKeeper, which Accumulo requires.
Download and install Apache Accumulo from the official website.
Configure Accumulo by editing conf/accumulo.properties (Accumulo 2.x; 1.x releases use accumulo-site.xml).
Initialize the instance once with ./bin/accumulo init, then start the cluster with ./bin/accumulo-cluster start
Use the Accumulo shell to create tables and manage the system:

./bin/accumulo shell -u username -p password

Add the Accumulo client dependency to your project:

<dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>2.1.0</version>
</dependency>

Use the code examples above to start interacting with Accumulo in your Java application.

Competitor Comparisons

hbase

5,214

Apache HBase

Pros of HBase

Wider adoption and larger community support
Better integration with the Hadoop ecosystem
More mature and stable, with a longer history of production use

Cons of HBase

Generally slower performance for certain workloads
Less flexible security model
More complex setup and configuration process

Code Comparison

HBase example (Java):

Put put = new Put(Bytes.toBytes("row1"));
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"), Bytes.toBytes("value1"));
table.put(put);

Accumulo example (Java):

Mutation mutation = new Mutation("row1");
mutation.put("cf", "qual1", "value1");
writer.addMutation(mutation);

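Both examples write a single cell to a table. HBase uses a Put object and works with byte arrays, while Accumulo uses a Mutation object that accepts strings directly and is sent through a BatchWriter. The two APIs are structurally similar; Accumulo's string overloads make short examples like this one slightly more compact.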

HBase and Accumulo are both distributed NoSQL databases built on top of Hadoop. While HBase has broader adoption and better Hadoop ecosystem integration, Accumulo offers better performance for certain use cases and a more flexible security model. The choice between the two often depends on specific project requirements and existing infrastructure.

cassandra

8,785

Apache Cassandra®

Pros of Cassandra

Better scalability and performance for large-scale distributed systems
Wider industry adoption and larger community support
More flexible data model with support for wide rows and dynamic columns

Cons of Cassandra

Higher memory consumption and resource requirements
Less fine-grained security controls compared to Accumulo's cell-level security
Steeper learning curve for beginners due to its complex architecture

Code Comparison

Cassandra query example:

SELECT * FROM users
WHERE user_id = 123
AND timestamp > '2023-01-01'
LIMIT 10;

Accumulo query example:

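Scanner scanner = connector.createScanner("users", auths);
scanner.setRange(new Range("123"));                  // only row "123"
scanner.fetchColumnFamily(new Text("timestamp"));
// Scanner has no condition or limit methods: filter and cap on the client (assumes ISO-8601 date values).
int matches = 0;
for (Map.Entry<Key, Value> entry : scanner) {
    if (entry.getValue().toString().compareTo("2023-01-01") <= 0) continue;
    System.out.println(entry.getKey() + " -> " + entry.getValue());
    if (++matches == 10) break;                      // equivalent of LIMIT 10
}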

Both examples demonstrate querying a user table with filtering and limiting results. Cassandra uses CQL (Cassandra Query Language), which is SQL-like, while Accumulo uses a Java API for querying data. Accumulo's approach offers more programmatic control but may require more code for complex queries.

rocksdb

28,450

A library that provides an embeddable, persistent key-value store for fast storage.

Pros of RocksDB

Higher write throughput and lower write amplification
Designed for SSD storage, optimized for flash and memory performance
More flexible and customizable, with pluggable components

Cons of RocksDB

Less mature distributed architecture compared to Accumulo
Lacks built-in security features like cell-level security and data encryption
May require more manual tuning for optimal performance

Code Comparison

RocksDB (C++):

rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;
rocksdb::Status status = rocksdb::DB::Open(options, "/path/to/db", &db);

Accumulo (Java):

AccumuloClient client = Accumulo.newClient()
        .to("myinstance", "localhost:2181")
        .as("root", "password")
        .build();
client.tableOperations().create("mytable");

Both systems provide key-value storage capabilities, but their APIs and usage patterns differ significantly. RocksDB offers a lower-level, embedded database approach, while Accumulo provides a distributed, scalable solution with additional features like cell-level security and server-side programming.

RocksDB is generally better suited for applications requiring high-performance local storage, while Accumulo excels in distributed, secure, and scalable data management scenarios. The choice between the two depends on specific use cases, performance requirements, and architectural needs.

leveldb

36,378

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

Pros of LevelDB

Lightweight and embedded key-value store, suitable for use within applications
Fast performance for read-heavy workloads
Simple API and easy integration into C++ and other language projects

Cons of LevelDB

Limited scalability for large datasets compared to Accumulo's distributed architecture
Lacks advanced features like cell-level security and server-side programming
No built-in support for distributed operations or replication

Code Comparison

LevelDB (C++):

leveldb::DB* db;
leveldb::Options options;
options.create_if_missing = true;
leveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
db->Put(leveldb::WriteOptions(), "key", "value");

Accumulo (Java):

AccumuloClient client = Accumulo.newClient().to("instance", "zookeepers").as("user", "pass").build();
try (BatchWriter writer = client.createBatchWriter("table")) {
    Mutation mutation = new Mutation("row");
    mutation.put("family", "qualifier", "value");
    writer.addMutation(mutation);
}

LevelDB offers a simpler API for basic key-value operations, while Accumulo provides a more complex but feature-rich interface for distributed data storage and retrieval. LevelDB is better suited for embedded use cases, whereas Accumulo excels in large-scale, distributed environments with advanced security and data management requirements.

hadoop

14,703

Apache Hadoop

Pros of Hadoop

Broader ecosystem and wider adoption, with more tools and integrations available
Better suited for large-scale batch processing and MapReduce jobs
More extensive documentation and community support

Cons of Hadoop

Higher complexity and steeper learning curve
Less efficient for real-time data processing and random access operations
Requires more hardware resources and maintenance overhead

Code Comparison

Hadoop (Java):

public class WordCount extends Configured implements Tool {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        // ... (mapper implementation)
    }
    // ... (reducer and main method)
}

Accumulo (Java):

public class IngestExample {
    public static void main(String[] args) throws Exception {
        // args[0] is the path to an Accumulo client properties file
        try (AccumuloClient client = Accumulo.newClient().from(args[0]).build();
             BatchWriter writer = client.createBatchWriter("mytable")) {
            Mutation m = new Mutation("row1");
            m.put("family", "column", "value");
            writer.addMutation(m);
        }
    }
}

The code examples showcase the different focus of each project. Hadoop's example demonstrates a MapReduce job for word counting, while Accumulo's example shows data ingestion into a table, highlighting its database-like nature.

mongo

26,228

The MongoDB Database

Pros of MongoDB

Easier to set up and use, with a gentler learning curve
More flexible schema design, allowing for dynamic and evolving data structures
Larger community and ecosystem, with more third-party tools and resources

Cons of MongoDB

Less suitable for handling extremely large datasets compared to Accumulo
Lacks some of the advanced security features that Accumulo provides
May have lower write performance in certain high-throughput scenarios

Code Comparison

MongoDB query example:

db.collection.find({
  age: { $gt: 25 },
  status: "active"
}).sort({ name: 1 })

Accumulo query example:

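Scanner scanner = connector.createScanner("users", auths);
scanner.setRange(new Range("row1", "row2"));
scanner.fetchColumnFamily(new Text("personal"));
// AgeFilter is a hypothetical custom iterator class that must be available on the tablet servers
IteratorSetting ageFilter = new IteratorSetting(30, "ageFilter", AgeFilter.class);
ageFilter.addOption("minAge", "25");
scanner.addScanIterator(ageFilter);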

While both databases support querying and filtering, Accumulo's approach is more low-level and offers finer control over data access and processing. MongoDB's query syntax is generally more intuitive for developers familiar with JSON-like structures.
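
To illustrate the idea behind the age filter in the Accumulo example, here is a plain-Java sketch of a filtering iterator: a wrapper that pulls entries from a source iterator and passes on only those that satisfy a predicate. It is a conceptual model, not Accumulo's iterator interface; FilteringIteratorSketch and agesAbove are hypothetical names. In Accumulo itself, custom filters typically extend org.apache.accumulo.core.iterators.Filter and implement accept(Key, Value), and they run on the tablet servers rather than in the client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical wrapper that yields only the source entries accepted by a predicate. */
final class FilteringIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> accept;
    private T next;
    private boolean hasNext;

    FilteringIteratorSketch(Iterator<T> source, Predicate<T> accept) {
        this.source = source;
        this.accept = accept;
        advance();
    }

    // Skip forward until the next accepted entry, or until the source is exhausted.
    private void advance() {
        hasNext = false;
        while (source.hasNext()) {
            T candidate = source.next();
            if (accept.test(candidate)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public T next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        T result = next;
        advance();
        return result;
    }

    /** Example use: keep only ages strictly above minAge, in source order. */
    static List<Integer> agesAbove(List<Integer> ages, int minAge) {
        List<Integer> kept = new ArrayList<>();
        Iterator<Integer> it =
                new FilteringIteratorSketch<Integer>(ages.iterator(), age -> age > minAge);
        while (it.hasNext()) {
            kept.add(it.next());
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(agesAbove(Arrays.asList(18, 30, 25, 41), 25));
    }
}
```

Because the wrapper exposes the same interface as its source, several such iterators can be stacked. Running the filter next to the data means rejected entries never cross the network.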

MongoDB is often preferred for rapid development and flexible data models, while Accumulo excels in scenarios requiring high security, massive scalability, and fine-grained access control.

README

Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval. With Apache Accumulo, users can store and manage large data sets across a cluster. Accumulo uses Apache Hadoop's HDFS to store its data and Apache ZooKeeper for consensus.

Download the latest version of Apache Accumulo from the project website.

Getting Started

Follow the quick start to install and run Accumulo
Read the Accumulo documentation
Run the Accumulo examples to learn how to write Accumulo clients
View the Javadocs to learn the Accumulo API

More resources can be found on the project website.

Building

Accumulo uses Maven to compile, test, and package its source. The following command will build the binary tar.gz from source. Add -DskipTests to build without waiting for the tests to run.

mvn package

This command produces assemble/target/accumulo-<version>-bin.tar.gz

Contributing

Contributions are welcome to all Apache Accumulo repositories.

If you want to contribute, read our guide on our website.

Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See https://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache Accumulo uses the built-in Java cryptography libraries in its RFile encryption implementation. See Oracle's export-regulations doc for more details on Java's cryptography features. Apache Accumulo also uses the BouncyCastle library for some cryptographic technology. See the BouncyCastle site for more details on BouncyCastle's cryptography features.
