Apache Accumulo

Top Related Projects

  • Apache HBase
  • Apache Cassandra®
  • RocksDB: A library that provides an embeddable, persistent key-value store for fast storage.
  • LevelDB: A fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
  • Apache Hadoop
  • The MongoDB Database

Quick Overview

Apache Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is built on top of Apache Hadoop, ZooKeeper, and Thrift, and provides robust, scalable, high-performance data storage and retrieval. Accumulo extends the BigTable design with cell-based access control and a server-side programming mechanism.
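To make the "sorted key/value" idea concrete, here is a minimal in-memory sketch. It is a toy model only: SortedStoreSketch, put, and scan are hypothetical names, not part of the Accumulo API, and a real table is distributed across servers rather than held in one map. It shows that rows come back in sorted order and that each row may hold a different set of columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a sorted in-memory map standing in for a sorted key/value table.
// None of these names come from the Accumulo API.
class SortedStoreSketch {
    // row -> ("family:qualifier" -> value); TreeMap keeps both levels sorted.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    void put(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(family + ":" + qualifier, value);
    }

    // Returns "row family:qualifier=value" lines for rows in [startRow, endRow], in sorted order.
    List<String> scan(String startRow, String endRow) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row
                : rows.subMap(startRow, true, endRow, true).entrySet()) {
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                out.add(row.getKey() + " " + cell.getKey() + "=" + cell.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedStoreSketch store = new SortedStoreSketch();
        store.put("user3", "info", "name", "Carol");
        store.put("user1", "info", "name", "Alice");
        store.put("user2", "info", "name", "Bob");
        store.put("user1", "info", "email", "alice@example.com"); // only user1 has this column
        // Rows come back in sorted order regardless of insertion order.
        store.scan("user1", "user2").forEach(System.out::println);
    }
}
```

Because the keys are sorted, a range scan is a contiguous slice of the map, which is why row-key design matters so much in this kind of store.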

Pros

  • Strong security features with cell-level access control
  • High performance and scalability for large-scale data processing
  • Flexible data model with support for sparse data
  • Server-side programming capabilities with Accumulo Iterators
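Cell-level access control means every cell carries its own visibility label, and a reader only sees cells whose label is satisfied by the reader's authorizations. The sketch below is a deliberately simplified illustration: VisibilitySketch and canRead are hypothetical names, and only labels joined with '&' are handled, whereas real Accumulo visibility expressions also allow '|' and parentheses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only: not the Accumulo visibility evaluator.
class VisibilitySketch {
    // Returns true if the reader's authorizations satisfy the cell's label.
    // Simplification: only '&' (all terms required) is supported;
    // an empty label means the cell is visible to everyone.
    static boolean canRead(String label, Set<String> authorizations) {
        if (label.isEmpty()) {
            return true;
        }
        for (String term : label.split("&")) {
            if (!authorizations.contains(term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> analyst = new HashSet<>(Arrays.asList("internal"));
        System.out.println(canRead("", analyst));               // true: unlabeled cell
        System.out.println(canRead("internal", analyst));       // true
        System.out.println(canRead("internal&audit", analyst)); // false: missing "audit"
    }
}
```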

Cons

  • Steep learning curve for newcomers
  • Complex setup and configuration process
  • Limited ecosystem compared to some other NoSQL databases
  • Requires significant resources to run effectively

Code Examples

  1. Connecting to Accumulo (2.x client API):
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

// "instance" and "zookeepers" are the instance name and the ZooKeeper host list
AccumuloClient client = Accumulo.newClient()
    .to("instance", "zookeepers")
    .as("user", "password")
    .build();
  2. Writing data to Accumulo:
BatchWriter writer = client.createBatchWriter("tableName", new BatchWriterConfig());
Mutation mutation = new Mutation("rowID");
mutation.put("columnFamily", "columnQualifier", "value");
writer.addMutation(mutation);
writer.close(); // flushes buffered mutations
  3. Reading data from Accumulo:
Scanner scanner = client.createScanner("tableName", Authorizations.EMPTY);
scanner.setRange(new Range("startRow", "endRow"));
scanner.fetchColumnFamily("columnFamily");
for (Map.Entry<Key, Value> entry : scanner) {
    System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
}
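The write example depends on close() because a batch writer buffers mutations in memory and sends them in groups. The following toy sketch illustrates that behavior under stated assumptions: BufferedWriterSketch is a hypothetical class, a plain list stands in for the remote table, and the flush threshold is a simple count rather than Accumulo's memory and latency settings.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: shows why a buffered writer must be flushed or closed.
// BufferedWriterSketch and its methods are hypothetical, not Accumulo classes.
class BufferedWriterSketch implements AutoCloseable {
    private final int maxBuffered;
    private final List<String> buffer = new ArrayList<>();
    private final List<String> table; // stands in for the remote table

    BufferedWriterSketch(List<String> table, int maxBuffered) {
        this.table = table;
        this.maxBuffered = maxBuffered;
    }

    // Queues a mutation; sends the batch only once the buffer is full.
    void add(String mutation) {
        buffer.add(mutation);
        if (buffer.size() >= maxBuffered) {
            flush();
        }
    }

    // Sends everything queued so far and empties the buffer.
    void flush() {
        table.addAll(buffer);
        buffer.clear();
    }

    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>();
        try (BufferedWriterSketch writer = new BufferedWriterSketch(table, 3)) {
            writer.add("row1");
            writer.add("row2");
            System.out.println("before close: " + table.size()); // 0, still buffered
        }
        System.out.println("after close: " + table.size()); // 2, flushed by close()
    }
}
```

Skipping close() in this model would leave both mutations stranded in the buffer, which is the failure mode to watch for with any buffered writer.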

Getting Started

To get started with Apache Accumulo:

  1. Download and install Apache Accumulo from the official website.
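  2. Set up Hadoop and ZooKeeper as prerequisites.
  3. Configure Accumulo by editing conf/accumulo.properties (Accumulo 2.x; 1.x releases used accumulo-site.xml).
  4. Initialize the instance with ./bin/accumulo init, then start it with ./bin/accumulo-cluster start (./bin/accumulo-service <service> start controls a single service on the local node).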
  5. Use the Accumulo shell to create tables and manage the system:
./bin/accumulo shell -u username -p password
  6. Add the Accumulo client dependency to your project:
<dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>2.1.0</version>
</dependency>
  7. Use the code examples above to start interacting with Accumulo in your Java application.

Competitor Comparisons

Apache HBase

Pros of HBase

  • Wider adoption and larger community support
  • Better integration with the Hadoop ecosystem
  • More mature and stable, with a longer history of production use

Cons of HBase

  • Generally slower performance for certain workloads
  • Less flexible security model
  • More complex setup and configuration process

Code Comparison

HBase example (Java):

Put put = new Put(Bytes.toBytes("row1"));
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual1"), Bytes.toBytes("value1"));
table.put(put);

Accumulo example (Java):

Mutation mutation = new Mutation("row1");
mutation.put("cf", "qual1", "value1");
writer.addMutation(mutation);

Both examples add a single cell to a table. HBase builds a Put from byte arrays and applies it through the table handle, while Accumulo builds a Mutation, which accepts strings directly, and queues it on a BatchWriter that sends it when the writer is flushed or closed.

HBase and Accumulo are both distributed NoSQL databases built on top of Hadoop. While HBase has broader adoption and better Hadoop ecosystem integration, Accumulo offers better performance for certain use cases and a more flexible security model. The choice between the two often depends on specific project requirements and existing infrastructure.

Apache Cassandra®

Pros of Cassandra

  • Better scalability and performance for large-scale distributed systems
  • Wider industry adoption and larger community support
  • More flexible data model with support for wide rows and dynamic columns

Cons of Cassandra

  • Higher memory consumption and resource requirements
  • Less fine-grained security controls compared to Accumulo's cell-level security
  • Steeper learning curve for beginners due to its complex architecture

Code Comparison

Cassandra query example:

SELECT * FROM users
WHERE user_id = 123
AND timestamp > '2023-01-01'
LIMIT 10;

Accumulo query example:

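Scanner scanner = connector.createScanner("users", auths);
scanner.setRange(new Range("123"));
scanner.fetchColumnFamily(new Text("timestamp"));
// Scanner has no condition or limit methods; filter and cap while iterating (assumes the date is the column qualifier)
int count = 0;
for (Map.Entry<Key, Value> entry : scanner) {
    if (entry.getKey().getColumnQualifier().toString().compareTo("2023-01-01") <= 0) continue;
    if (++count > 10) break;
    System.out.println(entry.getKey() + " -> " + entry.getValue());
}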

Both examples demonstrate querying a user table with filtering and limiting results. Cassandra uses CQL (Cassandra Query Language), which is SQL-like, while Accumulo uses a Java API for querying data. Accumulo's approach offers more programmatic control but may require more code for complex queries.

RocksDB

A library that provides an embeddable, persistent key-value store for fast storage.

Pros of RocksDB

  • Higher write throughput and lower write amplification
  • Designed for SSD storage, optimized for flash and memory performance
  • More flexible and customizable, with pluggable components

Cons of RocksDB

  • Less mature distributed architecture compared to Accumulo
  • Lacks built-in security features like cell-level security and data encryption
  • May require more manual tuning for optimal performance

Code Comparison

RocksDB (C++):

rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;
rocksdb::Status status = rocksdb::DB::Open(options, "/path/to/db", &db);

Accumulo (Java):

Instance instance = new ZooKeeperInstance("myinstance", "localhost:2181");
Connector connector = instance.getConnector("root", new PasswordToken("password"));
connector.tableOperations().create("mytable");

Both systems provide key-value storage capabilities, but their APIs and usage patterns differ significantly. RocksDB offers a lower-level, embedded database approach, while Accumulo provides a distributed, scalable solution with additional features like cell-level security and server-side programming.

RocksDB is generally better suited for applications requiring high-performance local storage, while Accumulo excels in distributed, secure, and scalable data management scenarios. The choice between the two depends on specific use cases, performance requirements, and architectural needs.

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

Pros of LevelDB

  • Lightweight and embedded key-value store, suitable for use within applications
  • Fast performance for read-heavy workloads
  • Simple API and easy integration into C++ and other language projects

Cons of LevelDB

  • Limited scalability for large datasets compared to Accumulo's distributed architecture
  • Lacks advanced features like cell-level security and server-side programming
  • No built-in support for distributed operations or replication

Code Comparison

LevelDB (C++):

leveldb::DB* db;
leveldb::Options options;
options.create_if_missing = true;
leveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
db->Put(leveldb::WriteOptions(), "key", "value");

Accumulo (Java):

Connector connector = new ZooKeeperInstance("instance", "zookeepers").getConnector("user", new PasswordToken("pass"));
BatchWriter writer = connector.createBatchWriter("table", new BatchWriterConfig());
Mutation mutation = new Mutation("row");
mutation.put("family", "qualifier", "value");
writer.addMutation(mutation);
writer.close(); // flushes the buffered mutation

LevelDB offers a simpler API for basic key-value operations, while Accumulo provides a more complex but feature-rich interface for distributed data storage and retrieval. LevelDB is better suited for embedded use cases, whereas Accumulo excels in large-scale, distributed environments with advanced security and data management requirements.

Apache Hadoop

Pros of Hadoop

  • Broader ecosystem and wider adoption, with more tools and integrations available
  • Better suited for large-scale batch processing and MapReduce jobs
  • More extensive documentation and community support

Cons of Hadoop

  • Higher complexity and steeper learning curve
  • Less efficient for real-time data processing and random access operations
  • Requires more hardware resources and maintenance overhead

Code Comparison

Hadoop (Java):

public class WordCount extends Configured implements Tool {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        // ... (mapper implementation)
    }
    // ... (reducer and main method)
}

Accumulo (Java):

public class IngestExample {
    public static void main(String[] args) throws Exception {
        // args[0] is the path to a client properties file
        try (AccumuloClient client = Accumulo.newClient().from(args[0]).build();
             BatchWriter writer = client.createBatchWriter("mytable")) {
            Mutation m = new Mutation("row1");
            m.put("family", "column", "value");
            writer.addMutation(m);
        } // closing the writer flushes the mutation
    }
}

The code examples showcase the different focus of each project. Hadoop's example demonstrates a MapReduce job for word counting, while Accumulo's example shows data ingestion into a table, highlighting its database-like nature.

The MongoDB Database

Pros of MongoDB

  • Easier to set up and use, with a gentler learning curve
  • More flexible schema design, allowing for dynamic and evolving data structures
  • Larger community and ecosystem, with more third-party tools and resources

Cons of MongoDB

  • Less suitable for handling extremely large datasets compared to Accumulo
  • Lacks some of the advanced security features that Accumulo provides
  • May have lower write performance in certain high-throughput scenarios

Code Comparison

MongoDB query example:

db.collection.find({
  age: { $gt: 25 },
  status: "active"
}).sort({ name: 1 })

Accumulo query example:

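Scanner scanner = connector.createScanner("users", auths);
scanner.setRange(new Range("row1", "row2"));
scanner.fetchColumnFamily("personal");
// AgeFilter is a hypothetical user-defined Filter iterator that must be available on the tablet servers
IteratorSetting ageFilter = new IteratorSetting(30, "ageFilter", AgeFilter.class);
ageFilter.addOption("minAge", "25");
scanner.addScanIterator(ageFilter);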

While both databases support querying and filtering, Accumulo's approach is more low-level and offers finer control over data access and processing. MongoDB's query syntax is generally more intuitive for developers familiar with JSON-like structures.
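The filtering step in the Accumulo example can be pictured as an iterator wrapped around the scan, dropping entries before they reach the caller. The sketch below is a toy illustration of that wrapping pattern using only the JDK: FilterIteratorSketch and agesOver are hypothetical names, and a real Accumulo iterator runs on the tablet servers and implements a different interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Toy model only: a filtering iterator stacked on top of a source iterator.
// This is not the Accumulo iterator interface.
class FilterIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> keep;
    private T next;
    private boolean hasNext;

    FilterIteratorSketch(Iterator<T> source, Predicate<T> keep) {
        this.source = source;
        this.keep = keep;
        advance();
    }

    // Moves to the next source element that passes the predicate, if any.
    private void advance() {
        hasNext = false;
        while (source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public T next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        T result = next;
        advance();
        return result;
    }

    // Collects the ages strictly greater than minAge, preserving source order.
    static List<Integer> agesOver(List<Integer> ages, int minAge) {
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> it = new FilterIteratorSketch<>(ages.iterator(), age -> age > minAge);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(agesOver(Arrays.asList(19, 31, 25, 42), 25)); // [31, 42]
    }
}
```

Running the filter next to the data, as Accumulo does, avoids shipping rejected entries over the network; this sketch only models the filtering logic, not that placement.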

MongoDB is often preferred for rapid development and flexible data models, while Accumulo excels in scenarios requiring high security, massive scalability, and fine-grained access control.

README

Apache Accumulo

Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval. With Apache Accumulo, users can store and manage large data sets across a cluster. Accumulo uses Apache Hadoop's HDFS to store its data and Apache ZooKeeper for consensus.

Download the latest version of Apache Accumulo on the project website.

Getting Started

More resources can be found on the project website.

Building

Accumulo uses Maven to compile, test, and package its source. The following command will build the binary tar.gz from source. Add -DskipTests to build without waiting for the tests to run.

mvn package

This command produces assemble/target/accumulo-<version>-bin.tar.gz

Contributing

Contributions are welcome to all Apache Accumulo repositories.

If you want to contribute, read our guide on our website.

Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See https://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache Accumulo uses the built-in Java cryptography libraries in its RFile encryption implementation. See Oracle's export-regulations doc for more details on Java's cryptography features. Apache Accumulo also uses the Bouncy Castle library for some cryptographic technology. See the Bouncy Castle site for more details on its cryptography features.