apache/avro

Apache Avro is a data serialization system.

Top Related Projects

10,310

Apache Thrift

65,113

Protocol Buffers - Google's data interchange format

Apache Parquet Format

9,031

Main Portal page for the Jackson project

6,953

MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.

Quick Overview

Apache Avro is a data serialization system that provides rich data structures, a compact and fast binary data format, and container files for storing persistent data. It originated in the Apache Hadoop ecosystem and supports schema evolution and language-independent serialization.

Pros

  • Compact and fast binary format, reducing storage and transmission costs
  • Schema evolution support, allowing for easy updates to data structures
  • Language-independent serialization, promoting interoperability
  • Built-in support for data compression and splittable files
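The schema evolution point can be illustrated with a minimal sketch of how a reader schema is applied to data written with an older schema. This is a simplified model in plain Python, not the Avro library: the `resolve` helper and the `MISSING` sentinel are hypothetical names, and the sketch ignores aliases and type promotion. It covers three rules from Avro's schema resolution: writer-only fields are ignored, reader-only fields take their default, and a reader-only field without a default is an error.

```python
MISSING = object()  # sentinel so that a default of None (JSON null) still counts as a default


def resolve(record, writer_fields, reader_fields):
    """Project a record written with writer_fields onto reader_fields."""
    written = {f["name"] for f in writer_fields}
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            result[name] = record[name]
        else:
            default = field.get("default", MISSING)
            if default is MISSING:
                raise ValueError(f"reader field {name!r} has no default")
            result[name] = default
    return result  # fields known only to the writer are dropped


writer_v1 = [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}]
reader_v2 = [
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": None},
]
old_record = {"name": "Alice", "age": 30}
print(resolve(old_record, writer_v1, reader_v2))
# {'name': 'Alice', 'email': None}
```

Giving new fields a default is what lets a newer reader consume older data without rewriting it.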

Cons

  • Steeper learning curve compared to simpler formats like JSON or CSV
  • Requires schema definition for each data structure
  • Limited support in some ecosystems compared to more ubiquitous formats
  • Can be overkill for simple data serialization needs

Code Examples

  1. Defining an Avro schema:
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"]}
  ]
}
  2. Serializing data using Avro in Python:
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

# Parse the schema; the with block closes the schema file once it has been read.
with open("user.avsc") as schema_file:
    schema = avro.schema.parse(schema_file.read())

# DataFileWriter stores the schema in the file header and closes the file on exit.
with DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema) as writer:
    writer.append({"name": "Alice", "age": 30, "email": "alice@example.com"})
    writer.append({"name": "Bob", "age": 25, "email": None})
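  3. Deserializing Avro data in Java:
import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadUsers {
    public static void main(String[] args) throws IOException {
        // The writer's schema is read from the file header, so none is passed here.
        // try-with-resources closes the reader even if reading fails.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(new File("users.avro"), new GenericDatumReader<>())) {
            while (reader.hasNext()) {
                GenericRecord user = reader.next();
                System.out.println(user.get("name") + ", " + user.get("age") + ", " + user.get("email"));
            }
        }
    }
}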
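To make the "compact binary format" concrete, here is a minimal sketch of how a single User record is laid out in Avro's binary encoding, written in plain Python without the Avro library. The helper names (`encode_long`, `encode_string`, `encode_user`) are hypothetical. The sketch covers only the encoding of one datum; a container file such as users.avro additionally carries a header with the schema and sync markers between blocks.

```python
def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-map the value, then emit base-128 varint bytes."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(user: dict) -> bytes:
    """Encode a User record: field values in declaration order, no names or tags."""
    out = encode_string(user["name"]) + encode_long(user["age"])
    # Union ["null", "string"]: write the zero-based branch index, then the value.
    if user["email"] is None:
        out += encode_long(0)
    else:
        out += encode_long(1) + encode_string(user["email"])
    return out


bob = encode_user({"name": "Bob", "age": 25, "email": None})
print(bob.hex(" "))  # 06 42 6f 62 32 00
```

Because field names never appear in the payload, a reader can only decode these six bytes with the writer's schema at hand.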

Getting Started

To use Apache Avro in your project:

  1. Add the Avro dependency to your project (e.g., with Maven):

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.11.1</version>
    </dependency>
    
  2. Define your schema in a .avsc file.

  3. Generate classes from your schema (if using code generation):

    java -jar avro-tools-1.11.1.jar compile schema user.avsc .
    
  4. Use the generated classes or generic records to serialize and deserialize data as shown in the code examples above.
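Before generating classes, it can help to sanity-check the .avsc file, since an Avro schema is ordinary JSON. The following is a minimal sketch using only Python's standard library. The `describe_fields` helper is a hypothetical name, and the schema is inlined as a string here rather than read from user.avsc so that the example is self-contained.

```python
import json

USER_AVSC = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"]}
  ]
}
"""


def describe_fields(schema_json: str) -> list:
    """Return (field name, nullable) pairs for a record schema."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected a record schema")
    summary = []
    for field in schema["fields"]:
        ftype = field["type"]
        # A field is nullable when its type is a union that includes "null".
        nullable = isinstance(ftype, list) and "null" in ftype
        summary.append((field["name"], nullable))
    return summary


print(describe_fields(USER_AVSC))
# [('name', False), ('age', False), ('email', True)]
```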

Competitor Comparisons

Apache Thrift

Pros of Thrift

  • Supports a wider range of programming languages (20+) compared to Avro
  • Ships a full RPC framework with pluggable transports and protocols
  • Provides built-in versioning support for easier schema evolution

Cons of Thrift

  • More complex schema definition and code generation process
  • Generally slower serialization and deserialization performance
  • Larger messages, since each field is written with a type and field ID header

Code Comparison

Thrift IDL:

struct Person {
  1: string name
  2: i32 age
  3: optional string email
}

Avro Schema:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"]}
  ]
}

Both Avro and Thrift are popular data serialization frameworks, but they have different strengths. Thrift offers broader language support and more advanced RPC features, while Avro provides simpler schema evolution and better performance for many use cases. The choice between them often depends on specific project requirements and the ecosystem in which they'll be used.

Protocol Buffers - Google's data interchange format

Pros of Protocol Buffers

  • Faster serialization and deserialization
  • Compact tagged binary format, keeping network overhead low
  • Strong typing and built-in validation

Cons of Protocol Buffers

  • Less flexible schema evolution compared to Avro
  • More complex setup and configuration
  • Typical use relies on generated code, which is less convenient in dynamically-typed languages

Code Comparison

Protocol Buffers:

syntax = "proto3";

message Person {
  string name = 1;
  int32 age = 2;
}

Avro:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}

Both Protocol Buffers and Avro are schema-based serialization formats for efficient data exchange between systems. Protocol Buffers relies on generated code and numbered fields, which suits performance-critical services with stable schemas. Avro resolves differences between the writer's and reader's schemas at read time and can be used without code generation, which makes it convenient in dynamically-typed languages.

On the wire, Protocol Buffers prefixes each field with a tag holding its field number and wire type, while Avro's binary encoding writes field values in schema order with no per-field tags, so the reader needs the writer's schema to decode. Avro also defines a JSON encoding. The code comparison shows the difference in schema definition: Protocol Buffers uses its own IDL syntax, while Avro schemas are written in JSON.

Choose Protocol Buffers for performance-critical applications with static schemas, and Avro for scenarios requiring more flexible schema evolution and easier integration with dynamically-typed languages.
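The wire-format difference between the two Person definitions can be sketched by hand-encoding one record both ways in plain Python. This is an illustration, not either library's API: `protobuf_person` and `avro_person` are hypothetical helpers, and the sketch assumes a non-empty name and a positive age (proto3 omits fields that hold default values, and negative int32 values are encoded differently).

```python
def varint(n: int) -> bytes:
    """Base-128 varint for a non-negative integer (used by both formats)."""
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def zigzag(n: int) -> int:
    """Avro maps signed values to unsigned ones before varint encoding."""
    return (n << 1) ^ (n >> 63)


def protobuf_person(name: str, age: int) -> bytes:
    data = name.encode("utf-8")
    # Each field starts with a tag: (field_number << 3) | wire_type.
    name_tag = varint((1 << 3) | 2)  # field 1, wire type 2 (length-delimited)
    age_tag = varint((2 << 3) | 0)   # field 2, wire type 0 (varint)
    return name_tag + varint(len(data)) + data + age_tag + varint(age)


def avro_person(name: str, age: int) -> bytes:
    data = name.encode("utf-8")
    # No tags: values appear in schema order, lengths and ints are zigzag varints.
    return varint(zigzag(len(data))) + data + varint(zigzag(age))


print(protobuf_person("John", 30).hex(" "))  # 0a 04 4a 6f 68 6e 10 1e
print(avro_person("John", 30).hex(" "))      # 08 4a 6f 68 6e 3c
```

For this tiny record the Avro payload is two bytes shorter because it carries no tags, but those tags are what let a Protocol Buffers reader skip unknown fields without the writer's schema. Real-world sizes and speeds depend on the data and the implementation.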

Apache Parquet Format

Pros of Parquet-format

  • Columnar storage format optimized for analytics and big data processing
  • Better compression and encoding schemes, resulting in smaller file sizes
  • Efficient querying of specific columns without reading entire dataset

Cons of Parquet-format

  • More complex file structure, potentially harder to implement and maintain
  • Less flexible schema evolution compared to Avro's reader/writer schema resolution
  • Limited support for streaming data scenarios

Code Comparison

Avro schema example:

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}

Parquet schema example:

message User {
  required binary name (UTF8);
  required int32 age;
}

Both formats support schema definitions, but Avro uses JSON for schema representation, while Parquet uses a custom message format. Avro's schema is more human-readable and easier to work with programmatically, while Parquet's schema is more compact and closely tied to its columnar storage structure.
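The practical difference between the two formats is layout rather than schema syntax: Avro stores whole records one after another, while Parquet groups values by column. The following is a conceptual sketch in plain Python, with a hypothetical `to_columns` helper; it does not model real Parquet structures such as row groups, pages, encodings, or the file footer.

```python
# Row-oriented layout: each record is stored whole, as in an Avro data file.
rows = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Carol", "age": 41},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Column-oriented layout: an analytic query touches only the columns it needs.
columns = to_columns(rows)
ages = columns["age"]
print(sum(ages) / len(ages))  # 32.0
```

Computing the average age from `columns` never touches the name values, which is the property that makes columnar files efficient for analytics, while appending one new record is simpler in the row layout.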

Main Portal page for the Jackson project

Pros of Jackson

  • More flexible and supports a wider range of data formats (JSON, XML, YAML, etc.)
  • Extensive customization options and annotations for fine-grained control
  • Larger ecosystem with numerous modules and extensions

Cons of Jackson

  • Can be more complex to set up and configure for advanced use cases
  • May have slightly higher memory usage and runtime overhead
  • Less focus on schema evolution compared to Avro

Code Comparison

Jackson:

ObjectMapper mapper = new ObjectMapper();
MyObject obj = mapper.readValue(jsonString, MyObject.class);
String json = mapper.writeValueAsString(obj);

Avro:

// MyObject is assumed to be an Avro-generated SpecificRecord class.
DatumReader<MyObject> reader = new SpecificDatumReader<>(MyObject.class);
Decoder decoder = DecoderFactory.get().jsonDecoder(MyObject.getClassSchema(), jsonString);
MyObject obj = reader.read(null, decoder);

Both Jackson and Avro are popular serialization frameworks, but they serve different purposes. Jackson is more versatile and widely used for general-purpose JSON processing, while Avro excels in schema-based serialization and data exchange, particularly in big data ecosystems. The choice between them depends on specific project requirements and use cases.
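One practical difference when moving between the two: the JSON that Avro's JSON decoder expects is not always the JSON that Jackson would produce for the same object. In Avro's JSON encoding, a non-null union value is wrapped in an object keyed by the branch type name. The following is a minimal sketch in plain Python, assuming the User schema from the Code Examples section; `to_avro_json` is a hypothetical helper, not part of either library.

```python
import json


def to_avro_json(user: dict) -> str:
    """Render a User record in Avro's JSON encoding."""
    email = user["email"]
    encoded = {
        "name": user["name"],
        "age": user["age"],
        # Union ["null", "string"]: null stays null, a string is wrapped as {"string": ...}.
        "email": None if email is None else {"string": email},
    }
    return json.dumps(encoded)


print(to_avro_json({"name": "Alice", "age": 30, "email": "alice@example.com"}))
# {"name": "Alice", "age": 30, "email": {"string": "alice@example.com"}}
```

Plain JSON with `"email": "alice@example.com"` would not match this union encoding, so JSON produced by a general-purpose mapper may need adapting before it is passed to an Avro JSON decoder.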

MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.

Pros of MessagePack

  • Simpler and more lightweight serialization format
  • Faster encoding and decoding performance
  • Wider language support and ecosystem

Cons of MessagePack

  • Lacks schema evolution capabilities
  • No built-in compression support
  • Less robust type system compared to Avro

Code Comparison

MessagePack:

import msgpack

data = {"name": "John", "age": 30}
packed = msgpack.packb(data)
unpacked = msgpack.unpackb(packed)

Avro:

import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

# The schema must be available before any record can be written.
with open("user.avsc") as schema_file:
    schema = avro.schema.parse(schema_file.read())

with DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema) as writer:
    writer.append({"name": "John", "age": 30})

MessagePack offers a more straightforward API for serialization and deserialization, while Avro requires more setup with schema definitions and file handling. Avro's approach provides stronger typing and schema evolution capabilities, but at the cost of increased complexity.
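The "stronger typing" point can be sketched as a write-time check: an Avro writer compares each datum with the schema and rejects mismatches, whereas a schemaless format packs whatever it is given. The following is a simplified model in plain Python covering only the string, int, and null types and unions of them. `USER_FIELDS`, `matches`, and `validate` are hypothetical names, not the Avro library's API.

```python
USER_FIELDS = [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
]

# Checks for the few primitive types used in this sketch.
PRIMITIVES = {
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**31 <= v < 2**31,
    "null": lambda v: v is None,
}


def matches(ftype, value) -> bool:
    """True if value fits the type; a list of types is treated as a union."""
    if isinstance(ftype, list):
        return any(matches(branch, value) for branch in ftype)
    return PRIMITIVES[ftype](value)


def validate(record: dict, fields: list) -> list:
    """Return the names of fields whose values do not match the schema."""
    return [f["name"] for f in fields if not matches(f["type"], record.get(f["name"]))]


print(validate({"name": "John", "age": 30}, USER_FIELDS))        # []
print(validate({"name": "John", "age": "thirty"}, USER_FIELDS))  # ['age']
```

With a schemaless format the second record would serialize without complaint, and the type error would surface only in whatever code later reads it.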

README

Apache Avro™


Apache Avro™ is a data serialization system.

To learn more about Avro, please visit our website at:

https://avro.apache.org/

To contribute to Avro, please read:

https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute

You can use devcontainers to develop Avro:

  • Open in Visual Studio Code
  • Open in GitHub Codespaces

Trademarks & logos

Apache®, Apache Avro and the Apache Avro airplane logo are trademarks of The Apache Software Foundation.

The Apache Avro airplane logo on this page has been designed by Emma Kellam for use by this project.
