
Apache Atlas


Top Related Projects

  • DataHub: The Metadata Platform for your Data Stack
  • Amundsen: a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data
  • OpenLineage: An Open Standard for lineage metadata collection

Quick Overview

Apache Atlas is an open-source data governance and metadata framework for Hadoop and Big Data platforms. It provides a comprehensive set of tools and services to enable data discovery, lineage, and governance across various data sources and processing frameworks.

Pros

  • Comprehensive Data Governance: Atlas provides a centralized platform for managing metadata, data lineage, and data policies across multiple data sources and processing frameworks.
  • Scalable and Extensible: Atlas is designed to handle large-scale data environments and can be extended to support new data sources and processing frameworks.
  • Flexible Data Modeling: Atlas supports a flexible data model that can be customized to fit the specific needs of an organization's data landscape.
  • Robust Security and Access Control: Atlas provides fine-grained access control and security features to ensure data privacy and compliance.

Cons

  • Steep Learning Curve: Setting up and configuring Atlas can be a complex process, especially for organizations new to data governance.
  • Limited Community Support: Compared to some other open-source projects, the Apache Atlas community may be smaller and less active, which can make it harder to find support and resources.
  • Integration Challenges: Integrating Atlas with existing data infrastructure and tools can be time-consuming and may require significant technical expertise.
  • Performance Limitations: Depending on the size and complexity of the data environment, Atlas may experience performance issues, especially when handling large volumes of metadata.

Getting Started

To get started with Apache Atlas, follow these steps:

  1. Install Apache Atlas: You can download the latest version of Apache Atlas from the official website. Follow the installation instructions for your specific platform.

  2. Configure Data Sources: Atlas supports a variety of data sources, including Hadoop, Hive, Kafka, and more. Configure the necessary connectors and integrations to connect Atlas to your data ecosystem.

  3. Define Data Entities and Relationships: Use the Atlas web UI or the REST API to define the data entities and relationships in your organization's data landscape. This includes creating business glossaries, data lineage, and data policies; a minimal REST sketch follows this list.

  4. Manage Data Governance Policies: Leverage Atlas's policy management features to define and enforce data governance policies, such as data classification, access control, and data retention.

  5. Utilize Atlas's Metadata Search and Discovery: Take advantage of Atlas's search and discovery capabilities to find and understand the data assets across your organization.

  6. Monitor and Audit Data Lineage: Use Atlas's data lineage and impact analysis features to track the flow of data and understand the relationships between different data assets; see the lineage sketch after this section's closing paragraph.

  7. Integrate Atlas with Other Tools: Explore the various integration options available for Atlas, such as connecting it with data processing frameworks, data catalogs, and business intelligence tools.
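
The sketch below is a minimal, illustrative take on steps 3 and 5 against the Atlas REST v2 API, using the JDK's built-in HTTP client (Java 11+; the text block needs Java 15+). The host, default port 21000, admin/admin credentials, and the abbreviated hive_table payload are assumptions for illustration: a real hive_table entity also references its owning hive_db, and the required attributes depend on your type definitions.

Atlas REST v2 (Java):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class AtlasRestSketch {
    private static final String BASE = "http://localhost:21000/api/atlas/v2";
    private static final String AUTH =
            "Basic " + Base64.getEncoder().encodeToString("admin:admin".getBytes());

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // Step 3: register an entity; qualifiedName is the unique key within a type.
        String entityJson = """
                {"entity": {
                   "typeName": "hive_table",
                   "attributes": {
                     "qualifiedName": "default.employees@cluster1",
                     "name": "employees",
                     "owner": "hr_department"
                   }}}""";
        HttpRequest create = HttpRequest.newBuilder(URI.create(BASE + "/entity"))
                .header("Authorization", AUTH)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(entityJson))
                .build();
        System.out.println(http.send(create, HttpResponse.BodyHandlers.ofString()).body());

        // Step 5: basic search for the entity that was just registered.
        HttpRequest search = HttpRequest.newBuilder(
                        URI.create(BASE + "/search/basic?typeName=hive_table&query=employees"))
                .header("Authorization", AUTH)
                .GET()
                .build();
        System.out.println(http.send(search, HttpResponse.BodyHandlers.ofString()).body());
    }
}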

By following these steps, you can start leveraging the power of Apache Atlas to improve data governance, metadata management, and data discovery within your organization.
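
For step 6, lineage for a specific entity can likewise be read over REST. This is a hedged sketch under the same assumptions as above (local server, default credentials); the GUID is a hypothetical placeholder, and the direction and depth parameters control how much of the lineage graph is returned.

Atlas REST v2 (Java):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class AtlasLineageSketch {
    public static void main(String[] args) throws Exception {
        String guid = "REPLACE-WITH-ENTITY-GUID"; // hypothetical placeholder
        String auth = "Basic " + Base64.getEncoder().encodeToString("admin:admin".getBytes());

        // Fetch upstream and downstream lineage, three hops in each direction.
        HttpRequest lineage = HttpRequest.newBuilder(
                        URI.create("http://localhost:21000/api/atlas/v2/lineage/" + guid
                                + "?direction=BOTH&depth=3"))
                .header("Authorization", auth)
                .GET()
                .build();

        System.out.println(HttpClient.newHttpClient()
                .send(lineage, HttpResponse.BodyHandlers.ofString()).body());
    }
}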

Competitor Comparisons

DataHub: The Metadata Platform for your Data Stack

Pros of DataHub

  • More modern architecture with a focus on scalability and extensibility
  • Richer UI and user experience, including advanced search capabilities
  • Better support for cloud-native environments and microservices

Cons of DataHub

  • Younger project with potentially less stability compared to Atlas
  • Smaller community and ecosystem of integrations
  • Steeper learning curve due to more complex architecture

Code Comparison

Atlas (Java):

public class AtlasEntity extends Referenceable {
    public static final String TYPE_NAME = "AtlasEntity";
    private Map<String, Object> attributes;
}

DataHub (Python):

class DatasetSnapshot(Snapshot):
    """Snapshot class for datasets"""
    def __init__(self, urn: str, aspects: List[Union[DatasetProperties, SchemaMetadata, ...]] = None):
        super().__init__(urn, aspects)

The two projects use different languages for their core implementations: Atlas is written primarily in Java, while DataHub combines Python, Java, and TypeScript. The snippets above show how each defines a metadata entity.

Atlas follows a traditional Java object-oriented approach, while DataHub leans on Python's type hints and modern language features, reflecting DataHub's more recent development and its focus on developer productivity.

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Pros of Amundsen

  • More user-friendly interface and search functionality
  • Better integration with modern data stack tools (e.g., Airflow, dbt)
  • Faster setup and deployment process

Cons of Amundsen

  • Less comprehensive metadata management capabilities
  • Smaller community and ecosystem compared to Atlas
  • Limited support for complex data lineage scenarios

Code Comparison

Amundsen (Python):

class TableMetadata(BaseModel):
    database: str
    cluster: str
    schema: str
    name: str
    description: Optional[str] = None
    tags: List[str] = []

Atlas (Java):

public class AtlasEntity extends AtlasStruct implements Serializable {
    private String guid;
    private String typeName;
    private String status;
    private String createdBy;
    private String updatedBy;
}

Both projects aim to provide data discovery and metadata management solutions, but they differ in their approach and focus. Amundsen emphasizes ease of use and modern integrations, while Atlas offers more comprehensive metadata management capabilities. The code snippets showcase the different languages and data modeling approaches used by each project.

OpenLineage: An Open Standard for lineage metadata collection

Pros of OpenLineage

  • Lightweight and focused specifically on data lineage
  • Easier integration with modern data stack tools
  • More active community and frequent updates

Cons of OpenLineage

  • Less comprehensive metadata management features
  • Smaller ecosystem of integrations compared to Atlas
  • Limited governance and security capabilities

Code Comparison

Atlas (Java):

// Create a hive_table entity with the AtlasClientV2 client; qualifiedName is the unique key.
AtlasEntity entity = new AtlasEntity("hive_table");
entity.setAttribute("qualifiedName", "default.employees@cluster1");
entity.setAttribute("name", "employees");
entity.setAttribute("owner", "hr_department");
atlasClientV2.createEntity(new AtlasEntityWithExtInfo(entity));

OpenLineage (Python):

from datetime import datetime, timezone
from uuid import uuid4
from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

client = OpenLineageClient(url="http://localhost:5000")
client.emit(
    RunEvent(
        eventType=RunState.COMPLETE,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=str(uuid4())),
        job=Job(namespace="my_pipeline", name="process_employees"),
        producer="https://example.com/my-scheduler",
        inputs=[Dataset(namespace="hive", name="employees")],
        outputs=[Dataset(namespace="hive", name="processed_employees")],
    )
)

Summary

OpenLineage is a more modern, lightweight solution focused on data lineage, while Atlas offers a more comprehensive metadata management platform. OpenLineage is easier to integrate with contemporary data tools but has fewer features for governance and security. Atlas provides a broader range of functionalities but may be more complex to set up and maintain.


README

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Apache Atlas Overview

Apache Atlas is an extensible set of core foundational governance services that enables enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.

It provides true visibility in Hadoop by using both a prescriptive and a forensic model, along with technical and operational audit as well as lineage enriched by business taxonomical metadata. It also enables any metadata consumer to interoperate without building discrete interfaces to each other, because the metadata store is common.

Metadata veracity is maintained by leveraging Apache Ranger to prevent non-authorized access paths to data at runtime. Security is both role-based (RBAC) and attribute-based (ABAC).
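
To illustrate how Atlas classifications feed tag-based authorization, the hedged sketch below attaches a PII classification to an existing entity through the REST v2 classification endpoint; a Ranger tag-based policy keyed on the same tag can then allow or deny access to the underlying data at runtime. The GUID, host, credentials, and the PII classification name are placeholders, and the classification type must already be defined in Atlas.

Atlas REST v2 (Java):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class AtlasClassificationSketch {
    public static void main(String[] args) throws Exception {
        String guid = "REPLACE-WITH-ENTITY-GUID";          // hypothetical placeholder
        String auth = "Basic " + Base64.getEncoder()
                .encodeToString("admin:admin".getBytes()); // default dev credentials

        // Attach the PII classification; Ranger tag-based policies can reference the same tag.
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://localhost:21000/api/atlas/v2/entity/guid/"
                                + guid + "/classifications"))
                .header("Authorization", auth)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("[{\"typeName\": \"PII\"}]"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}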

Build Process

  1. Get the Atlas sources into a local directory, for example with the following commands:

     $ cd <your-local-directory>
     $ git clone https://github.com/apache/atlas.git
     $ cd atlas

     Check out the branch or tag you would like to build:

     to check out a branch
     $ git checkout <branch>

     to check out a tag
     $ git checkout tags/<tag>

  2. Execute the following commands to build Apache Atlas:

     $ export MAVEN_OPTS="-Xms2g -Xmx2g"
     $ mvn clean install
     $ mvn clean package -Pdist

  3. After the above build commands complete successfully, you should see the following files, where <version> is the Atlas version you built:

     distro/target/apache-atlas-<version>-bin.tar.gz
     distro/target/apache-atlas-<version>-hbase-hook.tar.gz
     distro/target/apache-atlas-<version>-hive-hook.tar.gz
     distro/target/apache-atlas-<version>-impala-hook.tar.gz
     distro/target/apache-atlas-<version>-kafka-hook.tar.gz
     distro/target/apache-atlas-<version>-server.tar.gz
     distro/target/apache-atlas-<version>-sources.tar.gz
     distro/target/apache-atlas-<version>-sqoop-hook.tar.gz
     distro/target/apache-atlas-<version>-storm-hook.tar.gz
     distro/target/apache-atlas-<version>-falcon-hook.tar.gz

  4. For more details on installing and running Apache Atlas, please refer to https://atlas.apache.org/#/Installation