Netflix/metacat

Top Related Projects

  • DataHub: The Metadata Platform for your Data Stack
  • Amundsen: a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
  • Cartography: a Python tool that consolidates infrastructure assets and the relationships between them in an intuitive graph view powered by a Neo4j database.
  • Apache Atlas

Quick Overview

Metacat is an open-source metadata exploration API service developed by Netflix. It federates metadata from sources such as Hive, RDS, Teradata, Redshift, S3 and Cassandra behind a single API, so users can find out what data exists, where it resides, and how to process it.

Pros

  • Unified Metadata Management: Metacat aggregates metadata from diverse data sources, providing a centralized view of an organization's data assets.
  • Data Discovery: The platform offers advanced search and browsing capabilities, making it easier for users to find and understand available data.
  • Arbitrary Metadata Storage: Metacat lets users attach their own metadata to data sets, alongside the metadata held by each source system.
  • Scalability: The system is designed to handle large-scale metadata management, supporting enterprises with growing data needs.

Cons

  • Complexity: Integrating Metacat with existing data infrastructure may require significant setup and configuration, which can be challenging for some organizations.
  • Limited Native Integrations: While Metacat supports a range of data sources, the list of native integrations may not cover all the data sources used by an organization.
  • Learning Curve: Users may need to invest time in understanding the Metacat platform and its features, which can be a barrier to adoption.
  • Dependency on External Components: Metacat relies on other components, such as Elasticsearch and Hive, which adds complexity to the overall system management.

Code Examples

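Metacat is a service rather than a code library, so it is used through its REST API. The comparison snippets further down show illustrative client calls.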

Getting Started

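Metacat is deployed as a service rather than imported as a library. Build and deployment steps are in the Getting Started section of the README below.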

Competitor Comparisons

DataHub: The Metadata Platform for your Data Stack

Pros of DataHub

  • More comprehensive data catalog solution with features like data lineage, data quality, and data governance
  • Active community development with frequent updates and contributions
  • Supports a wider range of data sources and integrations

Cons of DataHub

  • More complex setup and configuration compared to Metacat
  • Steeper learning curve due to its extensive feature set
  • May require more resources to run and maintain

Code Comparison

DataHub (Python client example):

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Connect to the DataHub metadata service (GMS) over REST.
emitter = DatahubRestEmitter("http://localhost:8080")

# Wrap one aspect (the dataset properties) in a change proposal and send it.
mcp = MetadataChangeProposalWrapper(
    entityUrn=make_dataset_urn(platform="mysql", name="my_database.my_table", env="PROD"),
    aspect=DatasetPropertiesClass(description="My dataset description"),
)
emitter.emit(mcp)

Metacat (Java API example):

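import com.google.common.collect.ImmutableMap;
import com.netflix.metacat.common.server.api.v1.MetacatV1;
import com.netflix.metacat.common.dto.TableDto;

// Obtaining the API handle is elided in the source.
MetacatV1 api = ...;
// Read the table, replace its metadata map, and write the table back.
TableDto table = api.getTable("catalog", "database", "table");
table.setMetadata(ImmutableMap.of("description", "My table description"));
api.updateTable("catalog", "database", "table", table);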

Both projects aim to provide metadata management solutions, but DataHub offers a more comprehensive platform with advanced features, while Metacat focuses on simpler metadata management for data warehouses and lakes.

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Pros of Amundsen

  • More comprehensive data discovery and metadata management solution
  • Stronger focus on data lineage and relationships between data assets
  • Better suited for large-scale data ecosystems with diverse data sources

Cons of Amundsen

  • More complex setup and configuration compared to Metacat
  • Requires additional components (e.g., Neo4j, Elasticsearch) for full functionality
  • May have a steeper learning curve for users and administrators

Code Comparison

Amundsen (Python):

from pyhocon import ConfigFactory

from databuilder.extractor.neo4j_extractor import Neo4jExtractor
from databuilder.job.job import DefaultJob
from databuilder.loader.file_system_neo4j_csv_loader import FsNeo4jCSVLoader

# Job configuration: where to read the graph from and where to stage CSV output.
job_config = ConfigFactory.from_dict({
    'extractor.neo4j.graph_url': 'bolt://localhost:7687',
    'loader.filesystem_csv_neo4j.node_dir_path': '/tmp/nodes',
    'loader.filesystem_csv_neo4j.relationship_dir_path': '/tmp/relationships',
})

Metacat (Java):

import com.netflix.metacat.common.server.properties.Config;
import com.netflix.metacat.main.api.v1.MetacatV1;

public class MetacatExample {
    private final MetacatV1 api;
    private final Config config;

    public MetacatExample(MetacatV1 api, Config config) {
        this.api = api;
        this.config = config;
    }
}

Cartography is a Python tool that consolidates infrastructure assets and the relationships between them in an intuitive graph view powered by a Neo4j database.

Pros of Cartography

  • Focuses on security and infrastructure analysis, providing a more comprehensive view of cloud assets and their relationships
  • Offers visualization capabilities, making it easier to understand complex infrastructure setups
  • Supports multiple cloud providers (AWS, GCP, Azure) out of the box

Cons of Cartography

  • Less emphasis on metadata management and data discovery compared to Metacat
  • May require more setup and configuration for data-centric use cases
  • Smaller community and fewer integrations with data processing tools

Code Comparison

Cartography (Python):

from cartography.intel.aws import ec2
from cartography.intel.aws.ec2 import sync_ec2_instances

def sync(neo4j_session, boto3_session, regions, update_tag):
    ec2.sync_ec2_instances(neo4j_session, boto3_session, regions, update_tag)

Metacat (Java):

@Slf4j
@Singleton
public class MetacatThriftHiveClient extends HiveClientFactory {
    @Inject
    public MetacatThriftHiveClient(Config config, MetacatHMSHandler handler) {
        super(config, handler);
    }
}

Apache Atlas

Pros of Atlas

  • More comprehensive data governance and lineage capabilities
  • Stronger integration with Hadoop ecosystem components
  • Active Apache project with broader community support

Cons of Atlas

  • Steeper learning curve and more complex setup
  • Less focus on cloud-native environments compared to Metacat
  • Potentially heavier resource requirements for deployment

Code Comparison

Atlas (Java):

AtlasClient atlasClient = new AtlasClient(atlasUrls, new String[]{"admin", "admin"});
Referenceable db = new Referenceable("hive_db");
db.set("name", "default");
db.set("description", "Default Hive database");
atlasClient.createEntity(db);

Metacat (Java):

MetacatClient metacatClient = new MetacatClient(config);
DatabaseCreateRequestDto createDto = new DatabaseCreateRequestDto();
createDto.setDefinitionMetadata(ImmutableMap.of("owner", "team_data"));
metacatClient.createDatabase("hive", "default", createDto);

Both projects aim to provide metadata management solutions, but Atlas offers more extensive data governance features, while Metacat focuses on cloud-native environments and simplicity. Atlas integrates better with Hadoop ecosystems, whereas Metacat excels in multi-cloud deployments. The code snippets demonstrate the different approaches to creating database entities, with Atlas using a more detailed object model and Metacat opting for a simpler request-based approach.

README

Metacat

Introduction

Metacat is a unified metadata exploration API service. You can explore Hive, RDS, Teradata, Redshift, S3 and Cassandra. Metacat tells you what data you have, where it resides and how to process it. Metadata is, in the end, data about data, so the primary purpose of Metacat is to provide a place to describe the data so that more useful things can be done with it.

Metacat focuses on solving these three problems:

  • Federate views of metadata systems.
  • Allow arbitrary metadata storage about data sets.
  • Metadata discovery
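
The three goals above can be pictured with a small toy model. This is only an illustrative sketch and not Metacat's implementation: the `FederatedCatalog` class, its method names, and the sample tables are all hypothetical, and the "sources" are plain dictionaries standing in for systems such as Hive or Redshift.

```python
from dataclasses import dataclass, field


@dataclass
class FederatedCatalog:
    """Toy model (hypothetical): one lookup API in front of several sources."""

    # catalog name -> {qualified table name -> metadata reported by that source}
    sources: dict = field(default_factory=dict)
    # "catalog/table" -> free-form metadata stored outside the source systems
    user_metadata: dict = field(default_factory=dict)

    def describe(self, catalog, table):
        """Federated view: source metadata merged with user-supplied metadata."""
        source_info = self.sources[catalog][table]
        extra = self.user_metadata.get(f"{catalog}/{table}", {})
        return {"catalog": catalog, "table": table, **source_info, "user": extra}

    def annotate(self, catalog, table, **tags):
        """Arbitrary metadata storage: attach key/value pairs to a data set."""
        self.user_metadata.setdefault(f"{catalog}/{table}", {}).update(tags)

    def search(self, text):
        """Naive discovery: tables whose name contains the given text."""
        return sorted(
            (catalog, table)
            for catalog, tables in self.sources.items()
            for table in tables
            if text in table
        )


fed = FederatedCatalog(
    sources={
        "hive": {"sales.orders": {"columns": ["id", "amount"]}},
        "redshift": {"reporting.orders_daily": {"columns": ["day", "total"]}},
    }
)
fed.annotate("hive", "sales.orders", owner="team_data")
print(fed.describe("hive", "sales.orders"))
print(fed.search("orders"))
```

The point of the sketch is the separation of concerns: source systems keep owning their schema metadata, while user-supplied metadata lives in a separate store and is merged in at read time.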

Documentation

TODO

Releases

Releases

Builds

Metacat builds are run on Travis CI.

Getting Started

git clone git@github.com:Netflix/metacat.git
cd metacat
./gradlew clean build

Once the build has completed, the Metacat WAR file is generated under the metacat-war/build/libs directory. Metacat needs two basic configuration properties:

  • metacat.plugin.config.location: Path to the directory containing the catalog configuration. See the catalog samples used for functional testing in the repository.
  • metacat.usermetadata.config.location: Path to the configuration file containing the connection properties used to store user metadata. See the sample file in the repository.

Running Locally

Take the built WAR from metacat-war/build/libs and deploy it to an existing Tomcat as ROOT.war.

The REST API can be accessed at http://localhost:8080/mds/v1/catalog

Swagger API documentation can be accessed at http://localhost:8080/swagger-ui/index.html
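
As a minimal sketch of calling the catalog endpoint from a script, the snippet below uses only the Python standard library. The helper names `catalog_url` and `list_catalogs` are made up for illustration, and the assumption that the endpoint returns a JSON body is not confirmed by this README; check the Swagger documentation for the actual response format.

```python
import json
from urllib.request import urlopen


def catalog_url(host="localhost", port=8080):
    """Build the catalog listing URL given in the README."""
    return f"http://{host}:{port}/mds/v1/catalog"


def list_catalogs(host="localhost", port=8080, timeout=5.0):
    """GET the catalog endpoint and decode the body as JSON (assumed format)."""
    with urlopen(catalog_url(host, port), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(catalog_url())
# With a running Metacat instance, the call below would fetch the listing:
# print(list_catalogs())
```

The `port` parameter is kept separate so the same helper can be pointed at a non-default port, for example a port mapped by Docker.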

Docker Compose Example

Prerequisite: Docker Compose is installed.

To start a self-contained Metacat environment with some sample catalogs, run the command below. This starts a Docker Compose cluster containing a Metacat container, a Hive Metastore container, a Cassandra container and a PostgreSQL container.

./gradlew metacatPorts
  • metacatPorts - Prints out what exposed ports are mapped to the internal container ports. Look for the mapped port (MAPPED_PORT) to port 8080.

The REST API can be accessed at http://localhost:<MAPPED_PORT>/mds/v1/catalog

Swagger API documentation can be accessed at http://localhost:<MAPPED_PORT>/swagger-ui/index.html

To stop the Docker Compose cluster:

./gradlew stopMetacatCluster