Metamapper
  • Documentation
  • Discussion
  • Blog
  • Github

›Datastores

Metadata Management

  • Welcome
  • Getting Started
  • Schema Inspection
  • SSH Tunnels
  • Custom Properties
  • Annotations
  • Search

Datastores

  • Overview
  • AWS Athena
  • AWS Glue Data Catalog
  • Azure Synapse
  • Google BigQuery
  • Hive Metastore
  • MySQL
  • Oracle
  • PostgreSQL
  • Redshift
  • Snowflake
  • Microsoft SQL Server

Workspace Management

  • Introduction
  • Access Management
  • Single Sign-On (SSO)
  • SSO Setup: Google
  • SSO Setup: Github
  • SSO Setup: SAML2

Installation (Self-Hosted)

  • Getting Started
  • Configuring Metamapper
  • Extensions
  • Asynchronous Workers
  • Email Configuration
  • File Storage
  • Security
  • Search
  • Healthchecks

Hive Metastore

Metamapper supports Apache Hive as a connection source. Hive is distributed data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL.

Metamapper can sync Hive if it has an external metastore configured.

This guide details how to setup and troubleshoot Hive in Metamapper.

Prequisites

  • A running Hive metastore. Metamapper has been tested with Hive 2.0 and later when it is configured to use a remote metastore on either a MySQL, PostgreSQL, or Microsoft SQL Server database.

  • Ability to view database connection details. Metamapper specifically needs the following information:

    • IP or host
    • Username
    • Password
    • Port
    • Database
  • Database privileges that allow you to create users and grant privileges. This is required to create a database user for Metamapper.

Choose connection method

There are two ways to connect your database.

Direct Connection

A direct connection will work if Metamapper is in the same private network as your database. A direction connection will also work if your database is publicly accessible.

SSH Tunnel

A jump host or SSH tunnel is required if your database is in a separate private network. This method uses a publicly accessible server to act as an intermediary between Metamapper and your database.

Check out this guide on setting up a SSH tunnel to work with Metamapper.

Create user

You will need to provide a user that Metamapper will use to periodically scan your external Hive metastore. We recommend creating a new user so that you can easily isolate Metamapper from the rest of your infrastructure.

MySQL

CREATE USER metamapper IDENTIFIED BY '<redacted>';

PostgreSQL

CREATE USER metamapper WITH PASSWORD '<redacted>';

SQL Server

CREATE LOGIN metamapper WITH PASSWORD = '<redacted>';
CREATE USER metamapper FOR LOGIN metamapper;

Grant read access to user

Metamapper needs permission to read the metastore tables that are created when Hive is installed. We assume that certain standard installation processes were followed when setting up your Hive cluster.

MySQL

For MySQL connections, we assume that the metastore was created on the same database that was included in the connection settings.

GRANT SELECT, PROCESS ON <database>.* TO 'metamapper';
FLUSH PRIVILEGES;

PostgreSQL

For PostgreSQL connections, we assume that the metastore was created on the public schema. Hive forces Postgres to use uppercase tables by using quotes in it's DDL statements.

If for some reason, your installation does not meet the above requirements, please file a Github issue and we will assist as soon as possible.

GRANT USAGE ON SCHEMA "<schema>" TO USER metamapper;
GRANT SELECT ON ALL TABLES IN SCHEMA "<schema>" TO USER metamapper;

SQL Server

For Microsoft SQL Server connections, we assume that the metastore was created on the same database that was included in the connection settings.

GRANT SELECT ON SCHEMA :: [public] TO metamapper

Questions?

Metamapper is an opensource project. If you have a question that isn't answered here, check out the discussion forum.

You can also help improve this documentation by making a pull request.

Last updated on 9/9/2020
← Google BigQueryMySQL →
  • Prequisites
  • Choose connection method
    • Direct Connection
    • SSH Tunnel
  • Create user
  • Grant read access to user
  • Questions?
Metamapper
Documentation
User GuideInstallation Guide
Community
DiscussionGitHub
Copyright © 2020 Scott Cruwys