Sqoop and Hive

It is possible to use Sqoop to load data directly from an RDBMS into Hive. This opens interesting possibilities: data that are stored in an RDBMS and need to be analysed on a cheaper platform can be migrated via Sqoop to a Hadoop platform. Sqoop is generally seen as a reliable medium for such a migration. The command is straightforward:

sqoop import --connect jdbc:mysql:// --table persons -m 1 --username thom --password thom24257  --hive-import

I noticed from the log that the import is done in three steps:

  • In a first step, the data are imported from the RDBMS and stored as HDFS datasets.
  • In a second step, the table is defined in the Hive metastore.
  • In a third step, the data are moved from the HDFS staging location into the Hive warehouse.
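Under the hood, these three steps correspond roughly to the following manual commands. This is a sketch only: the staging path and the column definitions below are illustrative assumptions, not taken from the actual log.

```shell
# Step 1 (done by Sqoop's MapReduce job): the rows land in a staging
# directory on HDFS, e.g. /user/thom/persons (path is an assumption)
hadoop fs -ls /user/thom/persons

# Step 2: register the table definition in the Hive metastore
# (column names and types are illustrative)
hive -e "CREATE TABLE persons (id INT, name STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001';"

# Step 3: move the staged files from HDFS into the Hive warehouse
hive -e "LOAD DATA INPATH '/user/thom/persons' INTO TABLE persons;"
```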

An alternative script is provided below:

sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse

This alternative shows how the HDFS datasets can be stored in a compressed format; here Snappy is used. The file format is Parquet. An alternative would be Avro, which is selected by supplying "--as-avrodatafile" instead of "--as-parquetfile".
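For completeness, the same import with Avro output would look like this (the connection details are repeated from the example above):

```shell
sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-avrodatafile \
    --warehouse-dir=/user/hive/warehouse
```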

Another alternative is importing from Oracle, which can be done as:

sqoop import \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=" \
--table COUNTRIES \
--columns "NAME, LOCAL"  \
--num-mappers 1 \
--compression-codec=snappy \
--as-avrodatafile \
--username ANONYMOUS \
--password xxx99998 \
--warehouse-dir=/user/hive/warehouse

Once the data are registered in the Hive metastore, they can be accessed via a jar-based JDBC connector or an ODBC connector. For the JDBC connector, I used Cloudera_HiveJDBC4_2.5.4.1006, which contains the necessary jars. For ODBC, I used ClouderaHiveODBC64.msi. That way, I could see the data in, say, Excel:
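As a quick check from the command line, the imported tables can also be queried over the same JDBC route with beeline. The host and port below (quickstart:10000, HiveServer2's default port) are assumptions and need to be adapted to the actual cluster:

```shell
# Connect to HiveServer2 over JDBC and query an imported table
beeline -u jdbc:hive2://quickstart:10000/default \
        -n retail_dba \
        -e "SELECT COUNT(*) FROM categories;"
```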