In Hive, a table definition is stored in the metastore. This table definition is linked to a directory where the actual data reside. Different formats can be used here: a plain text format is one option, but other formats are possible as well. One example is the Avro format, which combines a file definition (a schema) with the actual data.
Let us see how such data can be imported.
One possibility is to use the sqoop command to import the data in Avro format:

sqoop import \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=192.168.2.2)(port=1521))(connect_data=(service_name=orcl)))" \
--username scott --password binvegni \
--table fam \
--columns "NUMMER, NAAM" \
-m 1 \
--target-dir /loudacre/fam_avro \
--null-non-string '\\N' \
--as-avrodatafile

The result is composed of two parts:
1: The directory where the table content is stored. It can be found on HDFS in /loudacre/fam_avro, which can be verified with “hadoop dfs -ls /loudacre/fam_avro”. This gives an overview of the files in which the data are stored. One of these files can be opened to inspect the records, for example as sketched below.
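A minimal sketch of how such an Avro part file could be inspected; the part file name part-m-00000.avro and the avro-tools jar version are assumptions, not taken from the original output:

# copy one of the part files to the local file system (file name is an assumption)
hadoop dfs -get /loudacre/fam_avro/part-m-00000.avro .

# dump the Avro records as JSON with avro-tools (jar version is an assumption)
java -jar avro-tools-1.8.2.jar tojson part-m-00000.avro

Each line of the output should then be one record with the fields NUMMER and NAAM.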

So, we have the data. The second part is the data definition: Sqoop writes this Avro schema as a file, sqoop_import_fam.avsc, in the local directory from which the sqoop command was launched. To get it onto the HDFS platform, we issue “hadoop dfs -put sqoop_import_fam.avsc /loudacre/”. This stores the definition at HDFS level, where it can be inspected, for example with “hadoop dfs -cat /loudacre/sqoop_import_fam.avsc”.
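A minimal sketch of what the generated schema in sqoop_import_fam.avsc might look like; the exact field types depend on the Oracle column types, so the nullable long for NUMMER and nullable string for NAAM are assumptions:

{
  "type" : "record",
  "name" : "sqoop_import_fam",
  "fields" : [
    { "name" : "NUMMER", "type" : [ "null", "long" ], "default" : null },
    { "name" : "NAAM", "type" : [ "null", "string" ], "default" : null }
  ]
}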

If we have the data (in directory /loudacre/fam_avro) and the definition (in file /loudacre/sqoop_import_fam.avsc), we can proceed with a table definition in Hive. A Hive session is started with “beeline -u jdbc:hive2://192.168.2.60:10000 -n training -p training”, followed by:

CREATE EXTERNAL TABLE fam_avro
STORED AS AVRO
LOCATION '/loudacre/fam_avro'
TBLPROPERTIES ('avro.schema.url'=
'hdfs:/loudacre/sqoop_import_fam.avsc');

A check with “select * from fam_avro” showed the data as expected.
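As a further check that the external table really picked up the Avro schema, the table definition can be inspected from the same beeline session; these are standard Hive statements, not taken from the original session:

-- show the columns Hive derived from the Avro schema
DESCRIBE fam_avro;

-- show the full DDL, including the avro.schema.url property
SHOW CREATE TABLE fam_avro;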

 

By tom