Create a Hive table – 3 ways

In this little note, I want to show three different ways to create a table in Hive. The first one starts with a file that is already available on HDFS, and we create a table upon this file. The table is defined as an external table, so the existing file is simply exposed as a table.

The code to be executed in Hive is:

CREATE EXTERNAL TABLE fam
  (nummer SMALLINT,
   name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/hdfs';
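
Note that this statement only stores a definition in the metastore; no data is read or moved at this point. As a quick check (a small sketch, not specific to this setup), the registered location can be inspected from the Hive prompt:

-- show the full table definition, including the Location field
DESCRIBE FORMATTED fam;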

And we may see the content of the table from, say, SQL Developer:


From the clause "LOCATION '/user/hdfs'", we know that the files that deliver the data are stored in the directory /user/hdfs. If we want to add data to the table, we only have to copy files to that location. Let us start with two files:
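
As an illustration, a minimal sketch of how these two files could be put in place; the local file names /tmp/fam1 and /tmp/fam2 are assumptions, any comma-delimited files will do:

# assumption: two comma-delimited files on the local disk
hadoop dfs -put /tmp/fam1 /user/hdfs
hadoop dfs -put /tmp/fam2 /user/hdfs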

We then copy an additional file to this location ( hadoop dfs -put /tmp/fam3 /user/hdfs ), after which the directory contains three files.
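
This can be verified with a directory listing, along these lines:

# list the files that now back the fam table
hadoop dfs -ls /user/hdfs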


When we then start the Hive environment ( beeline -u jdbc:hive2://192.168.2.60:10000 -n training -p training ), we may issue the command select * from fam; . This returns the content of the fam table, including the rows from the newly added fam3 file. It shows clearly how easy it is to add data to a Hadoop platform: the only thing we have to do is add files to the correct directory, and those files are automatically recognised as part of the table.
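
Put together, the session looks roughly like this (same connection details as before):

# start Beeline against HiveServer2
beeline -u jdbc:hive2://192.168.2.60:10000 -n training -p training

-- inside Beeline: the result now also contains the rows from the fam3 file
select * from fam;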

An alternative way of adding data to the table is to move data that already resides within the HDFS environment. Suppose we have a file on HDFS that contains data that should be added to the table. First, we start Hive:

beeline -u jdbc:hive2://192.168.2.60:10000 -n training -p training

I then moved data into the table:

LOAD DATA INPATH '/user/training/fam4' INTO TABLE fam;

I then saw the file being added to the table's data directory.
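
Note that LOAD DATA INPATH with a source path on HDFS moves the file (it does not copy it) into the directory behind the table's LOCATION clause. A small sketch to confirm this:

# the fam4 file now sits in the table's location ...
hadoop dfs -ls /user/hdfs
# ... and is gone from its original directory
hadoop dfs -ls /user/training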


Another way is a direct import from, say, Oracle, which creates a file on HDFS on the fly and adds the table definition to Hive. The code is:

sqoop import \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=192.168.2.2)(port=1521))(connect_data=(service_name=orcl)))" \
--username scott --password binvegni \
--table fam \
--columns "NUMMER, NAAM" \
--m 1 \
--fields-terminated-by ',' \
--hive-import

This creates a file on HDFS:

and a table in Hive that can be seen from, say, Excel:
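
From Beeline, the result of the import can be checked with a quick sketch along these lines:

-- the imported table should now be registered in the Hive metastore
show tables;
describe fam;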

A slightly different approach is the following: load the data from Oracle into a file on HDFS:

sqoop import \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=192.168.2.2)(port=1521))(connect_data=(service_name=orcl)))" \
--table COUNTRIES \
--columns "NAME, LOCAL"  \
--num-mappers 1 \
--compression-codec=snappy \
--username ANONYMOUS \
--password ine13758 \
--warehouse-dir=/user/hive/warehouse 

followed by this code in Hive:

CREATE EXTERNAL TABLE COUNTRIES ( 
    NAME STRING, 
    LOCAL STRING) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LOCATION '/user/hive/warehouse/COUNTRIES';
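
Here Sqoop only delivered the data files under /user/hive/warehouse/COUNTRIES, and the CREATE EXTERNAL TABLE statement only added the definition to the metastore. A short sketch to verify that the two fit together:

-- the external table should pick up the files that Sqoop wrote
select * from COUNTRIES limit 10;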


It is good to realise that we must undertake two actions (a sketch of both follows below):

1: An action to store the data (in a file that is stored in an HDFS directory).

2: An action to create the table definition, which is stored in the metastore.
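
As a recap, a minimal sketch of both actions for a hypothetical table called demo (the table name, columns and file names are assumptions):

# action 1: store the data in an HDFS directory
hadoop dfs -mkdir /user/demo
hadoop dfs -put /tmp/demo.csv /user/demo

# action 2: store the table definition in the metastore
beeline -u jdbc:hive2://192.168.2.60:10000 -n training -p training \
 -e "CREATE EXTERNAL TABLE demo (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/demo';"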


A third way is to use HCatalog:

hcat -e "CREATE TABLE prut \
(id INT, company STRING) \
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' \
LOCATION '/user/prut'"
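
hcat only takes care of the second action, the definition in the metastore; the data itself still has to be placed in /user/prut, for example (the file name is an assumption):

# add a comma-delimited data file to the table's location
hadoop dfs -put /tmp/prut1 /user/prut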


By tom