As we know, table definitions can be stored in the metastore. Each definition refers to a location where the data are stored. The data may be in an ordinary text file or in an Avro file. Another possibility is a Parquet file. Parquet is a columnar binary format in which the data are usually stored compressed.
Creating such a table is rather straightforward. First, we transfer a table to a Parquet file on HDFS:
sqoop import \
  --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=192.168.2.2)(port=1521))(connect_data=(service_name=orcl)))" \
  --username scott --password binvegni \
  --table fam \
  --columns "NUMMER, NAAM" \
  --m 1 \
  --target-dir /loudacre/fam_parquet \
  --as-parquetfile;
This results in a file in the directory /loudacre/fam_parquet. The file name is generated by the import itself; in this case the file is called 5fe8fcaa-6095-40ec-b499-d73d6d971b6f.parquet. From Impala, we may then define the table with:
CREATE EXTERNAL TABLE fam_parquet
  LIKE PARQUET '/loudacre/fam_parquet/5fe8fcaa-6095-40ec-b499-d73d6d971b6f.parquet'
  STORED AS PARQUET
  LOCATION '/loudacre/fam_parquet/';
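Because the file name is generated, it differs on every import, so copying it by hand into the statement gets tedious. The following is a minimal sketch of how the statement could be assembled from a directory listing instead. The helper names first_parquet_file and create_statement are made up for this illustration, and the script assumes that the hdfs client is on the path and that the listing prints the file path as the last column.

```shell
#!/bin/sh
# Values taken from the example above; adjust them for another import.
TABLE=fam_parquet
TARGET_DIR=/loudacre/fam_parquet

# Hypothetical helper: read an `hdfs dfs -ls` listing on stdin and
# print the first path that ends in .parquet.
first_parquet_file() {
    awk '{ print $NF }' | grep '\.parquet$' | head -n 1
}

# Hypothetical helper: print the CREATE statement for a table name,
# a Parquet data file and the directory that holds it.
create_statement() {
    printf "CREATE EXTERNAL TABLE %s LIKE PARQUET '%s' STORED AS PARQUET LOCATION '%s/';\n" "$1" "$2" "$3"
}

# Only contact HDFS when the client is installed on this machine.
if command -v hdfs >/dev/null 2>&1; then
    file=$(hdfs dfs -ls "$TARGET_DIR" | first_parquet_file)
    create_statement "$TABLE" "$file" "$TARGET_DIR"
fi
```

The printed statement can be pasted into Impala. Once the table exists, running impala-shell -q "DESCRIBE fam_parquet" is one way to check which columns Impala derived from the file.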