Sqoop and Hive

It is possible to use Sqoop to load data directly from an RDBMS into Hive. This opens up interesting possibilities: data that are stored in an RDBMS but need to be analysed on a cheaper platform can be migrated to a Hadoop platform with Sqoop, which is generally seen as a reliable tool for such migrations. The command is straightforward:

sqoop import --connect jdbc:mysql:// --table persons -m 1 --username thom --password thom24257  --hive-import
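The same import can be wrapped in a small script that avoids putting the password on the command line. This is a sketch under stated assumptions, not the post's own setup. The JDBC URL is left incomplete above, so `<host>/<database>` is a placeholder to fill in. `-P` makes Sqoop prompt for the password. `--hive-table` and `--hive-overwrite` are optional Sqoop Hive arguments that name the target table explicitly and replace its existing data. The `run` helper is hypothetical: by default it only prints the command, and it executes the command only when `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch: the Hive import above, with the password prompted for.
# Assumptions: JDBC_URL is a placeholder; run() is a hypothetical helper
# that prints the command unless DRY_RUN=0 is set.
set -euo pipefail

JDBC_URL="${JDBC_URL:-jdbc:mysql://<host>/<database>}"  # placeholder, fill in

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# -m 1: a single map task, so Sqoop needs no --split-by column.
# -P: prompt for the password instead of passing --password in clear text.
# --hive-table / --hive-overwrite: explicit Hive table name, replace its data.
hive_import() {
  run sqoop import --connect "$JDBC_URL" --table persons -m 1 \
    --username thom -P \
    --hive-import --hive-table persons --hive-overwrite
}

hive_import
```

Running it as is prints the command for review. Running it with `DRY_RUN=0` executes the import.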

I noticed from the log that the import is done in three steps:

  • In the first step, the data are imported from the RDBMS and stored as files in a directory on HDFS.
  • In the second step, the table is defined in the Hive metastore.
  • In the third step, the data are moved from that HDFS directory into the Hive warehouse directory, which by default also resides on HDFS.
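The three steps above can be reproduced by hand. The script below is a hedged sketch of one way to do that, not a description of Sqoop's internal code. It makes several assumptions. The JDBC URL is a placeholder. `/user/thom/persons` is a hypothetical staging directory. Both Sqoop commands are given `--fields-terminated-by '\001'` (Ctrl-A, Hive's default field delimiter) so that the staged files match the table definition. Step 3 uses Hive's `LOAD DATA INPATH`, which moves the files rather than copying them. The `run` helper is again hypothetical and only prints the commands unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch of the three steps behind `sqoop import --hive-import`, done by hand.
# Assumptions (hypothetical): JDBC_URL and STAGING_DIR are placeholders;
# run() prints the commands unless DRY_RUN=0 is set.
set -euo pipefail

JDBC_URL="${JDBC_URL:-jdbc:mysql://<host>/<database>}"  # placeholder, fill in
STAGING_DIR="${STAGING_DIR:-/user/thom/persons}"        # hypothetical HDFS path
DELIM='\001'  # Ctrl-A in Sqoop's octal notation, Hive's default field delimiter

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: copy the RDBMS table into delimited text files on HDFS.
step1_import_to_hdfs() {
  run sqoop import --connect "$JDBC_URL" --table persons -m 1 \
    --username thom -P --target-dir "$STAGING_DIR" \
    --fields-terminated-by "$DELIM"
}

# Step 2: define the Hive table in the metastore from the RDBMS schema.
step2_create_hive_table() {
  run sqoop create-hive-table --connect "$JDBC_URL" --table persons \
    --username thom -P --fields-terminated-by "$DELIM"
}

# Step 3: move the staged files into the table's warehouse directory.
step3_load_into_warehouse() {
  run hive -e "LOAD DATA INPATH '$STAGING_DIR' INTO TABLE persons"
}

step1_import_to_hdfs
step2_create_hive_table
step3_load_into_warehouse
```

After step 3 the staging directory is emptied, because `LOAD DATA INPATH` moves the files into the warehouse. This matches the "moved" wording in the log-based description above.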