It is possible to use Sqoop to load data directly from an RDBMS into Hive. This opens interesting possibilities: data that are stored in an RDBMS and need to be analysed on a cheaper platform can be migrated to a Hadoop platform via Sqoop. Sqoop is generally seen as a reliable tool for such a migration. The command is straightforward:
sqoop import --connect jdbc:mysql://188.8.131.529/tom --table persons -m 1 --username thom --password thom24257 --hive-import
I noticed from the log that the import is done in three steps:
- In the first step, the data are imported from the RDBMS and stored as HDFS datasets.
- In the second step, the table is defined in Hive, in the metadata store.
- In the third step, the data are moved from HDFS into the Hive warehouse.
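To make the three steps more concrete, here is a rough sketch of how they could be run by hand instead of through --hive-import. It is only an approximation of what the log suggests, not the exact commands Sqoop issues internally. The staging directory /tmp/sqoop_staging/persons, the run helper and the hive_import_steps function are names I made up for illustration. The explicit \001 field delimiter is my assumption, chosen to match Hive's default delimiter. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: the three steps of a Hive import, issued manually.
# Connection details are copied from the command above; the staging
# directory is a hypothetical HDFS path chosen for this example.

CONNECT='jdbc:mysql://188.8.131.529/tom'
TABLE='persons'
STAGING="/tmp/sqoop_staging/$TABLE"

# Print a command; execute it only when DRY_RUN=0 (default: print only).
run() {
  printf '%s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

hive_import_steps() {
  # Step 1: import the table from the RDBMS into an HDFS directory.
  run sqoop import --connect "$CONNECT" --table "$TABLE" -m 1 \
      --username thom --password thom24257 \
      --target-dir "$STAGING" --fields-terminated-by '\001'

  # Step 2: define the table in the Hive metadata store.
  run sqoop create-hive-table --connect "$CONNECT" --table "$TABLE" \
      --username thom --password thom24257 \
      --hive-table "$TABLE" --fields-terminated-by '\001'

  # Step 3: move the files from the HDFS directory into the Hive warehouse.
  run hive -e "LOAD DATA INPATH '$STAGING' INTO TABLE $TABLE"
}

hive_import_steps
```

Note that LOAD DATA INPATH without the LOCAL keyword moves the files rather than copying them, which matches the third step: after it completes, the staging directory no longer holds the data.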