Sqoop and Hive
It is possible to use Sqoop to directly load from a RDBMS into Hive. This opens interesting possibilities. Data that are stored in a RDBMS and that need to be…
It is possible to use Sqoop to directly load from a RDBMS into Hive. This opens interesting possibilities. Data that are stored in a RDBMS and that need to be…
It is good to realise that Hive is built upon a mapreduce framework. The idea is that Hive is developed by facebook to facilitate analysis on Hadoop files. It is…
In this blog, I will discuss the word count problem as done with Python. It is often used to show how map reduce works. In most examples, it is developed…
In my view, the new development that we see now is building links to a Hadoop platform. One such development is building ODBC drivers that allow windows tools to access…
Recently, I revisited Pig. Pig is a language that allows you to analyse data sets in a Hadoop environment. It is created at Yahoo to circumvent the technicalities of creating…
A few days ago, I was asked to load some tables in Oracle. A rather trivial question but I wasn’t sure if enough tablespace was left. From the table definition,…
Sqoop is a tool that allows you to ship data from a RDBMS to a Hadoop platform. Let us take an example to clarify this. One may have some data…
Netcat is a utility in unix to investigate network connections. It has now been ported to windows and it allows us to query network connections on a windows platform with…
Flume allows to directly tranfer messages into a file. It even allows such files to be stored on Hadoop. This opens a way to capture messages in a file that…
I encountered the term “serialise”. But what does it mean? I understood the term “serialise” when I read a comment that explained that data structures can be created inside, say,…