Install Spark on Windows
I found a clear YouTube video that shows how Spark can be installed on Windows: https://www.youtube.com/watch?v=WlE7RNdtfwE . The video provides a clear guide on how to do this…
I saw a small Scala programme that allows you to calculate subtotals. The idea is that a flat file is provided with a name and a subtotal. A given namen…
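The subtotal idea can be sketched in a few lines. The original post uses Scala; the version below is only a minimal Python sketch, and it assumes that each line holds a name and an amount separated by a comma and that the same name may appear on several lines. The separator, the function name `subtotals` and the sample data are assumptions made for illustration, not taken from the post.

```python
from collections import defaultdict


def subtotals(lines, sep=","):
    """Sum the amounts per name for lines shaped like 'name<sep>amount'."""
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        name, amount = line.split(sep, 1)
        totals[name.strip()] += float(amount)
    return dict(totals)


# Hypothetical sample data: the name 'alice' occurs twice.
sample = ["alice,10", "bob,5", "alice,2.5"]
print(subtotals(sample))  # {'alice': 12.5, 'bob': 5.0}
```

To run this against a real flat file, pass the open file object instead of the list, for example `subtotals(open("input.txt"))`, where `input.txt` is a placeholder name.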
An interesting tango exists between Hive and Impala. The situation is as follows. Hive acts as a layer on top of MapReduce. It provides an interface whereby table definitions can be…
In the previous post, we used Scala to merge two files. The interesting feature is that Scala bypasses MapReduce. Pig uses MapReduce. If we undertake the same example, we will…
In this note, I will provide another script to join two files. The files are foo and bar. They contain lines whose elements are separated by a bar (|).…
In an earlier post, I showed how one may send a stream via netcat to HDFS using Flume. Another possibility is to set up a stream that is received by…
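A join of two bar-separated files can be sketched as follows. This is a minimal Python illustration rather than the script from the post, and it assumes that the first element on each line is the join key and that an inner join is wanted. The function names and the sample lines are hypothetical.

```python
def parse(lines, sep="|"):
    """Split each non-empty line into its bar-separated elements."""
    return [line.rstrip("\n").split(sep) for line in lines if line.strip()]


def join_on_first_field(foo_lines, bar_lines, sep="|"):
    """Inner join of two sets of lines on their first element (assumed key)."""
    # Index the second input by key so each foo line is matched in one lookup.
    bar_index = {}
    for fields in parse(bar_lines, sep):
        bar_index.setdefault(fields[0], []).append(fields[1:])

    joined = []
    for fields in parse(foo_lines, sep):
        for rest in bar_index.get(fields[0], []):
            joined.append(sep.join(fields + rest))
    return joined


# Hypothetical contents of foo and bar.
foo = ["1|apple", "2|pear"]
bar = ["1|red", "1|green", "3|blue"]
print(join_on_first_field(foo, bar))  # ['1|apple|red', '1|apple|green']
```

Keys that occur in only one of the two inputs (here 2 and 3) are dropped, which is the usual behaviour of an inner join.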
Before data modelling in Hadoop can be discussed, one needs to realise that Hadoop is about files. It is not about tables and relations – it is the files that…
HBase is a database system that is built on top of HDFS. However, the term ‘database’ might be a bit misleading. It is not a traditional SQL database that can…
Below, I provide some Python code to write an Avro file. An Avro file consists of a schema and a set of records. The records are written in binary format.…
Below, you will find a listing on how to copy the content of an Oracle table into an Avro file. The trick is quite straightforward. A table is read…