Category: Uncategorized
-
Hive, SQL on Hadoop
In a previous post, I discussed how difficult it is to use Hadoop with its Big Data structure. One must write two different Java programmes: one is the so-called mapping programme; the other is the reduce programme.
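To make that two-programme structure concrete, here is a minimal sketch of a map step and a reduce step. It is written in Python rather than Java for brevity, runs locally without Hadoop, and does not use any Hadoop API. The "year,temperature" input lines and the names `map_record`, `reduce_group` and `run_job` are assumptions made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_record(line):
    """Map step: turn one 'year,temperature' line into a (year, temperature) pair."""
    year, temperature = line.split(",")
    return year, int(temperature)


def reduce_group(year, temperatures):
    """Reduce step: collapse all temperatures seen for one year into their maximum."""
    return year, max(temperatures)


def run_job(lines):
    """Run map, then sort by key (a stand-in for Hadoop's shuffle), then reduce."""
    pairs = sorted((map_record(line) for line in lines), key=itemgetter(0))
    return [
        reduce_group(year, [temp for _, temp in group])
        for year, group in groupby(pairs, key=itemgetter(0))
    ]


# Hypothetical sample records, not taken from any real data set.
sample = ["1950,0", "1950,22", "1949,111", "1950,-11", "1949,78"]
result = run_job(sample)
print(result)
```

In a real Hadoop job the two functions would live in separate mapper and reducer programmes, and the framework would take care of the sorting and grouping in between.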
-
Pig: yet another approach to handling big data
In another post, I discussed how Java can be used to analyse data in a Big Data environment. The problem then lies with Java itself. Java is not a tool for the faint-hearted; it is difficult. Moreover, one must comply with a structure where one must write two programmes: a mapping programme and a…
-
Python: another language to access Big Data
In an earlier post, I showed how Java could be used to access Big Data. I also stated that I had many problems with Java itself. I noted that I was not the only one to have issues with Java. A much easier language is Python. This language is really easy to learn and it…
-
Hadoop: my first Java programme
Today, I created a Java programme to get myself acquainted with the usage of Hadoop. I took an existing Java programme to start with (https://github.com/tomwhite/hadoop-book/blob/master/ch02/src/main/java/OldMaxTemperature.java) and tweaked it to fit my own situation.
-
AWK to investigate files on Unix
Today, I worked with the Unix awk utility. This is an extremely potent utility for investigating text files on a Unix platform. It can be invoked from the terminal command line. The command must start with awk. The keyword awk is followed by a script that is positioned between quotes. After the quotes, the textfile…
-
Hadoop
Everyone talks about big data and Hadoop. Someone even compared it to teenage sex: everyone talks about it, everyone knows someone who does it, but no one actually does it yet. I just tried Hadoop to see what it is all about. I made two attempts to install Hadoop. One attempt was about installing Hadoop 1.0.3. I…
-
Slowly Changing Dimensions Type 2
Just to get myself acquainted with the new Informatica version, I created a mapping in which SCD 2 was implemented. The mapping is shown here. In the first step, the input data are read. Let us assume that these records are read. The records contain a number and a name: number Name 1 Tom 2 ine…