Author: tom
-
Calculate distances in Oracle
Oracle allows you to calculate the distance between two points. Such a calculation is not trivial, as one must take into account that distances are measured over the surface of the globe and that the points are given as longitude/latitude pairs. For example, the distance between the points (longitude, latitude) = (0,0) and (1,1) is about 156 kilometers.…
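The excerpt does not show the Oracle query itself. As a rough cross-check of the order of magnitude, here is a minimal Python sketch of the haversine formula for great-circle distance. It assumes a spherical Earth with a mean radius of 6371 km. The function name `haversine_km` and the radius constant are illustrative assumptions, not part of Oracle.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def haversine_km(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two (longitude, latitude) points given in degrees."""
    # Convert degrees to radians
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # a = sin^2(c/2): the squared half-chord between the points on a unit sphere
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # c is the central angle between the points, in radians
    c = 2 * math.asin(math.sqrt(a))
    return radius * c


if __name__ == "__main__":
    d = haversine_km(0, 0, 1, 1)
    print(f"{d:.1f} km")  # roughly 157 km on a sphere
```

A sphere is only an approximation of the Earth's shape. This spherical result of roughly 157 km therefore differs slightly from figures obtained with an ellipsoidal model such as WGS 84.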
-
Transpose in Oracle
Transposing data means turning data from rows into columns. Starting with version 11g, this is possible in Oracle as well, using the PIVOT clause: values that appear in rows are translated into columns. In this way, a new table can be created that has an additional set of columns, with column names being derived…
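To show what a pivot does to the data independently of SQL syntax, here is a minimal Python sketch over hypothetical (region, quarter, amount) rows. The distinct quarter values become the new column names, and each region becomes one output row. The function `pivot` and the sample data are illustrative assumptions, not Oracle's implementation.

```python
from collections import defaultdict


def pivot(rows, row_key, col_key, value_key):
    """Turn the values of `col_key` into columns, with one output row per distinct `row_key`.

    Several source rows for the same cell are summed (as with SUM in a SQL pivot);
    combinations that never occur are left as None.
    """
    columns = sorted({r[col_key] for r in rows})  # column names derived from the data
    cells = defaultdict(dict)
    for r in rows:
        cell = cells[r[row_key]]
        cell[r[col_key]] = cell.get(r[col_key], 0) + r[value_key]
    return columns, {k: [cells[k].get(c) for c in columns] for k in sorted(cells)}


# Hypothetical sample data: one row per sale
sales = [
    {"region": "North", "quarter": "Q1", "amount": 100},
    {"region": "North", "quarter": "Q2", "amount": 150},
    {"region": "South", "quarter": "Q1", "amount": 80},
    {"region": "North", "quarter": "Q1", "amount": 20},
]

if __name__ == "__main__":
    cols, table = pivot(sales, "region", "quarter", "amount")
    print("region", *cols)
    for region, values in table.items():
        print(region, *values)
```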
-
Reading and writing in Java
Reading from and writing to files is not easy in Java. One can see this simply by googling “Java Filereader problem”: this search produced 477,000 hits. Apparently, reading (and writing) is not trivial. I wrote a small program that reads a file and copies its contents to another file.…
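The post's own program is written in Java and is not reproduced in the excerpt. For comparison, here is a minimal Python sketch of the same read-and-copy task. The function name `copy_text_file` and the file names are hypothetical.

```python
from pathlib import Path


def copy_text_file(source, target, encoding="utf-8"):
    """Read `source` line by line, write each line to `target`, and return the number of lines copied."""
    count = 0
    # newline="" keeps the original line endings unchanged;
    # `with` closes both files even if an error occurs midway
    with open(source, "r", encoding=encoding, newline="") as src, \
         open(target, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(line)
            count += 1
    return count


if __name__ == "__main__":
    Path("input.txt").write_text("first line\nsecond line\n", encoding="utf-8")
    n = copy_text_file("input.txt", "output.txt")
    print(n, "lines copied")
```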
-
Another set of keys and values
Another example of how mappers and reducers are used in a Hadoop context is given below. This programme consists of three classes: an overall class that calls the two other classes, a mapper class and a reducer class. The mapper class reads a file and creates a series of words. In the first…
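The Hadoop classes themselves are not shown in the excerpt. The sketch below mimics the same three-part structure in plain Python, without Hadoop. It has a mapper that emits (word, 1) pairs, a reducer that sums the counts for one word, and a driver that groups the mapper output by key before handing each group to the reducer. The function names are hypothetical and only illustrate the pattern.

```python
import re
from collections import defaultdict


def mapper(line):
    """Split a line into lowercase words and emit a (word, 1) pair for each."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def reducer(word, counts):
    """Combine all counts emitted for one word."""
    return word, sum(counts)


def driver(lines):
    """Run the mapper on every line, group values by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    text = ["the cat sat", "The dog sat on the mat"]
    print(driver(text))
```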
-
Map and reduce – what happens?
In Big Data, the concept of mapping and reducing plays a huge role. The idea is that a massive dataset is split over several servers. On each server, a part of the data is investigated; this step is performed by a mapper. In a subsequent step, the partial results are merged into an outcome. This latter…
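To make the split-and-merge idea concrete, here is a minimal Python sketch that simulates it on one machine. The dataset is cut into chunks that stand in for servers. Each chunk is mapped to a partial result (a sum and a count), and the partial results are reduced into an overall mean. The number of chunks and the function names are assumptions for illustration. A real cluster would run the map step in parallel on separate machines.

```python
from functools import reduce


def split(data, n_parts):
    """Divide the data into at most n_parts roughly equal chunks (one per simulated server)."""
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_chunk(chunk):
    """Map step: each server summarises only its own chunk as (sum, count)."""
    return sum(chunk), len(chunk)


def merge(a, b):
    """Reduce step: combine two partial results into one."""
    return a[0] + b[0], a[1] + b[1]


def distributed_mean(data, n_parts=3):
    if not data:
        raise ValueError("data must not be empty")
    partials = [map_chunk(c) for c in split(data, n_parts)]
    total, count = reduce(merge, partials)
    return total / count


if __name__ == "__main__":
    values = [4, 8, 15, 16, 23, 42, 7]
    print(distributed_mean(values))  # same as sum(values) / len(values)
```

Each mapper passes on (sum, count) rather than its own mean. Averaging the per-chunk means directly would give the wrong answer whenever chunks differ in size, as the last chunk does here.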
-
Hive – connecting from SQL Developer
In my view, the big development now taking place in the world of Big Data is the creation of connectors. Such connectors enable us to keep using standard tools (R, for example) with data stored in Hadoop. I am very much impressed with Hive. Hive allows us to access data stored…
-
R – the shortest name possible
For some reason, short names are popular for computer languages. Think of “C”. Another example is “R”. R reminds me a bit of Matlab; it is an easy-to-learn language with immense statistical possibilities. It is compared with today's giants like SAS. The advantage of R is that it is widely accepted by the…
-
Hive, SQL on Hadoop
In a previous post, I discussed the difficulty of using Hadoop with its Big Data structure. One must write two different Java programmes: a so-called mapping programme and a reduce programme.
-
Pig: yet another approach to handling big data
In another post, I discussed how Java can be used to analyse data in a Big Data environment. The problem then lies with Java itself. Java is not a tool for the faint-hearted; it is difficult. Moreover, one must comply with a structure in which one writes two programmes: a mapping programme and a…
-
Python: another language to access Big Data
In an earlier post, I showed how Java could be used to access Big Data. I also stated that I had many problems with Java itself, and noted that I was not the only one to have issues with it. A much easier language is Python. This language is really easy to learn and it…