Analysing JSON and database tables in Spark
In a previous note, I showed how CSV files can be analysed. One may use the same technique to analyse JSON files or tables in a database. First, analysing JSON…
I found a very good YouTube video that shows how Spark can be installed on Windows: https://www.youtube.com/watch?v=WlE7RNdtfwE. The video provides a clear guide on how to do this…
An interesting tango exists between Hive and Impala. The situation is as follows. Hive acts as a layer on top of MapReduce. It provides an interface whereby table definitions can be…
In an earlier post, I showed how one may send a stream via netcat to HDFS using Flume. Another possibility is to set up a stream that is received by…
Before data modelling in Hadoop can be discussed, one needs to realise that Hadoop is about files. It is not about tables and relations – it is the files that…
HBase is a database system that is built on top of HDFS. However, the term ‘database’ might be a bit misleading. It is not a traditional SQL database that can…
Below, I provide some Python code to write an AVRO file. An AVRO file consists of a schema and a set of records. The records are written in binary format.…
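The author's listing is not part of this excerpt. As an illustration only, here is a minimal, standard-library-only sketch of how such a file is laid out according to the Apache Avro specification: a header that holds the JSON schema (`avro.schema`) and the codec, a 16-byte sync marker, then a data block with the binary-encoded records. It assumes the `null` codec, a single data block and a flat record schema with primitive field types only; the function names are hypothetical, and in practice one would rather use a library such as the `avro` or `fastavro` package.

```python
import io
import json
import os
import struct


def write_long(buf, n):
    """Avro int/long: zig-zag encode, then emit 7 bits per byte (high bit = more bytes follow)."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        buf.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    buf.write(bytes([n]))


def write_bytes(buf, b):
    """Avro bytes/string: length as a long, followed by the raw bytes."""
    write_long(buf, len(b))
    buf.write(b)


def write_value(buf, avro_type, value):
    """Binary-encode one value of a primitive Avro type."""
    if avro_type == "null":
        pass  # null is encoded as zero bytes
    elif avro_type == "boolean":
        buf.write(b"\x01" if value else b"\x00")
    elif avro_type in ("int", "long"):
        write_long(buf, value)
    elif avro_type == "float":
        buf.write(struct.pack("<f", value))
    elif avro_type == "double":
        buf.write(struct.pack("<d", value))
    elif avro_type == "string":
        write_bytes(buf, value.encode("utf-8"))
    elif avro_type == "bytes":
        write_bytes(buf, value)
    else:
        raise ValueError(f"unsupported type in this sketch: {avro_type!r}")


def write_container(schema, records, sync=None):
    """Return the bytes of an Avro object container file (codec 'null', one data block).

    `records` must be a list of dicts keyed by the field names of a flat record schema.
    """
    if sync is None:
        sync = os.urandom(16)  # random marker that separates the data blocks
    out = io.BytesIO()
    out.write(b"Obj\x01")  # magic bytes
    # File metadata: a map<bytes> written as one block of entries plus a zero-count terminator.
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"), "avro.codec": b"null"}
    write_long(out, len(meta))
    for key, value in meta.items():
        write_bytes(out, key.encode("utf-8"))
        write_bytes(out, value)
    write_long(out, 0)
    out.write(sync)
    # One data block: object count, byte size of the serialized objects, the objects, sync marker.
    block = io.BytesIO()
    for record in records:
        for field in schema["fields"]:
            write_value(block, field["type"], record[field["name"]])
    data = block.getvalue()
    write_long(out, len(records))
    write_long(out, len(data))
    out.write(data)
    out.write(sync)
    return out.getvalue()


if __name__ == "__main__":
    person_schema = {
        "type": "record",
        "name": "Person",
        "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}],
    }
    people = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
    with open("people.avro", "wb") as f:  # hypothetical output file name
        f.write(write_container(person_schema, people))
```

Each `int`/`long` is stored as a zig-zag variable-length integer, so a small value such as an age of 30 takes a single byte; this is why the records, unlike the schema in the header, cannot be read as text.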
Below, you will find a listing on how to copy the content of an Oracle table into an AVRO file. The trick is quite straightforward. A table is read…
It is possible to send an AVRO file via HTTP. The idea is that one sets up a server process. Once the server process runs, a client call is made.…
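A minimal sketch of that idea with only the Python standard library, under the assumption that the server simply returns the bytes of an AVRO file on a GET request and the client downloads them (the post may well send the file in the other direction). The handler, the placeholder payload and the `avro/binary` content type are choices made for this illustration, not taken from the original post.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(payload):
    """Build a request handler class that answers every GET with `payload`."""

    class AvroHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "avro/binary")  # assumed content type
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # keep the demo quiet

    return AvroHandler


def start_server(payload, host="127.0.0.1", port=0):
    """Start the server process in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer((host, port), make_handler(payload))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url):
    """Client call: download the file and return (content type, body bytes)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.headers.get("Content-Type"), resp.read()


if __name__ == "__main__":
    # In practice the payload would be read from disk, e.g. open("people.avro", "rb").read();
    # these placeholder bytes are not a valid AVRO file.
    payload = b"Obj\x01" + bytes(20)
    server = start_server(payload)
    host, port = server.server_address
    content_type, body = fetch(f"http://{host}:{port}/people.avro")
    print(content_type, body == payload)
    server.shutdown()
    server.server_close()
```

Because the server runs in a daemon thread, the same script can act as both server process and client; in a real setup the two would run on different machines.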
This note describes how we can show the content of an AVRO file with Python. We use Python 3 here, run from an Anaconda distribution. I checked…
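The note itself is truncated here, so the following is not its code. It is a minimal standard-library sketch of reading back an object container file as described in the Apache Avro specification: check the magic bytes, read the metadata map to obtain the JSON schema, then decode each data block and verify its sync marker. It assumes the `null` codec and flat records of primitive types; `read_container` and the other names are hypothetical.

```python
import io
import json
import struct
import sys


def read_long(buf):
    """Decode an Avro zig-zag variable-length long."""
    shift, result = 0, 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro data")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_bytes(buf):
    """Avro bytes/string: a long length followed by that many bytes."""
    return buf.read(read_long(buf))


def read_value(buf, avro_type):
    """Decode one value of a primitive Avro type."""
    if avro_type == "null":
        return None
    if avro_type == "boolean":
        return buf.read(1) == b"\x01"
    if avro_type in ("int", "long"):
        return read_long(buf)
    if avro_type == "float":
        return struct.unpack("<f", buf.read(4))[0]
    if avro_type == "double":
        return struct.unpack("<d", buf.read(8))[0]
    if avro_type == "string":
        return read_bytes(buf).decode("utf-8")
    if avro_type == "bytes":
        return read_bytes(buf)
    raise ValueError(f"unsupported type in this sketch: {avro_type!r}")


def read_container(data):
    """Return (schema, records) from the bytes of an Avro object container file."""
    buf = io.BytesIO(data)
    if buf.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:  # metadata map: blocks of entries, terminated by a zero count
        count = read_long(buf)
        if count == 0:
            break
        if count < 0:  # negative count: a block byte size follows, which we skip
            count = -count
            read_long(buf)
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
    codec = meta.get("avro.codec", b"null")  # an absent codec means 'null'
    if codec != b"null":
        raise ValueError(f"codec {codec!r} not supported by this sketch")
    schema = json.loads(meta["avro.schema"])
    sync = buf.read(16)
    records = []
    while buf.tell() < len(data):  # data blocks until the end of the file
        count = read_long(buf)
        read_long(buf)  # byte size of the block; not needed when reading sequentially
        for _ in range(count):
            records.append({f["name"]: read_value(buf, f["type"]) for f in schema["fields"]})
        if buf.read(16) != sync:
            raise ValueError("sync marker mismatch")
    return schema, records


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        schema, records = read_container(f.read())
    print(json.dumps(schema, indent=2))
    for record in records:
        print(record)
```

Run as `python read_avro.py people.avro` (hypothetical script and file names) to print the schema followed by one record per line.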