Uncategorized – Page 2 – Data, data en nog ns data

Flume: sending data via stream

tom, 18 December 2016, 0 comments

Streaming data can be captured in HDFS files, and Flume is a tool for doing so. The idea is that we have three elements: sources that provide a stream,…

Partitioned Table in Hive

tom, 18 December 2016, 0 comments

Tables in Hive can be partitioned. Remember that the data are stored in files, so we expect the files to be partitioned as well. This is accomplished by a split…
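
As a rough illustration of what partitioned files look like on disk: Hive keeps each partition of a table in its own subdirectory named `column=value` under the table's location. The sketch below only mimics that layout in plain Python. The table path, the column name `country`, and the helper `partition_paths` are hypothetical, and no Hive or HDFS call is made.

```python
from collections import defaultdict


def partition_paths(rows, partition_col, table_dir):
    """Group rows by partition value, keyed by a Hive-style directory path."""
    groups = defaultdict(list)
    for row in rows:
        # Hive names a partition directory "<column>=<value>".
        path = f"{table_dir}/{partition_col}={row[partition_col]}"
        # The partition value lives in the directory name, not in the data files.
        groups[path].append({k: v for k, v in row.items() if k != partition_col})
    return dict(groups)


# Hypothetical sample data and table location.
rows = [
    {"id": 1, "name": "Anna", "country": "NL"},
    {"id": 2, "name": "Bert", "country": "BE"},
    {"id": 3, "name": "Cor", "country": "NL"},
]
layout = partition_paths(rows, "country", "/user/hive/warehouse/customers")
for path, members in sorted(layout.items()):
    print(path, members)
```

A query that filters on the partition column only has to read the matching directory, which is the main reason to partition in the first place.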

Manipulating Avro

tom, 16 December 2016, 0 comments

Avro files are binary files that contain both the data and a description of that data (the schema). That makes Avro a very interesting file format. One may send this file to any application…
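
To make the "data plus description" idea concrete, here is a minimal sketch that builds only the header of an Avro object container file in memory and reads the embedded schema back. It assumes the header layout from the Avro specification (magic bytes, a metadata map, a 16-byte sync marker). All helper names and the `User` schema are hypothetical, no data blocks are written, and a real project would use an Avro library rather than this hand-rolled code.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def write_long(n):
    """Encode an int as an Avro long: zigzag, then 7 bits per byte."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(stream):
    """Decode an Avro long from a binary stream."""
    z, shift = 0, 0
    while True:
        byte = stream.read(1)[0]
        z |= (byte & 0x7F) << shift
        if byte < 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def read_bytes(stream):
    """Read a length-prefixed byte string (Avro 'bytes' and 'string')."""
    return stream.read(read_long(stream))


def build_header(schema):
    """Build only the file header: magic, metadata map, sync marker."""
    meta = {
        b"avro.schema": json.dumps(schema).encode("utf-8"),
        b"avro.codec": b"null",
    }
    out = bytearray(MAGIC)
    out += write_long(len(meta))  # one map block holding all entries
    for key, value in meta.items():
        out += write_long(len(key)) + key + write_long(len(value)) + value
    out += write_long(0)  # a zero count ends the map
    out += bytes(16)  # sync marker (random in real files)
    return bytes(out)


def read_schema(stream):
    """Return the schema stored in the header of an Avro container file."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:  # negative count: a byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream)
            meta[key] = read_bytes(stream)
    return json.loads(meta[b"avro.schema"])


# Hypothetical schema used only for this demonstration.
schema = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}],
}
header = build_header(schema)
print(read_schema(io.BytesIO(header)))
```

Because the schema travels inside the file as JSON, a receiving application can interpret the binary records without any out-of-band agreement on the layout.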

Parquet format

tom, 13 December 2016, 0 comments

As we know, we may store table definitions in the metastore. These table definitions then refer to a location where the data are stored. The format of the data might…

Avro format

tom, 13 December 2016, 0 comments

In Hive, we see a situation where a table definition is stored in a metastore. This table definition is linked to a directory where the data are stored. It is…

Create a Hive table – 3 ways

tom, 8 December 2016, 0 comments

In this little note, I want to show three different ways to create a table in Hive. The first one starts with a file on HDFS that is available and…

Oracle ODI

tom, 24 November 2016, 0 comments

The successor to OWB (Oracle Warehouse Builder) is Oracle Data Integrator (ODI). This tool offers more functionality than OWB. In addition, it has an interface that more or less steers the user…

Docker container

Only this weekend I downloaded a Docker package from https://docs.docker.com/docker-for-windows. This package allows you to run very small, lightweight containers on your server that act as components to perform…

Putting a file on HDFS

tom, 16 November 2016, 0 comments

Putting a file on HDFS is relatively easy. There are a few steps to take. Let us assume the file is on a Linux system. The first step is to…

Estimating with Python

tom, 6 November 2016, 0 comments

It is relatively easy to do an estimate with a Python script. This is because Python works with matrices, and such matrices can be used as…
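
The excerpt does not say which estimator the post uses. As one possible reading, the sketch below assumes an ordinary least-squares fit of a straight line and writes the matrices out as plain nested lists, solving the normal equations (X'X) beta = X'y by hand. The function name `fit_line` and the sample data are hypothetical. In practice a matrix library would do this work.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x via the normal equations."""
    # Design matrix: a column of ones and a column of x values.
    X = [[1.0, x] for x in xs]
    # X'X is a 2x2 matrix, X'y a vector of length 2.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(2)]
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    if det == 0:
        raise ValueError("x values must not all be equal")
    # Solve the 2x2 system with Cramer's rule.
    a = (xty[0] * xtx[1][1] - xtx[0][1] * xty[1]) / det
    b = (xtx[0][0] * xty[1] - xtx[1][0] * xty[0]) / det
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

Since the sample points lie exactly on a line, the fit recovers its intercept and slope. With noisy data the same code returns the line that minimises the sum of squared residuals.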
