Flume

Flume can write messages directly to a file, and that file can be stored on Hadoop. This makes it possible to capture messages in a file on Hadoop, ready to be analysed. A typical example is a series of events collected from a log: the resulting file is transferred to another platform (say Hadoop) to be processed further.
I got Flume working on a Cloudera sandbox. Three related parameters must be provided. The first refers to a flume.conf file (found in /etc/flume-ng/conf.empty/flume.conf). The second is the name of the agent, which can be found in the flume.conf file and happens to be sandbox in my case. The third refers to a conf directory, whose settings are made via flume-env.sh.

 flume-ng agent --conf-file /etc/flume-ng/conf.empty/flume.conf --name sandbox  --conf /opt/examples/flume/conf
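
The value passed to --name has to match the prefix of the property keys in the conf file, since Flume properties are written as <agent>.sources, <agent>.channels and <agent>.sinks. The following is a small sketch for reading the agent name(s) out of a conf file. The function name list_agents is made up for this illustration, and it assumes the file uses the usual one-property-per-line layout.

```shell
#!/bin/sh
# list_agents FILE: print the distinct agent names defined in a Flume conf file.
# (Hypothetical helper, not part of Flume.)
list_agents() {
    # Match only the top-level "<agent>.sources|channels|sinks = ..." lines
    # and keep the part before the first dot. Commented lines are skipped
    # because "#" is not a valid first character of the name.
    sed -n -E 's/^[[:space:]]*([A-Za-z0-9_-]+)\.(sources|channels|sinks)[[:space:]]*=.*/\1/p' "$1" | sort -u
}

# When a file name is given on the command line, list its agents.
if [ "$#" -gt 0 ]; then
    list_agents "$1"
fi
```

Saved as a script, it would be called with the conf file as its only argument, for example /etc/flume-ng/conf.empty/flume.conf.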

A second example is the following command, which is similar to the one given above:

sudo flume-ng agent -c /etc/gphd/flume/conf -f /etc/gphd/flume/conf/flume.conf -n agent

This one uses the conf file /etc/gphd/flume/conf/flume.conf. The name of the agent is "agent"; the conf directory is /etc/gphd/flume/conf. From the conf file, we know that a netcat source is set up that listens on port 44444. We can use this by opening a terminal session that sends a stream of lines from the local Linux platform; each line typed is answered with OK:

[pivhdsne:~]$ nc localhost 44444
testing
OK
1
OK
3
OK
4
OK
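
The conf file itself is not reproduced here. As a rough sketch only, an agent definition with a netcat source on port 44444 and an HDFS sink writing to /user/flume could look like the file written by the script below. The agent name, the port and the HDFS directory come from this example; the component names (netcat-src, mem-ch, hdfs-sink), the memory channel, the DataStream file type and the scratch file location are assumptions made for the illustration.

```shell
#!/bin/sh
# Write a minimal, hypothetical agent definition to a scratch file.
# The location is an assumption; override it by setting CONF_FILE.
CONF_FILE="${CONF_FILE:-/tmp/flume-netcat-example.conf}"

cat > "$CONF_FILE" <<'EOF'
# The prefix "agent" must match the name given with -n / --name.
agent.sources = netcat-src
agent.channels = mem-ch
agent.sinks = hdfs-sink

# Netcat source: turns every line received on the port into one event.
agent.sources.netcat-src.type = netcat
agent.sources.netcat-src.bind = localhost
agent.sources.netcat-src.port = 44444
agent.sources.netcat-src.channels = mem-ch

# In-memory channel between source and sink (assumed for this sketch).
agent.channels.mem-ch.type = memory

# HDFS sink: writes events as plain text under /user/flume.
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = /user/flume
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.channel = mem-ch
EOF

echo "Wrote $CONF_FILE"
```

The resulting file would be passed to flume-ng agent with -f, together with -n agent.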

From the conf file, we know that these streams are stored on HDFS in the directory /user/flume. If we look there, we see this file:

[pivhdsne:~]$ hadoop dfs -cat /user/flume/FlumeData.1442949290540
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

testing
1
3
4
[pivhdsne:~]$ 

The file created on HDFS holds the data that was streamed from the Linux platform, so this is a way to transfer data from Linux to HDFS. The deprecation warning can be avoided by using hdfs dfs -cat instead of hadoop dfs -cat. A final example is given below, with the same netcat listener:

cat test|nc localhost 44444

Here cat turns the file into a stream, which is piped to the nc process; nc sends it to port 44444. The stream is then caught by Flume and stored in files on HDFS.
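
Since the netcat listener answered every line with OK in the interactive session, the replies can also be used to check that a whole file was accepted. The sketch below assumes that each input line produces exactly one OK reply and that the replies were captured in a file; the function names and the file name replies.txt are made up for this illustration.

```shell
#!/bin/sh
# count_lines FILE: number of lines in FILE (a last line without a
# trailing newline is counted as well).
count_lines() {
    grep -c '' "$1"
}

# count_acks FILE: number of "OK" replies in a captured nc transcript.
count_acks() {
    grep -c '^OK$' "$1"
}

# all_acked INPUT REPLIES: succeed only if there is one "OK" per input line.
all_acked() {
    [ "$(count_lines "$1")" -eq "$(count_acks "$2")" ]
}

# Intended use (needs the running agent, so it is left as a comment):
#   nc localhost 44444 < test > replies.txt
#   all_acked test replies.txt && echo "all lines acknowledged"
```

Depending on the nc variant, the connection may stay open after the file has been sent, so an option that closes it at end of input may be needed; the local man page should say which one applies.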