Flume can transfer messages directly into a file, and that file can even be stored on Hadoop. This makes it possible to capture messages in a file on Hadoop, ready to be analysed. A typical example is a series of events collected from a log; the resulting file is then transferred to another platform (say Hadoop) for further processing.
I got Flume working on a Cloudera sandbox. It looks as if three related parameters must be provided. The first refers to a flume.conf file (found in /etc/flume-ng/conf.empty/flume.conf). The second is the name of the agent, which can be found in the flume.conf file and happens to be sandbox in my case. The third refers to a conf directory, whose settings are applied via flume-env.sh. Together they give this command:
flume-ng agent --conf-file /etc/flume-ng/conf.empty/flume.conf --name sandbox --conf /opt/examples/flume/conf
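The agent name passed with --name should match the prefix used for the keys in flume.conf. As a rough sketch, assuming the file follows the usual Flume properties layout in which each agent declares its sources on a line of the form <agent>.sources = ..., a small shell helper can list the agent names a file declares. The function name list_flume_agents is made up for this illustration.

```shell
# List the agent names declared in a Flume properties file.
# Assumption: each agent has a line "<agent>.sources = ...", so the agent
# name is whatever precedes ".sources" on that line.
# list_flume_agents is a hypothetical helper, not part of Flume.
list_flume_agents() {
    conf_file="$1"
    # Skip comment lines and nested keys such as <agent>.sources.r1.type
    sed -n 's/^[[:space:]]*\([^.#[:space:]]\{1,\}\)\.sources[[:space:]]*=.*/\1/p' "$conf_file" | sort -u
}

# Example call, using the path from the command above:
# list_flume_agents /etc/flume-ng/conf.empty/flume.conf
```

On the sandbox described here, this would be expected to print sandbox, provided the file is laid out as assumed.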
Another example is the following statement, which is more or less similar to the first command:
sudo flume-ng agent -c /etc/gphd/flume/conf -f /etc/gphd/flume/conf/flume.conf -n agent
This one uses a conf file stored at /etc/gphd/flume/conf/flume.conf. The name of the agent is “agent”; the conf directory is /etc/gphd/flume/conf. From the conf file, we know that a netcat source is set up that listens on port 44444. We use this by starting a terminal session that sends a stream from the local Linux platform:
[pivhdsne:~]$ nc localhost 44444
testing
OK
1
OK
3
OK
4
OK
From the conf file, we know that these streams are stored on hdfs in directory /user/flume. If we look there, we see this file:
[pivhdsne:~]$ hadoop dfs -cat /user/flume/FlumeData.1442949290540
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
testing
1
3
4
[pivhdsne:~]$
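The conf file itself is not reproduced here. A configuration that behaves as described (agent named agent, netcat source on port 44444, output under /user/flume) might look roughly like the sketch below, written to a scratch file so it can be inspected. The component names netcat-src, mem-ch and hdfs-sink, the memory channel, and the scratch path are assumptions; only the agent name, the port and the hdfs directory come from the text.

```shell
# Write a sketch of a Flume configuration to a scratch file.
# Assumed, not taken from the original conf file: the component names,
# the memory channel, and the location of the scratch file.
conf_out="${TMPDIR:-/tmp}/flume-netcat-hdfs-sketch.conf"

cat > "$conf_out" <<'EOF'
# One source, one channel and one sink for the agent named "agent"
agent.sources = netcat-src
agent.channels = mem-ch
agent.sinks = hdfs-sink

# Netcat source: listens on a TCP port, one event per line of text
agent.sources.netcat-src.type = netcat
agent.sources.netcat-src.bind = localhost
agent.sources.netcat-src.port = 44444
agent.sources.netcat-src.channels = mem-ch

# In-memory channel that buffers events between source and sink
agent.channels.mem-ch.type = memory

# HDFS sink: writes the events as plain text under /user/flume
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = /user/flume
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.channel = mem-ch
EOF

echo "wrote $conf_out"
```

The hdfs sink's default file prefix is FlumeData, which is consistent with the file name seen in the listing above.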
We see that Flume has created a file on hdfs that stores the data streamed from the Linux platform. The deprecation warning indicates that hdfs dfs -cat is the preferred form of the command. This is a way to transfer files from Linux to hdfs. A final example is given below, with the same netcat listener:
cat test|nc localhost 44444
Here cat turns the file into a stream, which is piped to the nc client; the client sends it to port 44444. The stream is then caught by Flume and stored in hdfs files.
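Since the netcat source answered every line with OK in the interactive session, the same acknowledgements can be used to check that a whole file arrived. The helper below is a minimal sketch under that assumption: it sends a file and compares the number of lines sent with the number of OK replies. The name send_file_to_flume is hypothetical, and host and port default to the values used above.

```shell
# Send a text file to a Flume netcat source and compare the number of
# "OK" acknowledgements with the number of lines sent.
# send_file_to_flume is a hypothetical helper, not part of Flume.
send_file_to_flume() {
    file="$1"
    host="${2:-localhost}"
    port="${3:-44444}"

    # Number of newline-terminated lines in the file
    sent=$(wc -l < "$file" | tr -d ' ')
    # Assumption: the source replies with one "OK" line per accepted event
    acked=$(nc "$host" "$port" < "$file" | tr -d '\r' | grep -c '^OK$' || true)

    echo "sent=$sent acked=$acked"
    # Succeed only when every line was acknowledged
    [ "$sent" -eq "$acked" ]
}

# Example call, using the file name from the command above:
# send_file_to_flume test
```

Depending on the netcat variant, the connection may stay open after the file has been sent, in which case the command has to be interrupted or given that variant's own close-on-end-of-input option. A last line without a trailing newline is not counted by wc -l, so the two numbers can differ by one for such files.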