Another set of keys and values

Another example of how mappers and reducers are used in a Hadoop context is given below. The programme consists of three classes: an overall class that configures and submits the job, a mapper class and a reducer class.
The mapper class reads the input file and turns it into a series of words. The reducer class is registered twice: in the first reducer pass (run as combiner) the series is grouped and the frequency of each word is calculated; in the second pass (the final reducer) the partial results are combined. Both classes also write their intermediate output to local debug files, which produces the traces shown below. The programme reads as:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;


public class WordCount {

   // The mapper splits every input line into words and emits a (word, 1)
   // pair for each word it finds.
   public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
     private final static IntWritable one = new IntWritable(1);
     private Text word = new Text();

     public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
       String line = value.toString();
       StringTokenizer tokenizer = new StringTokenizer(line);
       // Debug file on the local file system that records every emitted pair.
       File file = new File("/home/hduser/example-mapper.txt");
       if (!file.exists()) {
         file.createNewFile();
       }
       FileWriter fw = new FileWriter(file.getAbsoluteFile(), true);
       BufferedWriter debug = new BufferedWriter(fw);
       Date date = new Date();
       while (tokenizer.hasMoreTokens()) {
         word.set(tokenizer.nextToken());
         output.collect(word, one);
         debug.append("mapper at " + date.toString() + ">" + word + "    " + one + "\n");
       }
       debug.close();
     }
   }

   // The reducer receives each word together with all values collected for
   // it and sums those values. The same class also serves as combiner.
   public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
     public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
       // Debug file on the local file system that records every intermediate sum.
       File file = new File("/home/hduser/example-reducer.txt");
       if (!file.exists()) {
         file.createNewFile();
       }
       FileWriter fw = new FileWriter(file.getAbsoluteFile(), true);
       BufferedWriter debug = new BufferedWriter(fw);
       int sum = 0;
       while (values.hasNext()) {
         sum += values.next().get();
         debug.write("reducer read value key " + key + " sum " + sum + "\n");
       }
       output.collect(key, new IntWritable(sum));
       debug.close();
     }
   }

   public static void main(String[] args) throws Exception {
     JobConf conf = new JobConf(WordCount.class);
     conf.setJobName("wordcount");

     conf.setOutputKeyClass(Text.class);
     conf.setOutputValueClass(IntWritable.class);

     conf.setMapperClass(Map.class);
     // Reduce is registered twice: as combiner (a local pre-aggregation on
     // each mapper node) and as the final reducer.
     conf.setCombinerClass(Reduce.class);
     conf.setReducerClass(Reduce.class);

     conf.setInputFormat(TextInputFormat.class);
     conf.setOutputFormat(TextOutputFormat.class);

     FileInputFormat.setInputPaths(conf, new Path(args[0]));
     FileOutputFormat.setOutputPath(conf, new Path(args[1]));

     JobClient.runJob(conf);
   }
}
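
To try the job, the class can be compiled against the Hadoop libraries, packed into a jar and submitted. A minimal sketch of such a run is given below; the jar name and the HDFS input path are assumptions, only the output path /user/output84 is taken from the output listing further down:

hduser@ubuntu:~$ javac -classpath $(/usr/local/hadoop/bin/hadoop classpath) WordCount.java
hduser@ubuntu:~$ jar cf wordcount.jar WordCount*.class
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop jar wordcount.jar WordCount /user/input /user/output84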

The input file looks like:

dit is van tom tom

In the mapper, this text is split into separate words:

mapper at Tue Jun 16 05:31:22 PDT 2015>dit    1
mapper at Tue Jun 16 05:31:22 PDT 2015>is    1
mapper at Tue Jun 16 05:31:22 PDT 2015>van    1
mapper at Tue Jun 16 05:31:22 PDT 2015>tom    1
mapper at Tue Jun 16 05:31:22 PDT 2015>tom    1

The keys are the separate words (like “dit”, “is” etc.). The value is always one.
In the first reducer pass, run as the combiner, the frequency of each word is calculated:

reducer read value key dit sum 1
reducer read value key is sum 1
reducer read value key tom sum 1
reducer read value key tom sum 2
reducer read value key van sum 1

The calculation is visible where the key “tom” is processed. In the first round the sum is 1, the original value from the mapper. In the second round “tom” is encountered again and the sum is incremented by 1, giving 2.
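
The same accumulation can be shown as a minimal standalone Java sketch outside Hadoop; the value list is an assumption representing the two (tom, 1) pairs emitted by the mapper:

import java.util.Arrays;
import java.util.List;

public class SumSketch {
  public static void main(String[] args) {
    // The mapper emitted "tom" twice, each time with the value 1.
    List<Integer> values = Arrays.asList(1, 1);
    int sum = 0;
    for (int value : values) {
      sum += value; // first round: sum = 1, second round: sum = 2
      System.out.println("key tom sum " + sum);
    }
    // The reducer finally emits the pair (tom, 2).
  }
}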
In the second reducer pass, run as the final reducer, the results are merged:

reducer read value key dit sum 1
reducer read value key is sum 1
reducer read value key tom sum 2
reducer read value key van sum 1

The final result can also be found in the job output on HDFS, which reads as:

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop dfs -cat /user/output84/part-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

15/06/16 06:21:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
dit	1
is	1
tom	2
van	1
