In the previous post, we used Scala to merge two files. The interesting feature of that approach is that it bypasses MapReduce, whereas Pig uses it. If we run the same example in Pig, we will see a serious performance difference.
fooOriginal = LOAD '/user/prut/foo' USING PigStorage('|') AS (id:long, foo:long);
barOriginal = LOAD '/user/prut/bar' USING PigStorage('|') AS (id:long, bar:long);
joinedValues = JOIN fooOriginal BY id, barOriginal BY id;
STORE joinedValues INTO '/user/pig' USING PigStorage('|');
The end result is written to HDFS as one or more part files in the directory /user/pig.
Of course, the results are the same, but the Pig version takes considerably longer to run. The reason is that, under the hood, Pig executes the script as MapReduce jobs.
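One way to check this for yourself is Pig's EXPLAIN operator, which prints the logical, physical, and MapReduce plans for an alias without running the job. The following is a minimal sketch, assuming the same input paths and schema as in the script above and the default MapReduce execution mode; the exact plan output depends on your Pig version and is not reproduced here.

```pig
-- Same relations as in the script above; nothing is stored or executed here.
fooOriginal = LOAD '/user/prut/foo' USING PigStorage('|') AS (id:long, foo:long);
barOriginal = LOAD '/user/prut/bar' USING PigStorage('|') AS (id:long, bar:long);
joinedValues = JOIN fooOriginal BY id, barOriginal BY id;

-- Print the execution plans for the join, including the MapReduce plan.
EXPLAIN joinedValues;
```

In the MapReduce plan, the join should show up as a job with both a map and a reduce phase, which is where the extra overhead comes from.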