Hive – mapreduce extension

By tom, 19 October 2015

It is worth remembering that Hive is built on top of the MapReduce framework. Hive was developed at Facebook to make it easier to analyse files stored in Hadoop: instead of writing a Python or Java program, you can express the analysis in a SQL dialect (HiveQL). When a Hive command is run, the map and reduce steps are clearly visible in the output. See below.

[pivhdsne:~]$ hive -e "select count(*) from drink;"
15/10/19 22:14:48 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
15/10/19 22:14:48 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
....
2015-10-19 22:15:34,169 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.14 sec
2015-10-19 22:15:35,241 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.14 sec
2015-10-19 22:15:36,320 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.14 sec
2015-10-19 22:15:37,461 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.47 sec
2015-10-19 22:15:38,502 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.47 sec
2015-10-19 22:15:39,559 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.47 sec
MapReduce Total cumulative CPU time: 5 seconds 470 msec
Ended Job = job_1445283750993_0001
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 5.47 sec   HDFS Read: 266 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 470 msec
OK
4
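
To make the map and reduce steps in the log above more concrete, here is a minimal Python sketch of how a count(*) can be expressed as a map phase followed by a reduce phase. It is a simplification for illustration only, not the code Hive actually generates. The rows are hypothetical stand-ins for the drink table, whose real contents are not shown here; the function names are also made up for this example.

```python
from collections import defaultdict

# Hypothetical rows standing in for the `drink` table (assumption: 4 rows,
# matching the count in the log; the real values are unknown).
rows = ["cola", "tea", "coffee", "water"]


def map_phase(records):
    """Map step: emit one (key, 1) pair for every input row."""
    for _ in records:
        yield ("count", 1)


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


result = reduce_phase(shuffle(map_phase(rows)))
print(result["count"])
```

Because every row is mapped to the same key, a single reducer receives all values, which is in line with the "Map: 1  Reduce: 1" summary in the log.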

This dependency has also drawn criticism: Hive is said to remain limited by the bottlenecks of MapReduce. Other parties, such as Cloudera, therefore developed Impala to circumvent those bottlenecks.
But Hive has not stood still. Together with Hortonworks, an improved version of Hive has been developed that appears to deliver far better performance. A name that is often mentioned in this context is Tez, the execution engine used by this improved version.
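
As a rough sketch of how one might try the same query on Tez, the Python snippet below builds a hive -e command line that first sets the hive.execution.engine property. It assumes Tez is installed and configured on the cluster, which is not the case on every distribution. The helper name hive_command is made up for this example, and the snippet only prints the command; it does not run it.

```python
import shlex


def hive_command(query, engine="mr"):
    """Build the argument list for `hive -e`, selecting the execution engine first.

    `engine` is passed to the hive.execution.engine property
    ("mr" for MapReduce, "tez" for Tez).
    """
    script = f"set hive.execution.engine={engine}; {query}"
    return ["hive", "-e", script]


# Assumption: the cluster has Tez available; otherwise the query will fail.
cmd = hive_command("select count(*) from drink;", engine="tez")
print(shlex.join(cmd))
```

On a machine where the hive client is available, the list can be handed to subprocess.run(cmd, check=True) to execute the query.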
