Analysing JSON and database tables in Spark

In a previous note, I showed how CSV files can be analysed. The same technique can be used to analyse JSON files or tables in a database. First, JSON files can be analysed with code that looks like:

// Read each JSON file as a whole; wholeTextFiles returns (path, content) pairs, so keep only the content.
val jsonRDD = sc.wholeTextFiles("/user/tom/baby_names.json").map(x => x._2)
// Parse the JSON strings into a DataFrame and register it as a temporary table.
val namesJson = sqlContext.read.json(jsonRDD)
namesJson.registerTempTable("names")
sqlContext.sql("select * from names").collect.foreach(println)
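The snippet above uses the Spark 1.x sqlContext API. As a rough sketch, the same analysis could also be written against the newer SparkSession API (Spark 2.x and later); the file path and table name are simply carried over from the example above:

// Minimal sketch for Spark 2.x+, reusing the path and table name from the example above.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("json-analysis").getOrCreate()

// multiLine lets Spark parse a JSON document that spans several lines.
val namesJson = spark.read.option("multiLine", "true").json("/user/tom/baby_names.json")
namesJson.createOrReplaceTempView("names")
spark.sql("select * from names").collect.foreach(println)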

Reading a table from a database requires an additional jar that contains the JDBC driver for that database. In the case of MySQL, we may use mysql-connector-java-5.0.8-bin.jar, which must be stored in a directory next to the other jars. We can then start Spark with spark-shell --jars lib/mysql-connector-java-5.0.8-bin.jar.
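Alternatively, if the machine has internet access, the connector can be pulled in from Maven Central instead of shipping the jar by hand; the exact coordinate below is an assumption and should be checked against the MySQL version in use:

spark-shell --packages mysql:mysql-connector-java:5.1.49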
The code looks like:

// Load the wp_comments table from MySQL over JDBC into a DataFrame.
val df1 = sqlContext.read.format("jdbc").
  option("url", "jdbc:mysql://van-maanen.com/wordpress").option("driver", "com.mysql.jdbc.Driver").
  option("dbtable", "wp_comments").option("user", "tom").option("password", "filpso").load()
df1.registerTempTable("names")
sqlContext.sql("select * from names").collect.foreach(println)
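If the table is large, it can pay off to let MySQL do part of the work. As a sketch, the dbtable option also accepts a subquery, so only the needed columns and rows are pulled into Spark; the column names below are assumptions based on the standard WordPress schema:

// Sketch: push a selection down to MySQL via a subquery in the dbtable option.
// Column names are assumed from the standard WordPress wp_comments schema.
val approved = sqlContext.read.format("jdbc").
  option("url", "jdbc:mysql://van-maanen.com/wordpress").option("driver", "com.mysql.jdbc.Driver").
  option("dbtable", "(select comment_author, comment_date, comment_content from wp_comments where comment_approved = '1') as c").
  option("user", "tom").option("password", "filpso").load()
approved.show()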

By tom