In a previous note, I showed how CSV files can be analysed with Spark SQL. The same technique applies to JSON files and to tables in a database. First, a JSON file can be analysed with code that looks like:
val jsonRDD = sc.wholeTextFiles("/user/tom/baby_names.json").map(x => x._2)
val namesJson = sqlContext.read.json(jsonRDD)
namesJson.registerTempTable("names")
sqlContext.sql("select * from names").collect.foreach(println)
Reading a table in a database requires an additional jar that provides the database driver. In the case of MySQL, we may use mysql-connector-java-5.0.8-bin.jar, which must be stored in a directory next to the other jars. We can then start Spark with spark-shell --jars lib/mysql-connector-java-5.0.8-bin.jar.
The code looks like:
val df1 = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://van-maanen.com/wordpress")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "wp_comments")
  .option("user", "tom")
  .option("password", "filpso")
  .load()
df1.registerTempTable("names")
sqlContext.sql("select * from names").collect.foreach(println)
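The same table can also be loaded with the read.jdbc shorthand instead of the format("jdbc") chain. A minimal sketch, assuming the same sqlContext, URL and credentials as above:

```scala
import java.util.Properties

// connection properties; user, password and driver as in the example above
val props = new Properties()
props.setProperty("user", "tom")
props.setProperty("password", "filpso")
props.setProperty("driver", "com.mysql.jdbc.Driver")

// read.jdbc(url, table, properties) is equivalent to the format("jdbc") variant
val df2 = sqlContext.read.jdbc("jdbc:mysql://van-maanen.com/wordpress", "wp_comments", props)
df2.registerTempTable("comments")
sqlContext.sql("select * from comments limit 10").collect.foreach(println)
```

Both variants produce the same DataFrame; the shorthand is just a bit more compact when only a handful of connection properties are needed.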