In this note, I provide another script to join two files, foo and bar. Each file contains lines whose elements are separated by a vertical bar (|); one example of such a line is 1|110. The first step is to split each line into its elements. Each file is then keyed on one element, here the first, after which the two files can be joined on that key. As a final step, the results are shown.
The code looks like:
// Read both files as RDDs of lines
val fooTable = sc.textFile("/user/prut/foo")
val barTable = sc.textFile("/user/prut/bar")
// Split each line on the bar and key it by its first cell
val fooSplit = fooTable.map(line => line.split('|'))
val fooKeyed = fooSplit.keyBy(cells => cells(0))
val barSplit = barTable.map(line => line.split('|'))
val barKeyed = barSplit.keyBy(cells => cells(0))
// Inner join on the key, then show up to 10 joined records
val joinValues = fooKeyed.join(barKeyed)
joinValues.take(10)
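A side note on the split step: the script passes the separator as a Char ('|'), which Scala treats literally. If the separator were passed as a String instead, it would be interpreted as a regular expression, in which the bar means alternation and therefore has to be escaped. The sketch below illustrates this on the sample line without Spark; the object and method names are hypothetical and only serve the illustration.

```scala
object SplitSketch {
  // Char separator: taken literally, as in the script above.
  def splitOnChar(line: String): Array[String] = line.split('|')

  // String separator: a regular expression, so the bar must be escaped.
  def splitOnEscapedString(line: String): Array[String] = line.split("\\|")

  // Unescaped String separator: the regex "|" matches the empty string,
  // so the line is NOT split into its two cells.
  def splitOnRawString(line: String): Array[String] = line.split("|")

  def main(args: Array[String]): Unit = {
    val line = "1|110" // sample line from the note
    println(splitOnChar(line).mkString("[", ", ", "]"))
    println(splitOnEscapedString(line).mkString("[", ", ", "]"))
    println(splitOnRawString(line).mkString("[", ", ", "]"))
  }
}
```

The first two variants yield the two cells 1 and 110; the third does not, which is an easy mistake to make when changing the quotes around the separator.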
The result of joinValues.take(10) looks like:
res0: Array[(String, (Array[String], Array[String]))] = Array((2,(Array(2, 108),Array(2, 98))), (3,(Array(3, 104),Array(3, 101))), (1,(Array(1, 110),Array(1, 96))))
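Each joined record is a pair of the key and a tuple holding the complete split line from each file, so the key appears three times per record. To make that structure easier to see, here is a minimal sketch in plain Scala collections, without Spark, that mimics the split, key and inner-join steps and then flattens each record to (key, foo value, bar value). The input lines are an assumption reconstructed from the output above (the real files may contain more lines), and the names JoinSketch, keyed, join and flatten are hypothetical.

```scala
object JoinSketch {
  type Keyed = (String, Array[String])

  // Assumed file contents, reconstructed from the output shown above.
  val fooLines: Seq[String] = Seq("1|110", "2|108", "3|104")
  val barLines: Seq[String] = Seq("1|96", "2|98", "3|101")

  // Split each line on the bar and key it by its first cell (map + keyBy).
  def keyed(lines: Seq[String]): Seq[Keyed] =
    lines.map(_.split('|')).map(cells => (cells(0), cells))

  // Inner join: one output record per pair of left and right rows sharing a key.
  def join(left: Seq[Keyed], right: Seq[Keyed]): Seq[(String, (Array[String], Array[String]))] = {
    val rightByKey = right.groupBy(_._1)
    left.flatMap { case (key, l) =>
      rightByKey.getOrElse(key, Seq.empty).map { case (_, r) => (key, (l, r)) }
    }
  }

  // Keep the key once and only the second cell of each side.
  def flatten(joined: Seq[(String, (Array[String], Array[String]))]): Seq[(String, String, String)] =
    joined.map { case (key, (foo, bar)) => (key, foo(1), bar(1)) }

  def main(args: Array[String]): Unit =
    flatten(join(keyed(fooLines), keyed(barLines))).foreach(println)
}
```

Running main prints (1,110,96), (2,108,98) and (3,104,101), one per line. The same pattern match should carry over to the RDD, for example joinValues.map { case (key, (foo, bar)) => (key, foo(1), bar(1)) }. Unlike this sketch, Spark does not guarantee the order of joined records, which is why the output above starts with key 2.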