Scala merging files
In a previous post, I showed how two files can be merged in Scala. The idea was that RDDs were converted to data frames and a join was performed on…
RDDs are the basic unit of data when using Scala on Spark. The abbreviation stands for Resilient Distributed Dataset. This shows that we are talking about full data sets that are stored persistently…
I understand that Scala may be used in an ETL context. In ETL, an important element is the merge of two files. We will get data from different sources and…
Scala can be used as a tool to manipulate big data. If it is used in the Spark context, we have the possibility to combine two strong tools: Spark with…
Scala is a general-purpose language. One may use it as a statistical tool, a tool for pattern matching, etc. Just like any other programming…
I have a PHP script that runs as CGI on a webserver. The programme is quite simple. First it asks for a userid and password. The userid and password are…
In this note, I show yet another Pyspark example with slightly different methods to filter. The idea is that a file is read into an RDD. Subsequently, it is cleaned. That cleaning…
Pyspark is the Python language applied to Spark. It therefore allows a wonderful merge between Spark, with its possibilities to circumvent the limitations that are set by the…
I just discovered the 1000th wordcount example. It is based on Pyspark. The idea is actually quite simple. One creates a script. This script can be written in any editor.…
Pyspark allows us to process files in a big data/Hadoop environment. I showed in another post how Pyspark can be started and how it can be used. The concept…