Avro in Java

Another example shows a similar idea. Here a stream of three objects is created, each holding a name and a number. Once the stream is created, it is serialised, in other words prepared for storage, and written to a file called "test.avro".
Before continuing, one remark on serialisation.
The idea behind serialisation is to create a format that can be understood outside the original language. An object that is created in Java can only be handled inside Java. To communicate its content, one needs a format that other languages understand, such as a string or an integer. The translation from an object into strings and integers is called serialisation. In this case, everything is translated into strings and integers, which are then written to a file. That file can be read by, say, PHP or Oracle, as they only encounter strings and integers, which can be passed from Java to Oracle or PHP.
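To make the remark concrete, here is a minimal sketch that uses only the Java standard library, not Avro. It turns a name and an age into a plain sequence of bytes and reads them back. The class name PlainSerialisationSketch and its two methods are my own, made up for illustration, and the byte layout is the one produced by DataOutputStream, which is not the layout Avro uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Avro: shows serialisation in its simplest form.
class PlainSerialisationSketch {

    // Translate the content of an object (a name and an age) into bytes.
    static byte[] serialise(String name, int age) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name); // the string: a length prefix followed by its characters
            out.writeInt(age);  // the integer: four bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the same two values back, in the order in which they were written.
    static String deserialise(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String name = in.readUTF();
            int age = in.readInt();
            return name + " " + age;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialise("Joe", 31);
        System.out.println(data.length + " bytes, read back as: " + deserialise(data));
    }
}
```

The reader has to know the order and the types of the values in advance. Avro removes that guesswork by storing a schema together with the data.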
In the file test.avro, we can make out both the schema along which the data are stored and the actual data. It takes a bit of courage to look inside, as it is a binary file. Subsequently, the programme reads the file back and shows its contents.
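For those who want to look inside the binary file, the following sketch may help. It uses only the Java standard library. It checks for the four bytes with which an Avro container file begins ('O', 'b', 'j' and the byte 1) and collects the readable pieces of text, among which the schema should show up as JSON. The class AvroFilePeek and its methods are my own, made up for illustration. Without an argument, main falls back to a small hand-made byte array that only imitates the start of such a file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Avro: a crude look inside a binary file.
class AvroFilePeek {

    // An Avro container file starts with the bytes 'O', 'b', 'j', 1.
    static boolean hasAvroMagic(byte[] data) {
        return data.length >= 4
                && data[0] == 'O' && data[1] == 'b' && data[2] == 'j' && data[3] == 1;
    }

    // Collect every run of printable ASCII characters that is at least minLen long.
    static List<String> printableRuns(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            if (b >= 32 && b < 127) {
                current.append((char) b);
            } else {
                if (current.length() >= minLen) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        if (current.length() >= minLen) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        // Hand-made stand-in, NOT the real contents of test.avro.
        byte[] sample = {'O', 'b', 'j', 1, 0, 'a', 'v', 'r', 'o', '.', 's', 'c', 'h', 'e', 'm', 'a', 0};
        byte[] data = args.length > 0 ? Files.readAllBytes(Paths.get(args[0])) : sample;
        System.out.println("Avro magic found: " + hasAvroMagic(data));
        for (String run : printableRuns(data, 4)) {
            System.out.println(run);
        }
    }
}
```

Running it with the path of a real Avro file as its only argument prints the text fragments found in that file.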
The programme itself is written in Java. It reads as follows:

package avro;

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

class EmployeeTom {
	// Avro schema, loaded once from EmployeeTom.avsc next to the class file.
	public static Schema SCHEMA;
	static {
		try {
			// Schema.Parser is used here instead of the deprecated static Schema.parse(...).
			SCHEMA = new Schema.Parser().parse(EmployeeTom.class.getResourceAsStream("EmployeeTom.avsc"));
		} catch (IOException e) {
			System.out.println("Couldn't load a schema: "+e.getMessage());
		}
	}

	private String name;
	private int age;

	public EmployeeTom(String name, int age){
		this.name = name;
		this.age = age;
	}

	// Copy the fields into a generic Avro record that follows SCHEMA.
	public GenericData.Record serialize() {
		GenericData.Record record = new GenericData.Record(SCHEMA);
		record.put("name", this.name);
		record.put("age", this.age);
		return record;
	}

	// Write the schema to the file, followed by one record per employee.
	public static void testWrite(File file, EmployeeTom[] people) throws IOException {
		GenericDatumWriter<GenericData.Record> datum = new GenericDatumWriter<>(EmployeeTom.SCHEMA);
		DataFileWriter<GenericData.Record> writer = new DataFileWriter<>(datum);
		writer.create(EmployeeTom.SCHEMA, file);
		for (EmployeeTom p : people) {
			writer.append(p.serialize());
		}
		writer.close();
	}

	// Read the records back, using the schema that is stored in the file itself.
	public static void testRead(File file) throws IOException {
		GenericDatumReader<GenericData.Record> datum = new GenericDatumReader<>();
		DataFileReader<GenericData.Record> reader = new DataFileReader<>(file, datum);
		GenericData.Record record = new GenericData.Record(reader.getSchema());
		while (reader.hasNext()) {
			record = reader.next(record);
			System.out.println("Name " + record.get("name") + 
			                    " Age " + record.get("age") );
		}
		reader.close();
	}

	public static void main(String[] args) {
		EmployeeTom e1 = new EmployeeTom("Joe",31);
		EmployeeTom e2 = new EmployeeTom("Jane",30);
		EmployeeTom e3 = new EmployeeTom("Zoe",21);
		EmployeeTom[] all = new EmployeeTom[] {e1,e2,e3};

		File bf = new File("test.avro");
		try {
			testWrite(bf, all);
			testRead(bf);
		} catch (IOException e) {
			System.out.println("Main: "+e.getMessage());
		}
	}
}

A final remark. I stored the schema in the same directory as the class files. This allowed the class EmployeeTom to find the schema file. The schema looked like:

{
  "type": "record",
  "name": "Employee",
  "fields": [
      {"name": "name", "type": "string"},
      {"name": "age", "type": "int"}
  ]
}