A Groovy parser for CSV files

Parsing CSV these days is pretty straight-forward and not a big deal especially when we have the handy libraries from Apache Commons (I’m talking bout Java world). In this post I will give you an example how to use the Apache Commons CSV with the magic of Groovy and its closures so it can look and feel a little more fun because parsing in general is job for sad people (not kidding).

We’ll make ourself a simple Groovy class that will hold a reference to the CVSParser file, and a reference to the headers and the current record/line of the file that we will iterate with the closure delegate set to the instance of this CSVParserUtils class.

Something like this:

class CSVParseUtils {

    CSVParser csvFile
    def record
    def headers

    CSVParseUtils(String fileLocation) {
        def reader = Paths.get(fileLocation).newReader()
        CSVFormat format = CSVFormat.DEFAULT.withHeader().withDelimiter(delimiter)
        csvFile = new CSVParser(reader, format)
        def header = csvFile.headerMap.keySet().first()
        headers = header.split(delimiter as String)
    }

As we can see it’s a constructor that takes the location to the CSV file that we want to parse, creates some default parsing format and generates new CSVFile that holds the CSV data.

As we see parsing is easy, but it’s better when we can transform the data on the run as we loop it. For that reason we will define a method called eachLine that will take a params Map and a Closure that will have access to the record/line instance and will do something with it.

/**
 * List each line of the csv and execute closure
 * @param params
 * @param closure
 */
def eachLine(Map params = [:], Closure closure) {
    def max = params.max ?: maxLines
    int linesRead = 0
    def rowIterator = csvFile.iterator()
    closure.setDelegate(this)

    while (rowIterator.hasNext() && linesRead++ < max) {
        record = rowIterator.next()
        closure.call(record)
    }
}

It’s nothing special only a simple loop that iterates through the iterator and calls the closure with the given record for that line as a closure argument.

How to use it?

def parser = new CSVParseUtils(fileLocation)
def result = [:]
// first 2 lines without header
parser.eachLine([max: 2]) { CSVRecord record ->
    result.put(record.recordNumber, record.values.size() > 4 ? 
             record.values[0..4] : record.values[0..record.values.size()])
}

We imagine that we need only the first 2 lines and the first 5 columns or something like that.

As you can see this closure loop is not specially connected with CSV, it’s just a clean way to iterate through any textual file line by line and do something with it. As a matter of fact you can use the BufferedReader which has method eachLine too.

The source code for this whole example can be found on github.

Thanks for reading.

Passing by ‘reference’ in Java

One of the first things that every Java programmer learns (or should learn) is that method parameters in Java are passed-by-value. That is the only truth and there is no so called ‘reference’ passing in Java. Every method call with parameters means that their value is copied in some memory chunk and then they are passed (the copied memory) to the local function to be used.

What is more important though is of what type is the parameter that is passed. Generally there are two different data types: primitive (int, char, double etc…) and complex aka objects (Object, array etc…). The thing that matters is what is the ‘value’ of them when they are used in parameter passing.

When we are passing parameters of primitive type we are passing the actual value of it. So if we pass an integer with value of 4, then the function will receive an integer with value of 4 as parameter value. However, if we pass parameter of complex type let’s say some object of class Company then the function will actually receive the pointer to real object location in memory. Or in Java terms, it will receive the copied value of the reference (address) to the Java object that we want to pass and use.

If in C++ we have: Company *c; to get the pointer , then in Java we have Company c; . It’s pretty much the same, the difference is how things are designed and implemented under the cover.

If we understand this, then we realize that even if there are no out parameters in Java when defining and implementing methods, we can still use the reference advantage to program that thing by our self.

To get things clear and imagine the picture we should actually see the picture, I mean the code.

For example we can use an array of size one to be our data holder. Passing and mutating data this way will change the real value that we want to be changed. An example code, try it:


package com.groggystuff;

/**
*
* @author Igor
*/
public class JavaArguments {

/**
*
* @param argument
*/
public static void mutate(int argument){
argument++;
}

/**
*
* @param argument
*/
public static void mutate(int[] argument){
argument[0]++;
}

/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// example pass by value
int i = 4;
System.out.println("Value of 'i' before mutation: "+i); //prints 4
mutate(i);
System.out.println("Value of 'i' after mutation: "+i); // prints 4

// example pass by value too
int[] j = new int[1];
j[0] = 4;
System.out.println("Value of 'j' before mutation: "+j[0]); // prints 4
mutate(j);
System.out.println("Value of 'j' after mutation: "+j[0]); // prints 5

}

}

Thanks for reading,

I hope this post will be helpful to you in solving the coding mysteries in life, or something similar.

Please comment if you feel that  your comment is needed. Or comment at will, just to say hi for example.