A Groovy parser for CSV files

Parsing CSV these days is pretty straight-forward and not a big deal especially when we have the handy libraries from Apache Commons (I’m talking bout Java world). In this post I will give you an example how to use the Apache Commons CSV with the magic of Groovy and its closures so it can look and feel a little more fun because parsing in general is job for sad people (not kidding).

We’ll make ourself a simple Groovy class that will hold a reference to the CVSParser file, and a reference to the headers and the current record/line of the file that we will iterate with the closure delegate set to the instance of this CSVParserUtils class.

Something like this:

class CSVParseUtils {

    CSVParser csvFile
    def record
    def headers

    CSVParseUtils(String fileLocation) {
        def reader = Paths.get(fileLocation).newReader()
        CSVFormat format = CSVFormat.DEFAULT.withHeader().withDelimiter(delimiter)
        csvFile = new CSVParser(reader, format)
        def header = csvFile.headerMap.keySet().first()
        headers = header.split(delimiter as String)
    }

As we can see it’s a constructor that takes the location to the CSV file that we want to parse, creates some default parsing format and generates new CSVFile that holds the CSV data.

As we see parsing is easy, but it’s better when we can transform the data on the run as we loop it. For that reason we will define a method called eachLine that will take a params Map and a Closure that will have access to the record/line instance and will do something with it.

/**
 * List each line of the csv and execute closure
 * @param params
 * @param closure
 */
def eachLine(Map params = [:], Closure closure) {
    def max = params.max ?: maxLines
    int linesRead = 0
    def rowIterator = csvFile.iterator()
    closure.setDelegate(this)

    while (rowIterator.hasNext() && linesRead++ < max) {
        record = rowIterator.next()
        closure.call(record)
    }
}

It’s nothing special only a simple loop that iterates through the iterator and calls the closure with the given record for that line as a closure argument.

How to use it?

def parser = new CSVParseUtils(fileLocation)
def result = [:]
// first 2 lines without header
parser.eachLine([max: 2]) { CSVRecord record ->
    result.put(record.recordNumber, record.values.size() > 4 ? 
             record.values[0..4] : record.values[0..record.values.size()])
}

We imagine that we need only the first 2 lines and the first 5 columns or something like that.

As you can see this closure loop is not specially connected with CSV, it’s just a clean way to iterate through any textual file line by line and do something with it. As a matter of fact you can use the BufferedReader which has method eachLine too.

The source code for this whole example can be found on github.

Thanks for reading.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s