July 2017 – Groggy Man Reads The Man

JavaScript on the Web has a lot of APIs to work with. Some of them are fully supported some are still in draft. One of the things I worked with lately is the FileSystemDirectoryReader Interface of the File and Directory Entries API. It defines only one method called readEntries that returns an array containing some number of the directory’s entries. This is draft proposed API and is not supposed to be fully browser compatible. However I’ve tested it almost on all the latest versions of browsers and it works fine on each one except for the Safari browser (I think ;)).

This post will focus on one example that shows that sometimes reading an API documentation can be a little tricky. So the example in the documentation shows a common use of this API where the source of the FileSystemEntry items that we read from are passed with a (drag and) drop event in the browser. The FileSystemEntry can be either a file or a directory. What we want to do is build a file system tree of the dropped item . If the dropped item is a directory then the item is actually FileSystemDirectoryEntry object than defines the createRender method that creates the FileSystemDirectoryReader object on which we will call the readEntries method.

The demo example can be tested in this fiddle. What I want you to do is to drop a directory that contains more than 100 files. If you do that you can notice that the readEntries method returns only the first 100 queued files. That is the main reason for writing this post. The description on the successCallback argument of the readEntries method is a little bit confusing, it says: “A function which is called when the directory’s contents have been retrieved. The function receives a single input parameter: an array of file system entry objects, each based on FileSystemEntry. Generally, they are either FileSystemFileEntry objects, which represent standard files, or FileSystemDirectoryEntry objects, which represent directories. If there are no files left, or you’ve already called readEntries() on this FileSystemDirectoryReader, the array is empty.”

In their example we can see the scanFiles method that reads the items and creates html elements:

function scanFiles(item, container) {
        var elem = document.createElement("li");
        elem.innerHTML = item.name;
        container.appendChild(elem);

        if (item.isDirectory) {
            var directoryReader = item.createReader();
            var directoryContainer = document.createElement("ol");
            container.appendChild(directoryContainer);

            directoryReader.readEntries(function (entries) {
                entries.forEach(function (entry) {
                    scanFiles(entry, directoryContainer);
                });
            });
        }
    }

It seems that the successCallback functions returns the entries partially in a packages of 0 to max 100 items. If we use this function we will never iterate more that 100 items in the given directory. What we need to do is to decipher this part: “If there are no files left, or you’ve already called readEntries() on this FileSystemDirectoryReader, the array is empty.”. Translated this into JavaScript code is:

function scanFiles(item, container) {
        var elem = document.createElement("li");
        elem.innerHTML = item.name;
        container.appendChild(elem);

        if (item.isDirectory) {
            var directoryReader = item.createReader();
            var directoryContainer = document.createElement("ol");
            container.appendChild(directoryContainer);

            var fnReadEntries = (function () {
                return function (entries) {
                    entries.forEach(function (entry) {
                        scanFiles(entry, directoryContainer);
                    });
                    if (entries.length > 0) {
                        directoryReader.readEntries(fnReadEntries);
                    }
                };
            })();

            directoryReader.readEntries(fnReadEntries);
        }
    }

The change that we need to apply is to check after the iteration of the entries if the length of the entries array is bigger than 0. Case that’s the truth then we should call the readEntries method again. If the entries size is zero, then the iteration is finished – all the file system items are iterated.

This fiddle has the improved version of the scan file method that will list all the files in the directory (more that 100), and won’t trick you.

Now finally we can conclude what the part “returns an array containing some number of the directory’s entries” meant. 🙂

Cheers

Parsing CSV these days is pretty straight-forward and not a big deal especially when we have the handy libraries from Apache Commons (I’m talking bout Java world). In this post I will give you an example how to use the Apache Commons CSV with the magic of Groovy and its closures so it can look and feel a little more fun because parsing in general is job for sad people (not kidding).

We’ll make ourself a simple Groovy class that will hold a reference to the CVSParser file, and a reference to the headers and the current record/line of the file that we will iterate with the closure delegate set to the instance of this CSVParserUtils class.

Something like this:

class CSVParseUtils {

    CSVParser csvFile
    def record
    def headers

    CSVParseUtils(String fileLocation) {
        def reader = Paths.get(fileLocation).newReader()
        CSVFormat format = CSVFormat.DEFAULT.withHeader().withDelimiter(delimiter)
        csvFile = new CSVParser(reader, format)
        def header = csvFile.headerMap.keySet().first()
        headers = header.split(delimiter as String)
    }

As we can see it’s a constructor that takes the location to the CSV file that we want to parse, creates some default parsing format and generates new CSVFile that holds the CSV data.

As we see parsing is easy, but it’s better when we can transform the data on the run as we loop it. For that reason we will define a method called eachLine that will take a params Map and a Closure that will have access to the record/line instance and will do something with it.

/**
 * List each line of the csv and execute closure
 * @param params
 * @param closure
 */
def eachLine(Map params = [:], Closure closure) {
    def max = params.max ?: maxLines
    int linesRead = 0
    def rowIterator = csvFile.iterator()
    closure.setDelegate(this)

    while (rowIterator.hasNext() && linesRead++ < max) {
        record = rowIterator.next()
        closure.call(record)
    }
}

It’s nothing special only a simple loop that iterates through the iterator and calls the closure with the given record for that line as a closure argument.

How to use it?

def parser = new CSVParseUtils(fileLocation)
def result = [:]
// first 2 lines without header
parser.eachLine([max: 2]) { CSVRecord record ->
    result.put(record.recordNumber, record.values.size() > 4 ? 
             record.values[0..4] : record.values[0..record.values.size()])
}

We imagine that we need only the first 2 lines and the first 5 columns or something like that.

As you can see this closure loop is not specially connected with CSV, it’s just a clean way to iterate through any textual file line by line and do something with it. As a matter of fact you can use the BufferedReader which has method eachLine too.

The source code for this whole example can be found on github.

Thanks for reading.

Month: July 2017

Deciphering an API documentation

A Groovy parser for CSV files