I created a new Bio7 example which demonstrates how to classify an image with Bio7, ImageJ and R.
For the classification I used the “randomForest” R package and one of the ImageJ sample images, so you can reproduce the example quite easily. I kept the example script as simple as possible and trained the classifier with 64 trees by default (see literature below). Not shown in the video is how to check the predictions of the trained classifier against test data. You can also find a simple script for this in the repository, which uses a method of the powerful ‘caret’ package.
If you have recommendations on how to use a decent classifier effectively for image classification (e.g., which classifier is well suited for images, which tuning parameters are useful in this context, which additional signatures could be used, etc.), I would be happy to hear them.
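To give an idea of what such a check can look like, here is a minimal sketch. It is not the script from the repository. The pixel signatures are synthetic stand-ins, the 70/30 split is an arbitrary choice, and `confusionMatrix` is only my assumption about which 'caret' method fits here.

```r
library(randomForest)
library(caret)

set.seed(42)

# Synthetic stand-in for pixel signatures: one row per pixel, one column per channel.
n <- 300
pixels <- data.frame(
  R = c(rnorm(n, 200, 15), rnorm(n, 60, 15)),
  G = c(rnorm(n, 80, 15), rnorm(n, 180, 15)),
  B = c(rnorm(n, 70, 15), rnorm(n, 90, 15))
)
classes <- factor(rep(c("classA", "classB"), each = n))

# Hold out 30% of the labeled pixels as test data (arbitrary split).
test_idx <- sample(nrow(pixels), size = round(0.3 * nrow(pixels)))
train_x <- pixels[-test_idx, ]
train_y <- classes[-test_idx]
test_x <- pixels[test_idx, ]
test_y <- classes[test_idx]

# Train the random forest with 64 trees, as in the example script.
rf <- randomForest(x = train_x, y = train_y, ntree = 64)

# Predict the held-out pixels and compare them with the known classes.
predicted <- predict(rf, newdata = test_x)
cm <- confusionMatrix(data = predicted, reference = test_y)

print(cm$table)                 # confusion matrix: predicted vs. reference
print(cm$overall["Accuracy"])   # share of correctly classified test pixels
```

With real data, `pixels` and `classes` would come from the image regions selected as training data in ImageJ.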
Fernández-Delgado, M., Cernadas, E., Barro, S. & Amorim, D. Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research 15, 3133–3181 (2014).
Oshiro, T. M., Perez, P. S. & Baranauskas, J. A. How Many Trees in a Random Forest? in Machine Learning and Data Mining in Pattern Recognition (ed. Perner, P.) 154–168 (Springer Berlin Heidelberg, 2012).
2 thoughts on “Supervised Image Classification With Bio7, R And ImageJ”
This is quite excellent! Both in demonstrating the technique and the tools. Is there a way to scale it up? What if you need to classify 100,000 images?
Well, there are some options to do that. ImageJ, for example, has plenty of tools to load images, and I have already demonstrated how to load, edit and statistically analyze an image stack or an image folder without running out of RAM. With the ImageJ2 API (important parts will be available in the next release) it will also be possible to load chunks of multidimensional data, classify the chunks in R and transfer the results back to ImageJ.
Another easy-to-use option for very large images is the raster package of R, together with a default training set as demonstrated in the video (made with ImageJ).
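A rough sketch of the raster route, under a few assumptions: the training signatures and the three-band image are synthetic, the layer names `R`, `G` and `B` are made up for illustration, and a real image would be opened from a file instead. The raster package is designed to work through large files in blocks, which is what makes it attractive here.

```r
library(randomForest)
library(raster)

set.seed(1)

# Synthetic training signatures (stand-in for the training set made with ImageJ).
n <- 200
train <- data.frame(
  R = c(rnorm(n, 200, 15), rnorm(n, 60, 15)),
  G = c(rnorm(n, 80, 15), rnorm(n, 180, 15)),
  B = c(rnorm(n, 70, 15), rnorm(n, 90, 15))
)
labels <- factor(rep(c("classA", "classB"), each = n))
rf <- randomForest(x = train, y = labels, ntree = 64)

# Synthetic three-band image; a real one could be opened with brick() from a file.
band <- function() raster(matrix(runif(100 * 100, 0, 255), nrow = 100))
img <- stack(band(), band(), band())
names(img) <- c("R", "G", "B")  # layer names must match the training columns

# Apply the trained model to every cell; the result is a single-layer raster.
classified <- predict(img, rf)
print(classified)
```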
To speed up the classification of a large number of images, an Rserve cluster could also be created.
Several remote R connections could be established, and the image data could then be sent to the R instances for classification.
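As a sketch of this idea, the following uses the RSclient package to talk to the Rserve instances from R. That choice is an assumption on my side, and a Java client would work along the same lines. The helper names `split_rows` and `classify_on_cluster`, the host and the ports are hypothetical, and the code assumes that the Rserve instances are already running and have the randomForest package installed.

```r
library(RSclient)

# Split the rows of a pixel table into contiguous chunks, one per worker.
split_rows <- function(pixels, n_chunks) {
  groups <- sort(rep(seq_len(n_chunks), length.out = nrow(pixels)))
  split(pixels, groups)
}

# Send each chunk to its own Rserve instance and collect the predicted classes.
# `model` is a trained classifier (for example a randomForest object);
# `ports` lists the (hypothetical) ports on which the Rserve instances listen.
classify_on_cluster <- function(model, pixels, host = "localhost",
                                ports = c(6311, 6312)) {
  chunks <- split_rows(pixels, length(ports))
  conns <- lapply(ports[seq_along(chunks)],
                  function(p) RS.connect(host = host, port = p))
  on.exit(lapply(conns, RS.close), add = TRUE)

  # Transfer model and data, then start all jobs without waiting,
  # so that the instances work in parallel.
  for (i in seq_along(conns)) {
    RS.eval(conns[[i]], library(randomForest))
    RS.assign(conns[[i]], "model", model)
    RS.assign(conns[[i]], "chunk", chunks[[i]])
    RS.eval(conns[[i]],
            as.character(predict(model, newdata = chunk)),
            wait = FALSE)
  }

  # Collect the results in connection order, which matches the row order.
  unlist(lapply(conns, function(con) RS.collect(con)))
}
```

With two local instances started on ports 6311 and 6312, a call like `classify_on_cluster(rf, pixels)` would return one predicted class per row of `pixels`. Sending the whole model to every instance is the simplest design; for a large forest it may be better to load the model once on each instance.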
I heard that the developer of Rserve mentioned that his team runs a cluster of 1000 Rserve instances without a problem.