Paul Geeleher et al., Bioinformatics Advance Access Publication March 23rd, 2009.
Fresh in the latest version of Bioinformatics Advance Access is a rather wonder short correspondence on BioconductorBuntu. The authors of this brief article have highlighted a rather important divide within the bioinformatics community; those who can use R and those who can’t.
To solve the issue of “hot” microarray data analysis for those fearful of scripting, the authors have implemented a whole Ubuntu distribution containing the requisite packages, software and servers for rapid deployment of a data analysis server. In addition to just providing R and some bioconductor packages the authors have also implemented a basic framework of authentication and ownership, and some core GUIs to streamline the process of uploading, analysis and reporting the content of DNA microarray studies. In contrast to earlier efforts such as AMDA (Genopolis, Italy) the authors have provided mechanisms for the handling of Affymetrix data, single and dual colour arrays.
The workflow appears to contain all core elements of data validation, QC and differential expression analysis and also provides a little content for both GSEA and KEGG type analyses.
This is in my humble opinion a wonderful piece of work. Certainly this is not a complete solution (what about Illumina or Agilent data in their more native file structures?) and the reporting is lacking outside of the most basic content – but it does deliver an elegant and functional system for the dirty and unwashed masses. The wrapping of the stack onto an Ubuntu “spin” is great – if as promised I can download an iso, burn a disk, boot, install and rock-and-roll then this really could stand to be a really useful tool sitting in the corner of many small labs.
I have some vague suspicions though that this approach is doomed to failure. The biologists who cannot use R and Bioconductor are the same people who will be terribly afraid of booting a linux workstation and installing something by themselves. These are the same people who will be least well prepared to diagnose the problems on the server, and who will need the most training and babysitting to get them to the stage where the software can be applied in a meaningful way! Not a detraction from the paper, while BioconductorBuntu is a very elegant solution, and promises to solve some of the problems, a bioinformatician, IT guy or statistician is really needed to get the biologist up-and-running. Thank goodness – our jobs are still safe for the time being
This is certainly a well-earned-paper-of-the-week. Congratulations Paul et al.,

EMAAS is another environment for handling and analysis of gene expression data. The authors have set about the development of a distributed e-support system for the management and analysis of microarray data; to provide access to complex methods and to apply (from a biologist’s POV) non-trivial technologies to handle large multi-variate datasets.

(a nice logo from a deprecated database, and absolutely nothing to do with this review …)







