Archive for April, 2009

Summer is coming …

Monday, April 13th, 2009

It has been the Easter weekend and, to everyone's surprise, here in Finland we have even had some sunshine and temperatures in double positive figures! When presented with both sun and a little warmth, the bicycle must come out.

Many of the best things in life are associated with Italy: good food, good wine, great motorcycles and fantastic racing bicycles. (It’s just a shame that the Italians can’t make cars; just look at Fiat…) My own Italian bike is now out, and we went for the first dash of the season. There is a great round tour of the island where I live, and I had fun, some pain, aching lungs, tortured legs and a rather pathetic time, but we are now into cycle season.

I sometimes wonder what other bioinformaticians do to escape. I very much hope that bioinformaticians are not the archetypal geeks: pale skin unexposed to the sun, flabby waists and oily skin… Yikes! I am a fan of the warm outdoors and like sailing, biking and hiking. I am now in a summer mood and things feel really positive.

I have been clearing trees and bushes in the garden, have tidied parts of the estate and will soon start preparing the boat for the sailing season. Life feels great at the moment; I guess that chocolate really does help.

Bugblatter – a bug tracking software for bioinformaticians

Wednesday, April 8th, 2009

I have spent rather too many hours over the last couple of days reviewing software for tracking ideas, plans and bugs, and for assigning meaning to what is supposed to be fairly straightforward software development. I am looking for a simple piece of software that can run as a single-user environment and can provide a list of projects, plans and bugs. Trac and Bugzilla are server-side and heavyweight, Excel is awful, and there doesn’t seem to be anything that I can run from a memory stick.

I have placed a software design brief with my contacts at Mnemosyne BioSciences and have asked for the development of a simple, OS-agnostic solution that can either run for a single user from local files or interact with an SVN server (or even run as something more embedded). They have approved my design brief and have promised to develop a Java tool for Windows and OS X that will provide bug tracking, reporting and management capabilities as a standalone tool. They have charged a pretty reasonable start-up fee for the project, and their understanding of the task is pretty much what I had envisioned from the start.

The name for their planned tool is “Mnemosyne Bugblatter”. Cool name; let’s see how the software looks when it is delivered. If anyone else would be interested in a simple, portable tool for tracking projects and bugs and for managing feature creep, please send a mail to bugblatter@mnemosyne.co.uk

FogBugz?

Wednesday, April 8th, 2009

FogBugz is a clean and simple piece of bug-tracking software that is pretty well integrated with Eclipse. It is not perfect, since it relies on some form of live connection, but it is a pretty good solution nonetheless.

I have created an evaluation account with FogBugz and can create a collection of incidents, wikis and documentation. The application is hosted on their servers, and everything is presented through a rich web-page GUI. There is further potential in the availability of a FogBugz GUI, which could meet many of my requirements for a platform-independent, offline bug reporting and analysis environment.

One issue worth considering is that FogBugz is not free; it is a commercial solution costing $25 per user per month. For a small company such as Mnemosyne BioSciences this is acceptable, but in academic environments it becomes more complicated…

This is certainly something that I intend to evaluate more fully, and I will report back on my experiences as I reach the end of the evaluation period and decide whether FogBugz could solve the problem of tracking software development and bugs within a bioinformatics data analysis project.

tracking bugs, managing plans and coping with feature creep

Wednesday, April 8th, 2009

I am a bioinformatician. My background is as a traditional geneticist; my PhD was in molecular biology (with a little phylogenetics and domain analysis). I only entered the domain of bioinformatics during my postdoctoral years, when I worked as a genome annotator for the first green eukaryotic genome project. During this time I learned a lot of Perl, moved into Python, integrated a load of stuff to link my needs with a relational database, and distributed jobs across a large cluster to run the typical InterPro / UniProt / nonred type tasks. A GUI was never considered (or attempted), and code remained ad hoc until it broke.

Now, ten years on from those heady days, I write code in C, Java, R and a little Python. During my few years as an adjunct professor at a Finnish research centre I rewrote the whole software pipeline that I had imagined in Java as a rather monolithic beast, and over the last few years, as a distraction on the daily commute to the capital city, I have reimplemented the whole stack as something a little more abstracted and perhaps more useful. The whole software environment is now several hundred thousand lines of code, we have a rich GUI delivered over HTML and through Java Web Start, and things are finally beginning to look how they should have looked when I first started planning the project back in 2004.

The critical question at this point is: how is a self-taught informatician supposed to handle this code? I work alone; there is no code audit, and no one works with me to validate, correct or comment on my code! I maintain a single code tree, and at least it is under version control (Subversion for bioinformatics is great…), so I am hopefully not completely inept at my work.

My question is how I should really manage the long list of non-specific issues, bugs and problems that I routinely encounter. A campaign to resolve an issue introduced through feature creep can take hours if not days, and during that time I invariably discover many more bugs and issues…

At the moment bugs are documented in a text file of problems, issues and events. I have a to-do list, but this seems rather inefficient and trivial. I know that something like Bugzilla could work, but it seems rather more complex than is absolutely needed. I also work on a train and therefore don’t have web access for much of the commute, so a client-side tool whose data can be synced through SVN would be ideal. I also work on Linux, OS X and Windows, so something cross-platform would be great…
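The requirements above (offline, cross-platform, synced through SVN) can be roughly approximated by keeping issues in a plain-text file inside the SVN working copy. What follows is only a minimal Java sketch under stated assumptions: the `IssueLog` class and its tab-separated `id, status, title` line format are hypothetical and do not come from Bugblatter, FogBugz or any other existing tool.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-user issue log stored as plain text, one issue per line:
// "id<TAB>OPEN|CLOSED<TAB>title". Line-oriented text diffs cleanly under SVN.
final class IssueLog {
    static final class Issue {
        final int id;
        final boolean open;
        final String title;

        Issue(int id, boolean open, String title) {
            this.id = id;
            this.open = open;
            this.title = title;
        }
    }

    private final List<Issue> issues = new ArrayList<>();

    // Parses lines in the format above; blank lines are skipped.
    static IssueLog fromLines(List<String> lines) {
        IssueLog log = new IssueLog();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] f = line.split("\t", 3);
            log.issues.add(new Issue(Integer.parseInt(f[0]), f[1].equals("OPEN"), f[2]));
        }
        return log;
    }

    // Serialises every issue back to one line each, in insertion order.
    List<String> toLines() {
        List<String> out = new ArrayList<>();
        for (Issue i : issues) {
            out.add(i.id + "\t" + (i.open ? "OPEN" : "CLOSED") + "\t" + i.title);
        }
        return out;
    }

    // Adds an open issue with the next free ID; tabs/newlines in the title become spaces.
    int add(String title) {
        int next = 1;
        for (Issue i : issues) next = Math.max(next, i.id + 1);
        issues.add(new Issue(next, true, title.replaceAll("[\t\r\n]", " ")));
        return next;
    }

    // Marks an issue as closed; returns false if no issue has that ID.
    boolean close(int id) {
        for (int k = 0; k < issues.size(); k++) {
            Issue i = issues.get(k);
            if (i.id == id) {
                issues.set(k, new Issue(id, false, i.title));
                return true;
            }
        }
        return false;
    }

    // Titles of all issues that are still open.
    List<String> openTitles() {
        List<String> out = new ArrayList<>();
        for (Issue i : issues) if (i.open) out.add(i.title);
        return out;
    }

    // Reads the log from disk; a missing file means an empty log.
    static IssueLog load(Path file) throws IOException {
        return Files.exists(file)
                ? fromLines(Files.readAllLines(file, StandardCharsets.UTF_8))
                : new IssueLog();
    }

    void save(Path file) throws IOException {
        Files.write(file, toLines(), StandardCharsets.UTF_8);
    }
}
```

Keeping the store as line-oriented text, rather than a database, is the design choice that makes it SVN-friendly: edits to different issues on different machines touch different lines and will usually merge without manual intervention.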

This seems like a tall order, and something that there is no simple answer for. What bug tracking software do other bioinformaticians use?

The silence of the bioinformaticians

Wednesday, April 1st, 2009

I have been silent and rather occupied with a small project that is swallowing most of my free time: the development of a server::client interface for the analysis of Affymetrix GeneChips utilising the aroma.affymetrix package. Things are going pretty well, but I have introduced rather more complexity into the system by moving my logging routines from something home-made into a more formal log4j schema.

At the moment I am having a lovely struggle with the system: something in my code base appears to contain some “deprecated” log4j code, and running the methods through Tomcat yields some rather vague and uninterpretable error messages.

java.lang.NoSuchFieldError: level

WTF? This problem appears to be pretty well documented across the web, but tracing it down to a specific .jar file that should be upgraded, deleted or otherwise dealt with is still a little trying!

Alas – this is bioinformatics!

The problem has been solved, and bioinformatics was again the issue. Within my code I have the martj.jar file, which provides some connectivity with the BioMart infrastructure. Within this martj.jar is a copy of log4j, and this version collides with the version that I was trying to use! What a great use of rather too many hours, but at least a load of code has now been refactored and the .jar dependencies are now a little cleaner!
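One quick way to spot this kind of collision is to ask the class loader for every copy of a class file it can see. The Java sketch below is an assumption, not the fix described above: `ClassLoader.getResources` and `ProtectionDomain.getCodeSource` are standard JDK calls, while the `ClassCopies` class and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper for duplicate classes on the classpath. When two
// jars both contain, say, org.apache.log4j.Logger, only one copy is loaded, and
// code compiled against the other copy can fail with errors like NoSuchFieldError.
final class ClassCopies {
    // Every location visible to the current class loader that provides the class file.
    static List<URL> find(String className) {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) loader = ClassLoader.getSystemClassLoader();
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Where the copy that was actually loaded came from; null for JDK classes.
    static URL loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }
}
```

Called from inside the web application (for example from a test servlet), `ClassCopies.find("org.apache.log4j.Logger")` returning two URLs, one of them inside martj.jar, would point straight at a clash like the one above, and `loadedFrom(org.apache.log4j.Logger.class)` would show which copy the class loader actually picked.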

BioconductorBuntu – A Linux Distribution that Implements a Web-Based DNA Microarray Analysis Server

Wednesday, April 1st, 2009

Paul Geeleher et al., Bioinformatics Advance Access Publication March 23rd, 2009.

Fresh in the latest Bioinformatics Advance Access is a rather wonderful short correspondence on BioconductorBuntu. The authors of this brief article have highlighted a rather important divide within the bioinformatics community: those who can use R and those who can’t.

To solve the issue of “hot” microarray data analysis for those fearful of scripting, the authors have implemented a whole Ubuntu distribution containing the requisite packages, software and servers for rapid deployment of a data analysis server. In addition to providing R and some Bioconductor packages, the authors have also implemented a basic framework of authentication and ownership, and some core GUIs to streamline uploading, analysing and reporting the content of DNA microarray studies. In contrast to earlier efforts such as AMDA (Genopolis, Italy), the authors have provided mechanisms for handling Affymetrix data as well as single- and dual-colour arrays.

The workflow appears to contain all the core elements of data validation, QC and differential expression analysis, and also provides some support for GSEA- and KEGG-type analyses.

This is, in my humble opinion, a wonderful piece of work. Certainly it is not a complete solution (what about Illumina or Agilent data in their more native file structures?), and the reporting is lacking beyond the most basic content, but it does deliver an elegant and functional system for the dirty and unwashed masses. The wrapping of the stack into an Ubuntu “spin” is great: if, as promised, I can download an ISO, burn a disk, boot, install and rock-and-roll, then this could well become a genuinely useful tool sitting in the corner of many small labs.

I do have a vague suspicion, though, that this approach is doomed to failure. The biologists who cannot use R and Bioconductor are the same people who will be terribly afraid of booting a Linux workstation and installing something by themselves. They are also the people least well prepared to diagnose problems on the server, and who will need the most training and babysitting before the software can be applied in a meaningful way! This is not a criticism of the paper: while BioconductorBuntu is a very elegant solution and promises to solve some of these problems, a bioinformatician, IT guy or statistician is still really needed to get the biologist up and running. Thank goodness: our jobs are still safe for the time being ;-)

This is certainly a well-earned paper of the week. Congratulations, Paul et al.!