Archive for the ‘best working practices’ Category

Bugblatter – bug tracking software for bioinformaticians

Wednesday, April 8th, 2009


I have spent rather too many hours over the last couple of days looking at and reviewing software for tracking ideas, plans and bugs, and for assigning meaning to what is supposed to be fairly straightforward software development. I am looking for a simple piece of software that can run in a single-user environment and provide a list of projects, plans and bugs. Trac and Bugzilla are server-side and heavyweight, Excel is awful, and there doesn’t seem to be anything that I can run from a memory stick.

I have placed a software design brief with my contacts at Mnemosyne BioSciences and asked for the development of a simple, OS-agnostic solution that can run either as a single user from local files or interact with an SVN server (or even as something more embedded). They have approved my design brief and promised to develop a Java tool for Windows and OSX that will provide bug tracking, reporting and management capabilities as a standalone tool. They have charged a pretty reasonable start-up fee for the project, and their understanding of the task is pretty much what I had envisioned from the start.

The name for their planned tool is “Mnemosyne Bugblatter”. Cool name; let’s see how the software looks when it is delivered. If anyone else is interested in a simple portable tool for tracking projects and bugs and for managing feature creep, please send an email to bugblatter@mnemosyne.co.uk

FogBugz?

Wednesday, April 8th, 2009


FogBugz is a clean and simple bug tracker that is pretty well integrated with Eclipse. It is not perfect, since it relies on some form of live connection, but it is a pretty good solution.

I have created an evaluation account with FogBugz and can create a collection of incidents, wikis and documentation. Everything is hosted on their servers and presented through a rich web-page GUI. There is further potential in the availability of a FogBugz GUI, which could meet many of my requirements for a platform-independent, offline bug reporting and analysis environment.

One issue worth considering is that FogBugz is not free; it is a commercial solution costing $25 per user per month. For a small company such as Mnemosyne BioSciences this is acceptable, but in academic environments it becomes complicated…

This is certainly something that I intend to evaluate more fully, and I will report back on my experiences as I reach the end of the evaluation period and decide whether FogBugz could solve the problem of tracking software development and bugs within a bioinformatics data analysis project.

tracking bugs, managing plans and coping with feature creep

Wednesday, April 8th, 2009


I am a bioinformatician. My background is as a traditional geneticist; my PhD was in molecular biology (with a little phylogenetics and domain analysis). I only entered bioinformatics during my postdoctoral years, when I worked as a genome annotator on the first green eukaryotic genome project. During this time I learned a lot of Perl, moved into Python, and integrated a load of stuff to link my needs with a relational database and to distribute jobs across a large cluster for the typical InterPro / UniProt / nonred type tasks. A GUI was never considered (or attempted), and the code remained ad hoc until it broke.

Now, 10 years on from those heady days, I write code in C, Java, R and a little Python. During my few years as an adjunct professor at a Finnish research centre I rewrote the whole software pipeline I had imagined, in Java, as a rather monolithic beast. Over the last few years, as a distraction on the daily commute to the capital city, I have reimplemented the whole stack as something a little more abstracted and perhaps more useful. The software environment is now several hundred thousand lines of code, we have a rich GUI delivered over HTML and through Java WebStart, and things are finally beginning to look how they should have when I first started planning the project back in 2004.

The critical question at this point: how is a self-taught informatician supposed to handle this code? I work alone; there is no code audit and no one works with me to validate, correct or comment on my code! I maintain a single code tree, and it is at least under version control (Subversion for bioinformatics is great …), so I am hopefully not completely inept at my work.

My question is how I should really manage the long list of non-specific issues, bugs and problems that I routinely encounter. A campaign to resolve an issue introduced through feature creep can take hours if not days, and during this time I inevitably discover many more bugs and issues…

At the moment, bugs are documented in a text file of problems, issues and events. I also have a todo list, but this seems rather inefficient and trivial. I know that something like Bugzilla could work, but it seems rather more complex than I really need. I also work on a train, and therefore don’t have web access for much of the commute; a client-side tool that can be synced through SVN would be ideal. I also work on Linux, OSX and Windows, so ideally something cross-platform would be great…

This seems like a tall order, and something for which there is no simple answer. What bug tracking software do other bioinformaticians use?
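For what it’s worth, the plain text file can be made a little less trivial without leaving the offline, SVN-friendly world. The sketch below is only an illustration, assuming a hypothetical one-issue-per-line format (`status id priority title`) made up for this example; it is not the format of any particular tool. Keeping one issue per line and writing the file back sorted by id keeps SVN diffs and merges small and readable.

```python
"""Minimal offline issue list kept in a plain-text file (hypothetical format).

Each non-blank, non-comment line looks like:
    <status> <id> <priority> <title>
e.g.
    open 12 high Fix off-by-one in GFF parser
"""
from dataclasses import dataclass

VALID_STATUSES = {"open", "fixed", "wontfix"}  # assumed set of states


@dataclass
class Issue:
    status: str
    ident: int
    priority: str
    title: str


def parse_issues(text: str) -> list[Issue]:
    """Parse the issue file; '#' lines and blank lines are ignored."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(maxsplit=3)  # the title may contain spaces
        if len(parts) != 4 or parts[0] not in VALID_STATUSES:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        status, ident, priority, title = parts
        issues.append(Issue(status, int(ident), priority, title))
    return issues


def format_issues(issues: list[Issue]) -> str:
    """Serialise issues sorted by id, so the file order stays stable."""
    ordered = sorted(issues, key=lambda i: i.ident)
    return "".join(f"{i.status} {i.ident} {i.priority} {i.title}\n" for i in ordered)


def open_issues(issues: list[Issue]) -> list[Issue]:
    return [i for i in issues if i.status == "open"]


def next_id(issues: list[Issue]) -> int:
    return max((i.ident for i in issues), default=0) + 1


if __name__ == "__main__":
    sample = """# bugs.txt
open 2 low Taxonomy viewer ignores window resize
fixed 1 high Fix off-by-one in GFF parser
"""
    issues = parse_issues(sample)
    issues.append(Issue("open", next_id(issues), "mid", "Feature creep: export to Excel"))
    print(format_issues(issues), end="")
```

Running it directly prints the sample list re-sorted by id, with the new feature-creep item appended as issue 3. The file itself stays an ordinary text file, so it can be committed, diffed and merged with SVN like any source file.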

BioconductorBuntu – A Linux Distribution that Implements a Web-Based DNA Microarray Analysis Server

Wednesday, April 1st, 2009


Paul Geeleher et al., Bioinformatics Advance Access Publication March 23rd, 2009.

Fresh in the latest Bioinformatics Advance Access is a rather wonderful short correspondence on BioconductorBuntu. The authors of this brief article highlight a rather important divide within the bioinformatics community: those who can use R and those who can’t.

To solve the issue of “hot” microarray data analysis for those fearful of scripting, the authors have built a whole Ubuntu distribution containing the requisite packages, software and servers for rapid deployment of a data analysis server. In addition to providing R and some Bioconductor packages, the authors have implemented a basic framework of authentication and ownership, and some core GUIs to streamline uploading, analysing and reporting DNA microarray studies. In contrast to earlier efforts such as AMDA (Genopolis, Italy), the authors have provided mechanisms for handling Affymetrix data and both single- and dual-colour arrays.

The workflow appears to contain all core elements of data validation, QC and differential expression analysis and also provides a little content for both GSEA and KEGG type analyses.

This is, in my humble opinion, a wonderful piece of work. Certainly this is not a complete solution (what about Illumina or Agilent data in their more native file structures?) and the reporting is lacking beyond the most basic content, but it does deliver an elegant and functional system for the dirty and unwashed masses. The wrapping of the stack into an Ubuntu “spin” is great: if, as promised, I can download an ISO, burn a disk, boot, install and rock-and-roll, then this could be a really useful tool sitting in the corner of many small labs.

I have some vague suspicions, though, that this approach is doomed to failure. The biologists who cannot use R and Bioconductor are the same people who will be terribly afraid of booting a Linux workstation and installing something by themselves. They will also be the least well prepared to diagnose problems on the server, and will need the most training and babysitting before the software can be applied in a meaningful way! This is not a criticism of the paper: while BioconductorBuntu is a very elegant solution and promises to solve some of the problems, a bioinformatician, IT guy or statistician is still needed to get the biologist up and running. Thank goodness – our jobs are still safe for the time being ;-)

This is certainly a well-earned paper-of-the-week. Congratulations, Paul et al.!

EMAAS: An extensible grid-based Rich Internet Application for microarray data analysis and management

Monday, March 23rd, 2009

G Barton et al., BMC Bioinformatics 2008, 9:493, doi:10.1186/1471-2105-9-493

EMAAS is another environment for handling and analysing gene expression data. The authors have set about developing a distributed e-support system for the management and analysis of microarray data, to provide access to complex methods and to apply (from a biologist’s POV) non-trivial technologies to large multivariate datasets.

Whilst other solutions have missed the point and taken an easy approach to the problem, the EMAAS approach is rather more complicated and relies instead on the integration of internet-accessible tools, standard statistical packages (R/Bioconductor) and web resources (CELSIUS, GEO). The decision to aim for a modular and flexible framework is excellent and, in my opinion, makes this a much more interesting project. The completeness with which tools and environments have been included is breathtaking; the depth of IT and analytical platforms required is rather daunting.

In contrast to the manuscript reviewed in the last post, this resource’s source code is available under a suitable GPL licence, and parts of the demo server also work. I have some problems with the resource (Flash for a start), but this is one smooth implementation, packaged in such a way that I could take it for a spin if I so wished!

This manuscript is heavy going, but a damned fine resource is described underneath the technical fluff. It earns a strong recommendation from the bioinformaticsblog.

Bioinformatics, backups and disk disasters …

Thursday, March 5th, 2009


I guess that, as a bioinformatician who works computers hard, I should expect failures to be part of the deal and something that we can cope with. I guess that hardware and software failure are part of the rich cycle of life? Within the last year I have had a completely failed RAID system (thanks LaCie, there went 1.5TB of disk space and several hundred GB of data that needed to be recovered), a Lenovo laptop that no longer communicates with projectors, batteries or disks, and a failed disk on my wife’s ancient Vaio. Yesterday on the train the disk on my *new* MacBook Pro gave up the ghost: some C code was compiling (that Taxonomy project again ;-) ) and it just sort of waited, and nothing happened.

Last night I tried every route to disk disaster recovery I could think of: I cannot mount the disk in target mode on other Macs, DiskWarrior refuses to even look at it, and with some overseas travel coming up, a week without a fit-for-purpose computer looks inevitable. I know that hardware fails, so why don’t I keep backups? Sure, all of my code is kept in an SVN repository and datasets are typically mirrored across different computers, but a load of stuff like photos and my iTunes library lived only on the laptop.

Apple computers are pretty good, pretty smart and make life rather easy. I really should get a Time Capsule or an external disk so that I can at least start routinely copying the valuable parts of my computational existence. In industry we have information management organisations that establish meaningful processes and make sure we don’t waste our time or lose our data. Why can’t I learn from their example?
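The routine copying does not have to wait for new hardware. As a rough sketch, assuming Python 3 and an external disk mounted somewhere writable, a small script can copy only new or changed files from a folder into a backup folder. The function name `mirror_changed` and the demo paths are hypothetical stand-ins, not real locations. It compares file size and modification time, which is a cheap heuristic rather than a checksum, and it deliberately never deletes anything from the backup. Time Machine or rsync do this job far more thoroughly; this only illustrates the idea.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Modification times are compared with a small tolerance because some
# filesystems (e.g. FAT-formatted external disks) store them coarsely.
MTIME_TOLERANCE_S = 2.0


def mirror_changed(src: Path, dst: Path) -> list[Path]:
    """Copy files under src that are missing from dst or differ in size/mtime.

    Returns the sorted relative paths that were copied. Files deleted from
    src are never removed from dst, so an accidental deletion (or a dying
    disk) cannot propagate into the backup.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            source = Path(root) / name
            rel = source.relative_to(src)
            target = dst / rel
            s = source.stat()
            if target.exists():
                t = target.stat()
                if (t.st_size == s.st_size
                        and abs(t.st_mtime - s.st_mtime) < MTIME_TOLERANCE_S):
                    continue  # looks unchanged; skip it
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copy2 also preserves the mtime
            copied.append(rel)
    return sorted(copied)


if __name__ == "__main__":
    # Demo in a throwaway directory standing in for laptop and backup disk.
    with tempfile.TemporaryDirectory() as tmp:
        laptop, backup = Path(tmp) / "laptop", Path(tmp) / "backup"
        (laptop / "photos").mkdir(parents=True)
        (laptop / "photos" / "holiday.jpg").write_bytes(b"\xff\xd8 fake jpeg")
        print(mirror_changed(laptop, backup))  # first run copies the file
        print(mirror_changed(laptop, backup))  # second run: [] (unchanged)
```

Because `shutil.copy2` carries the modification time across, a second run over an untouched folder copies nothing, so the script is cheap enough to run after every trip.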

Now to find the time to buy a new disk, a backup disk and start the slow process of recovering what may or may not be recoverable!

off on holiday

Monday, February 16th, 2009

I’m heading out-of-office for a few days. I’m taking a break with the 2 eldest kids and we’re off to the UK for a break, museums, and some culture! I guess that the bioinformaticsblog will be untouched during this time, though I may add a few pictures through ShoZu.

Windows / applications open on your computer

Tuesday, February 10th, 2009


As an avid reader of Slashdot, I was amused this morning to read their comments on the forthcoming iteration of the horror that is Windows. Apparently the next round of Windows-lite will allow only 2 open applications, yet according to different sources the average Windows user has 8-15 windows open. The Slashdot post is worth a read!

As a bioinformatician I feel that I am a little more talented than the average Windows user, though perhaps not a Windows power user (since I choose not to use Windows when possible). A quick look at my Windows desktop shows 9 open applications (Excel, R, Terminal, Outlook, IE (yuk), Tectia SSH, Acrobat, Endnote and Firefox) and a total of 27 open windows. This is all within the limits of Windows, and I hope it counts as acceptable usage.

This raises questions about how we all work, and how we access and collate information. On the Unix box that I can work on (an almost fit-for-purpose, corporate-approved Red Hat install) I have fewer running applications (Eclipse, Firefox, NetBeans, shell and R), but many more windows with open R, bash, Python and remote screen connections. Some of these connections are to remote MySQL databases, internal PostgreSQL databases and distributed resources such as BioMart.

The thought of having to work with just 2 windows or applications is a little scary: do you close applications when you are not actively using them? I guess there is some advantage to the corporate view; at least we’ll eventually get something shiny, polished, largely stable and certainly well tested. I still like my MacBook Pro, though. With Spaces I have 4 desktops (which is sufficient), each with its own theme (Java / R / mail and web / documents), plenty of running applications and a flurry of windows.

How do you all work? What are your practices? Perhaps we should even ask what your desk and your desktop look like?