ping …
June 12th, 2009
Summer is coming …
April 13th, 2009
It has been Easter weekend, and as a surprise in Finland we have even had some sunshine and temperatures in double positive figures! When presented with both sun and a little warmth then the bicycle must come out.
Many of the best things in life are associated with Italy: good food, good wine, great motorcycles and fantastic racing bicycles. (It’s just a shame that the Italians can’t make cars - just look at Fiat …) My own Italian bike is now out and we went for the first dash of the season. There is a great round tour of the island where I live, and I had fun, some pain, aching lungs, tortured legs and a rather pathetic time, but we are now into cycle season.
I sometimes wonder what other bioinformaticians do for escape? Bioinformaticians I very much hope are not the archetypal geeks - pale skin unexposed to the sun - flabby waists and oily skin … Yikes … I am a fan of the warm outdoors and like sailing, biking and hiking. I am now in summer mood and things feel really positive.
I have been clearing trees and bushes in the garden, have tidied parts of the estate and will soon start preparing the boat for the sailing season. Life feels great at the moment - I guess that chocolate really does help.
Bugblatter - a bug tracking software for bioinformaticians
April 8th, 2009
I have spent rather too many hours over the last couple of days looking at and reviewing software for tracking ideas, plans and bugs, and for assigning meaning to what is supposed to be fairly straightforward software development. I am looking for a simple piece of software that can run as a single user environment and can provide a list of projects, plans and bugs. Trac and Bugzilla are server side and heavy. Excel is awful and there doesn’t seem to be anything that I can run from a memory stick.
I have placed a software design brief with my contacts at Mnemosyne BioSciences and have asked for the development of a simple, OS agnostic solution that can run either as a single user from local files or can interact with an SVN server (or even as something more embedded). They have approved my design brief and have promised to develop a Java tool for Windows and OSX that will provide bug tracking, reporting and management capabilities as a standalone tool. They have charged a pretty reasonable start-up fee for the project, and their understanding of the task is pretty much what I had envisioned from the start.
The name for their planned tool is “Mnemosyne Bugblatter”. Cool name; let’s see how the software looks when delivered. If anyone else could be interested in a simple portable tool for tracking projects, bugs and managing feature creep then please send a mail to bugblatter@mnemosyne.co.uk
fogbugz?
April 8th, 2009
FogBugz is a pretty clean and simple bug tracker that is pretty well integrated with Eclipse. It is not perfect, since it relies on some form of live connection, but it is a pretty good solution.
I have created an evaluation account with FogBugz and can create a collection of incidents, wikis and documentation. The code is hosted on their servers, and everything is presented through a rich web page GUI. There is more potential through the availability of a FogBugz GUI, and this could be used to solve many of my requirements of a platform independent off-line bug reporting and analysis environment.
One issue that is worth considering is that FogBugz is not free; it is a commercial solution costing $25 per user per month. For a small company such as Mnemosyne BioSciences this is acceptable, but for academic environments this becomes complicated…
This is certainly something that I intend to evaluate more fully. I will report back on my experiences as I reach the end of the evaluation period and decide whether FogBugz could provide an answer to the problem of tracking software development and bugs within a bioinformatics data analysis project.
tracking bugs, managing plans and coping with feature creep
April 8th, 2009
I am a bioinformatician. My background is as a traditional geneticist; my PhD was in the fields of molecular biology (and a little phylogenetics and domain analysis). I only entered the domain of bioinformatics during my postdoctoral years when I worked as a genome annotator for the first green eukaryotic genome project. During this time I learned a lot of Perl, moved into Python and integrated a load of stuff to link my needs with a relational database and distributed jobs across a large cluster to run the typical InterPro / UniProt / nonred type tasks. GUI was never considered (or attempted) and code remained ad hoc until broken.
Now, 10 years on from these heady days, I am writing code in C, Java, R and a little Python. During my few years as an adjunct professor at a Finnish research centre I rewrote the whole software pipeline that I had imagined in Java as a rather monolithic beast. Over the last few years, as a distraction on the daily commute to the capital city, I have reimplemented the whole stack as something a little more abstracted and perhaps useful. The whole software environment is now several hundred thousand lines of code, we have a rich GUI delivered over HTML and through Java WebStart and things are finally beginning to look how they should have when I first started planning the project back in 2004.
The critical question at this point is: how is a self-taught informatician supposed to handle this code? I work alone, there is no code audit and no one works with me to validate, correct or comment on my code! I maintain a single code tree and this is at least within a code versioning system (subversion for bioinformatics is great …), so I am hopefully not completely inept at doing my work.
My question is: how should I really manage the long list of non-specific issues, bugs and problems that I routinely encounter? A campaign to resolve an issue that has been introduced through feature creep can take hours if not days, and during this time I undoubtedly discover many more bugs and issues…
At the moment bugs are documented within a text file of problems, issues and events. I have a todo list, and this seems rather inefficient and trivial. I know that something like Bugzilla could work, but this seems rather more complex than is absolutely needed. I also work on a train, and therefore don’t have web access for much of the commute - a client side project that can be synced through SVN would be ideal. I also work on Linux, OSX and Windows, so ideally something that is cross-platform would be great…
This seems like a tall order, and something that there is no simple answer for. What bug tracking software do other bioinformaticians use?
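To make the wish list above a little more concrete (offline use, cross-platform, synced through SVN), here is a minimal sketch of what the storage layer of such a tool might look like. It is only an illustration and not how Bugblatter or any existing tracker works. The function names, the field names and the one-JSON-file-per-issue layout are all assumptions made for this example. The idea is that each issue lives in its own small text file inside the working copy, so that an ordinary `svn add` / `svn commit` carries it between machines and two issues created offline never touch the same file.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def add_issue(root, title, kind="bug", project="default"):
    """Write a new issue as its own JSON file and return its id (hypothetical layout)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # Random id rather than a counter, so issues created offline on
    # different machines do not collide when the working copies are synced.
    issue_id = uuid.uuid4().hex[:8]
    issue = {
        "id": issue_id,
        "title": title,
        "kind": kind,  # e.g. "bug", "plan" or "idea"
        "project": project,
        "status": "open",
        "created": datetime.now(timezone.utc).isoformat(),
    }
    path = root / f"{issue_id}.json"
    # Sorted keys and indentation keep the files stable and diff-friendly.
    path.write_text(json.dumps(issue, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return issue_id


def set_status(root, issue_id, status):
    """Change the status of one issue, e.g. to "closed"."""
    path = Path(root) / f"{issue_id}.json"
    issue = json.loads(path.read_text(encoding="utf-8"))
    issue["status"] = status
    path.write_text(json.dumps(issue, indent=2, sort_keys=True) + "\n", encoding="utf-8")


def list_issues(root, status=None):
    """Return all issues, oldest first, optionally filtered by status."""
    issues = [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(Path(root).glob("*.json"))
    ]
    if status is not None:
        issues = [i for i in issues if i["status"] == status]
    return sorted(issues, key=lambda i: i["created"])
```

Calling `add_issue("issues", "NoSuchFieldError under Tomcat")` on the train and committing the `issues` directory later would be the whole workflow. Reporting, priorities and a GUI are deliberately left out of the sketch.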
The silence of the bioinformaticians
April 1st, 2009
I have been silent and rather occupied with a small project that is swallowing most of my free time: the development of a server::client interface for the analysis of Affymetrix GeneChips utilising the aroma.affymetrix package. Things are going pretty well, but I have introduced rather more complexity into the system by moving my logging routines from something home made into a more formal log4j schema.
At the moment I have a lovely struggle with the system - something in my code base appears to have some “deprecated” log4j code - and running the methods through Tomcat yields some rather vague and uninterpretable error messages.
java.lang.NoSuchFieldError: level
WTF? This problem appears to be pretty well documented across the WWW - but still resolving the problem down into a .jar file that should be upgraded, deleted or something is a little trying!
Alas - this is bioinformatics!
The problem has been solved, and bioinformatics was again the issue. Within my code I have the martj .jar file that provides some connectivity with the BioMart infrastructure. Within this martj.jar is a copy of log4j; and this version collides with the version that I was trying to use! What a great use of rather too many hours - but at least a load of code has now been refactored and the .jar dependencies are now a little cleaner!
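For anyone chasing the same kind of collision, a jar is just a zip archive, so a few lines of script can report which jars in a library folder carry their own copy of the log4j classes. The sketch below is one possible way to do that and not the route I actually took. The function name and the idea of pointing it at a directory such as a web application's `WEB-INF/lib` are assumptions for illustration; `org/apache/log4j/` is the package path used by log4j 1.x.

```python
import zipfile
from pathlib import Path


def jars_bundling(directory, class_prefix="org/apache/log4j/"):
    """Map jar file name -> number of .class entries found under class_prefix.

    Only jars that contain at least one matching class are reported.
    """
    hits = {}
    for jar in sorted(Path(directory).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                count = sum(
                    1
                    for name in archive.namelist()
                    if name.startswith(class_prefix) and name.endswith(".class")
                )
        except zipfile.BadZipFile:
            continue  # not a readable archive, so skip it
        if count:
            hits[jar.name] = count
    return hits
```

If more than one jar shows up in the result, the class loader may pick the classes from either of them, which is the sort of situation that can end in errors like the `NoSuchFieldError` above.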
BioconductorBuntu - A Linux Distribution that Implements a Web-Based DNA Microarray Analysis Server
April 1st, 2009
Paul Geeleher et al., Bioinformatics Advance Access Publication March 23rd, 2009.
Fresh in the latest version of Bioinformatics Advance Access is a rather wonderful short correspondence on BioconductorBuntu. The authors of this brief article have highlighted a rather important divide within the bioinformatics community: those who can use R and those who can’t.
To solve the issue of “hot” microarray data analysis for those fearful of scripting, the authors have implemented a whole Ubuntu distribution containing the requisite packages, software and servers for rapid deployment of a data analysis server. In addition to just providing R and some Bioconductor packages the authors have also implemented a basic framework of authentication and ownership, and some core GUIs to streamline the process of uploading, analysing and reporting the content of DNA microarray studies. In contrast to earlier efforts such as AMDA (Genopolis, Italy) the authors have provided mechanisms for the handling of Affymetrix data, single and dual colour arrays.
The workflow appears to contain all core elements of data validation, QC and differential expression analysis and also provides a little content for both GSEA and KEGG type analyses.
This is in my humble opinion a wonderful piece of work. Certainly this is not a complete solution (what about Illumina or Agilent data in their more native file structures?) and the reporting is lacking outside of the most basic content - but it does deliver an elegant and functional system for the dirty and unwashed masses. The wrapping of the stack onto an Ubuntu “spin” is great - if as promised I can download an iso, burn a disk, boot, install and rock-and-roll then this really could stand to be a really useful tool sitting in the corner of many small labs.
I have some vague suspicions though that this approach is doomed to failure. The biologists who cannot use R and Bioconductor are the same people who will be terribly afraid of booting a Linux workstation and installing something by themselves. These are the same people who will be least well prepared to diagnose the problems on the server, and who will need the most training and babysitting to get them to the stage where the software can be applied in a meaningful way! This is not a detraction from the paper: BioconductorBuntu is a very elegant solution and promises to solve some of the problems, but a bioinformatician, IT guy or statistician is really needed to get the biologist up-and-running. Thank goodness - our jobs are still safe for the time being.
This is certainly a well-earned-paper-of-the-week. Congratulations, Paul et al.
EMAAS: An extensible grid-based Rich Internet Application for microarray data analysis and management
March 23rd, 2009
G Barton et al., BMC Bioinformatics 2008, 9:493 doi:10.1186/1471-2105-9-493
EMAAS is another environment for handling and analysis of gene expression data. The authors have set about the development of a distributed e-support system for the management and analysis of microarray data; to provide access to complex methods and to apply (from a biologist’s POV) non-trivial technologies to handle large multi-variate datasets.
Whilst other solutions have missed the point and taken an easy approach to solving the problem, the EMAAS approach is rather more complicated and relies instead on integration of internet accessible tools, standard statistical packages (R/Bioconductor) and web-resources (CELSIUS, GEO). The decision to aim for a modular and flexible framework is excellent and makes this in my opinion a very much more interesting project. The completeness with which tools and environments have been included is breathtaking; the depth of IT and analytical platforms required is rather daunting.
In contrast to the manuscript reviewed in the last post, this resource’s source is available under a suitable GPL license, and some of the demo server also works. I have some problems with the resource (Flash for a start), but this is one smooth implementation and is packaged in such a way that I could take it for a spin if I so wished!
This manuscript is heavy to read, but a damned fine resource is described underneath the technical fluff. This is a great resource and this earns a great recommendation from the bioinformaticsblog.
SiPaGene: A new repository for instant online retrieval, sharing and meta-analyses of GeneChip® expression data
March 23rd, 2009
Adriane Menßen et al., BMC Genomics 2009, 10:98 doi:10.1186/1471-2164-10-98
This manuscript describes a new database, data warehouse and analytical platform for the handling of Affymetrix based gene expression data. The authors identify the need for a database that is convenient, facilitates online analysis and provides user-specific sharing options, and further qualify their understanding of an unmet database need with the statement that “… existing tools do not use the whole range of statistical power provided by the MAS5.0/GCOS algorithms”.
I agree with the authors that there is such a gap within the database arena for a MIAME compliant database that provides both data warehousing and data analytical capabilities, and the addition of user-specific access rights is great. The MAS5 and GCOS methods undoubtedly have their place, but is their usage alone perhaps naive?
The authors fill a number of quite heavy pages with their description of a refreshingly heavyweight database infrastructure (Java, ancient Oracle) that is currently biased towards their local research environment’s interest in immunology, inflammation, regeneration and cancer. Such a lengthily described database is then populated with only 1000 arrays.
This manuscript is of interest, the approach is nice; a combined warehouse and analysis environment. I have some problems with the database though. “Non-academic commercial use is restricted” is a waste; I would never consider paying for this resource when fantastic solutions from SAS JMP Genomics / GeneData / … with full support, testing and scalability are available with a lower TCO. To see what has been done, how well it performs and to play with a resource is nice.
I suspect that this is another fail - the online demo will not even work.
So, nice try, but no cigar. The manuscript is nice, convincingly written and more professional than some solutions out there. The web presentation looks fugly, and is also broken. The politics of code availability is plainly stupid - those who can pay will not because the implementation is not sufficiently good - Charite, please make the code a little more available!



