Martini: using literature keywords to compare gene sets

January 4th, 2010

I have been having a big push over the last couple of months to consider the needs of biocontextualisation of gene list data within a commercial setting. A bioinformatician is not always capable of making a rational decision based on gene names, and even dedicated biologists are swamped by genes that fall outside of their specialist domains – does a collection of differentially expressed genes relate to something good, to something bad, or to something unexpected?

Martini is a good-looking solution that was published in Nucleic Acids Research (NAR 2010, Vol 38, p26-38) by researchers from EMBL in Heidelberg. It is delivered through a somewhat clumsy web interface, but the data behind it come from Medline abstracts and, more importantly, Medline keywords. Some bioinformatics magic, a little creativity and clearly a lot of hard work are then used to condense a set of differentially expressed genes into something that appears (at least superficially thus far) pretty useful, and that clearly demonstrates superiority over other solutions that are tightly bound to the Gene Ontology.
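To make the idea of condensing a gene list into literature keywords a little more concrete, here is a minimal Python sketch of a keyword over-representation test. It is emphatically not Martini's published algorithm: the hypergeometric test, the function names (`hypergeom_sf`, `enriched_keywords`) and the toy gene-to-keyword table are all assumptions made for illustration, and no multiple-testing correction is applied.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the keyword."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enriched_keywords(gene_list, gene_keywords, alpha=0.05):
    """Return (keyword, p-value) pairs over-represented in gene_list.

    gene_keywords is a hypothetical mapping of gene name -> set of keywords;
    every gene in that mapping is treated as the background population.
    """
    background = set(gene_keywords)
    genes = set(gene_list) & background
    N, n = len(background), len(genes)
    in_background = Counter(kw for g in background for kw in gene_keywords[g])
    in_list = Counter(kw for g in genes for kw in gene_keywords[g])
    results = []
    for kw, k in in_list.items():
        p = hypergeom_sf(k, N, in_background[kw], n)
        if p <= alpha:  # uncorrected threshold, for illustration only
            results.append((kw, p))
    return sorted(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Invented toy annotations, not real Medline keywords.
    toy = {
        "g1": {"apoptosis", "kinase"}, "g2": {"apoptosis"}, "g3": {"apoptosis"},
        "g4": {"kinase"}, "g5": {"kinase"}, "g6": {"kinase"}, "g7": {"kinase"},
        "g8": {"transport"}, "g9": {"transport"}, "g10": {"transport"},
    }
    for keyword, p_value in enriched_keywords(["g1", "g2", "g3"], toy):
        print(keyword, p_value)
```

A real comparison would need a proper background gene universe and a correction for the many keywords tested, which is presumably part of the hard work the authors put in.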

What about commercial usage? The authors state pretty clearly that the solution is free-for-all! This comes of course with a caveat: is it really wise to upload proprietary gene list data to an external server? While we wouldn’t lose knowledge on molecular structures, there is certainly a risk element here and I imagine that most commercial bioinformaticians will avoid this tool. I feel that the tool also suffers from the need to work with the somewhat loveless web interface. A cleaner R-based API would be lovely – perhaps a task for the weekend?

In conclusion, Martini looks very appealing, has a clear reason for existence and is certainly something that I will evaluate. I am already tempted to see how I might use the system within an R integration project – straight to paper reporting is the way to go! Have a look at http://martini.embl.de to see what can be done.

Bioinformatics-for-dummies

December 30th, 2009

What is the point of this blog? In the past it was intended as a forum to transfer knowledge across the academic-industrial divide, but I now suspect that I am a firm industrial-bioinformatician and a couple of failed attempts to move back (to academia) in the last 6 months seem to have concreted my intentions to stay on this side of the divide.

Now is, I feel, the right time to reestablish the goals of this blog. The aim should be literature review (for industrial bioinformaticians) and to set out flows, thoughts and logic that may aid and assist best-of-breed bioinformatics workflows in industry, but using open and academic-driven solutions.

Back in business …

December 30th, 2009

I have neglected the blog for quite a long time. A few engines read the pages daily, but the whole business was looking a little tired and I lacked the enthusiasm to continue without some form of reward. Finally a new year, and my chance to establish the BioinformaticsBlog as somewhere relevant? I’m on the case!

google discovers the arse of Finland!

August 22nd, 2009


Turku may not be the centre of the known Universe, but 3 Google camera cars in downtown Turku suggest that this place may be more important than it is usually given credit for!

Posted by ShoZu

Polished boat!

June 12th, 2009


Bioinformaticians need hobbies. Along with coding, family, racing bikes and motorcycles, I love yachts and sailing. Here the boat is ready for launch and well attached to a Saab. Who said that bioinformatics can’t be the best job in the world?

Posted by ShoZu

ping …

June 12th, 2009

There has been rather high latency associated with the blog! Interest remains, but other duties have been at the fore! I’ll focus again, bioinformatics remains my life!

Posted by ShoZu

Summer is coming …

April 13th, 2009

It has been Easter weekend, and as a surprise in Finland we have even had some sunshine and temperatures in double positive figures! When presented with both sun and a little warmth then the bicycle must come out.

Many of the best things in life are associated with Italy: good food, good wine, great motorcycles and fantastic racing bicycles. (It’s just a shame that the Italians can’t make cars – just look at Fiat …) My own Italian bike is now out and we went for the first dash of the season. There is a great round tour of the island where I live and I had fun, some pain, aching lungs, tortured legs, and a rather pathetic time, but we are now into cycle season.

I sometimes wonder what other bioinformaticians do for escape? Bioinformaticians I very much hope are not the archetypal geeks – pale skin unexposed to the sun – flabby waists and oily skin … Yikes … I am a fan of the warm outdoors and like sailing, biking and hiking. I am now in summer mood and things feel really positive.

I have been clearing trees and bushes in the garden, have tidied parts of the estate and will soon start preparing the boat for the sailing season. Life feels great at the moment – I guess that chocolate really does help.

Bugblatter – a bug tracking software for bioinformaticians

April 8th, 2009

I have spent rather too many hours over the last couple of days reviewing software for tracking ideas, plans and bugs, and for bringing some order to what is supposed to be fairly straightforward software development. I am looking for a simple piece of software that can run as a single-user environment and can provide a list of projects, plans and bugs. Trac and Bugzilla are server-side and heavy. Excel is awful, and there doesn’t seem to be anything that I can run from a memory stick.

I have placed a software design brief with my contacts at Mnemosyne BioSciences and have asked for the development of a simple, OS-agnostic solution that can run either as a single user from local files or can interact with an SVN server (or even as something more embedded). They have approved my design brief and have promised to develop a Java tool for Windows and OSX that will provide bug tracking, reporting and management capabilities as a standalone tool. They have charged a pretty reasonable start-up fee for the project, and their understanding of the task is pretty much what I had envisioned from the start.

The name for their planned tool is “Mnemosyne Bugblatter”. Cool name; let’s see how the software looks when delivered. If anyone else could be interested in a simple portable tool for tracking projects and bugs and for managing feature creep, then please send a mail to bugblatter@mnemosyne.co.uk

fogbugz?

April 8th, 2009

FogBugz is a clean and simple piece of bug tracking software that is well integrated in Eclipse. It is not perfect, since it is reliant upon some form of live connection, but it is a pretty good solution as far as it goes.

I have created an evaluation account with FogBugz and can create a collection of incidents, wikis and documentation. The code is hosted on their servers, and everything is presented through a rich web page GUI. There is more potential through the availability of a FogBugz GUI, and this could be used to solve many of my requirements for a platform-independent, off-line bug reporting and analysis environment.

One issue that is worth considering is that FogBugz is not free; it is a commercial solution costing $25 per user per month. For a small company such as Mnemosyne BioSciences this is acceptable, but for academic environments this becomes complicated…

This is certainly something that I intend to evaluate more fully. I will report back on my experiences as I reach the end of the evaluation period and decide whether FogBugz could provide an answer to the problem of tracking software development and bugs within a bioinformatics data analysis project.

tracking bugs, managing plans and coping with feature creep

April 8th, 2009

I am a bioinformatician. My background is as a traditional geneticist, and my PhD was in the field of molecular biology (with a little phylogenetics and domain analysis). I only entered the domain of bioinformatics during my postdoctoral years, when I worked as a genome annotator for the first green eukaryotic genome project. During this time I learned a lot of Perl, moved into Python and integrated a load of stuff to link my needs with a relational database and to distribute jobs across a large cluster to run the typical InterPro / UniProt / nonred type tasks. A GUI was never considered (or attempted) and code remained ad hoc until broken.

Now, 10 years on from these heady days, I am writing code in C, Java, R and a little Python. During my few years as an adjunct professor at a Finnish research centre I rewrote the whole software pipeline that I had imagined in Java as a rather monolithic beast, and over the last few years I have reimplemented the whole stack as something a little more abstracted and perhaps useful, as a distraction on the daily commute to the capital city. The whole software environment is now several hundred thousand lines of code, we have a rich GUI delivered over HTML and through Java WebStart, and things are finally beginning to look how they should have when I first started planning the project back in 2004.

The critical question at this point is how a self-taught informatician is supposed to handle this code. I work alone, there is no code audit and no one works with me to validate, correct or comment on my code! I maintain a single code tree and this is at least within a code versioning system (subversion for bioinformatics is great …), so I am hopefully not completely inept at doing my work.

My question is: how should I really manage the long list of non-specific issues, bugs and problems that I routinely encounter? A campaign to resolve an issue that has been introduced through feature creep can take hours if not days, and during this time I undoubtedly discover many more bugs and issues…

At the moment bugs are documented within a text file of problems, issues and events. I have a todo list, and this seems rather inefficient and trivial. I know that something like Bugzilla could work, but this seems rather more complex than is absolutely needed. I also work on a train, and therefore don’t have web access for much of the commute – a client side project that can be synced through SVN would be ideal. I also work on Linux, OSX and Windows, so ideally something that is cross-platform would be great…

This seems like a tall order, and something that there is no simple answer for. What bug tracking software do other bioinformaticians use?
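For what it is worth, the client-side, SVN-friendly tracker described above could be approximated with very little code. The following is a minimal Python sketch under stated assumptions: the file name `issues.jsonl`, the field names and the functions (`add_issue`, `close_issue`, `open_issues`) are all hypothetical, and it uses only the standard library so that it should run on Linux, OSX and Windows alike. Issues are stored one JSON object per line in a plain text file, which can be committed to the existing Subversion tree and edited off-line on the train.

```python
import json
import uuid
from datetime import date
from pathlib import Path


def load_issues(path):
    """Read all issues from a JSON Lines file; a missing file means no issues."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def save_issues(path, issues):
    """Write one issue per line so that text diffs stay small and readable."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for issue in issues:
            handle.write(json.dumps(issue, sort_keys=True) + "\n")


def add_issue(path, title, kind="bug"):
    """Append a new open issue and return its id."""
    issue = {
        # Random ids avoid clashes when issues are added on two machines.
        "id": uuid.uuid4().hex[:8],
        "title": title,
        "kind": kind,  # e.g. "bug", "feature", "plan"
        "status": "open",
        "opened": date.today().isoformat(),
    }
    issues = load_issues(path)
    issues.append(issue)
    save_issues(path, issues)
    return issue["id"]


def close_issue(path, issue_id):
    """Mark an issue as closed; return True if the id was found."""
    issues = load_issues(path)
    found = False
    for issue in issues:
        if issue["id"] == issue_id:
            issue["status"] = "closed"
            found = True
    save_issues(path, issues)
    return found


def open_issues(path, kind=None):
    """List open issues, optionally restricted to one kind."""
    return [
        issue for issue in load_issues(path)
        if issue["status"] == "open" and (kind is None or issue["kind"] == kind)
    ]
```

This is only a step up from a todo text file, and it offers no reporting or GUI, but it shows that the off-line, cross-platform part of the wish list does not by itself demand a server.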