April 1st, 2009
I have been silent and rather occupied with a small project that is swallowing most of my free time: the development of a server::client interface for the analysis of Affymetrix GeneChips utilising the aroma.affymetrix package. Things are going pretty well, but I have introduced rather more complexity into the system by moving my logging routines from something home-made to a more formal log4j-based scheme.
At the moment I have a lovely struggle with the system – something in my code base appears to have some “deprecated” log4j code – and running the methods through Tomcat yields some rather vague and uninterpretable error messages.
java.lang.NoSuchFieldError: level
WTF? This problem appears to be pretty well documented across the WWW – but narrowing it down to the specific .jar file that should be upgraded or deleted is still a little trying!
Alas – this is bioinformatics!
The problem has been solved, and bioinformatics was again the issue. Within my code I have the martj .jar file that provides some connectivity with the BioMart infrastructure. Within this martj.jar is a copy of log4j; and this version collides with the version that I was trying to use! What a great use of rather too many hours – but at least a load of code has now been refactored and the .jar dependencies are now a little cleaner!
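For anyone chasing a similar collision, here is a minimal sketch of how the offending .jar might be tracked down. It is not the code used in the project above: the class `JarLocator` and its method names are hypothetical, and only standard JDK calls are used. It lists every copy of a class visible to a class loader and reports which copy was actually loaded. Under Tomcat it would have to run inside the web application (for example from a servlet) to see what the webapp class loader sees.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper for diagnosing duplicate classes on a classpath.
public class JarLocator {

    // Converts "org.apache.log4j.Logger" to "org/apache/log4j/Logger.class".
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Lists every location where the class file is visible to the loader.
    // More than one entry suggests that two jars bundle the same class.
    static List<String> findCopies(String className, ClassLoader loader) throws IOException {
        List<String> copies = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(toResourcePath(className));
        while (urls.hasMoreElements()) {
            copies.add(urls.nextElement().toString());
        }
        return copies;
    }

    // Reports where the class is actually loaded from, without initialising it.
    static String loadedFrom(String className, ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) throws IOException {
        // The default class name is an assumption; pass another as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.log4j.Logger";
        ClassLoader loader = JarLocator.class.getClassLoader();
        System.out.println("Loaded from: " + loadedFrom(name, loader));
        for (String copy : findCopies(name, loader)) {
            System.out.println("Visible copy: " + copy);
        }
    }
}
```

If two "Visible copy" lines turn up for the same class, one of them is probably the stowaway, as the log4j bundled inside martj.jar was here.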
Posted in Uncategorized | No Comments »
April 1st, 2009

Paul Geeleher et al., Bioinformatics Advance Access Publication March 23rd, 2009.
Fresh in the latest version of Bioinformatics Advance Access is a rather wonderful short correspondence on BioconductorBuntu. The authors of this brief article have highlighted a rather important divide within the bioinformatics community: those who can use R and those who can’t.
To solve the issue of “hot” microarray data analysis for those fearful of scripting, the authors have implemented a whole Ubuntu distribution containing the requisite packages, software and servers for rapid deployment of a data analysis server. In addition to just providing R and some Bioconductor packages the authors have also implemented a basic framework of authentication and ownership, and some core GUIs to streamline the process of uploading, analysing and reporting the content of DNA microarray studies. In contrast to earlier efforts such as AMDA (Genopolis, Italy) the authors have provided mechanisms for the handling of Affymetrix data, single and dual colour arrays.
The workflow appears to contain all core elements of data validation, QC and differential expression analysis and also provides a little content for both GSEA and KEGG type analyses.
This is in my humble opinion a wonderful piece of work. Certainly this is not a complete solution (what about Illumina or Agilent data in their more native file structures?) and the reporting is lacking outside of the most basic content – but it does deliver an elegant and functional system for the dirty and unwashed masses. The wrapping of the stack onto an Ubuntu “spin” is great – if as promised I can download an iso, burn a disk, boot, install and rock-and-roll then this could be a really useful tool sitting in the corner of many small labs.
I have some vague suspicions though that this approach is doomed to failure. The biologists who cannot use R and Bioconductor are the same people who will be terribly afraid of booting a Linux workstation and installing something by themselves. These are the same people who will be least well prepared to diagnose the problems on the server, and who will need the most training and babysitting to get them to the stage where the software can be applied in a meaningful way! This is not a detraction from the paper: BioconductorBuntu is a very elegant solution and promises to solve some of the problems, but a bioinformatician, IT guy or statistician is still needed to get the biologist up-and-running. Thank goodness – our jobs are still safe for the time being.
This is certainly a well-earned-paper-of-the-week. Congratulations Paul et al.!
Posted in R/bioconductor, best working practices, paper-of-the-week, public data | 2 Comments »
March 23rd, 2009
G Barton et al., BMC Bioinformatics 2008, 9:493 doi:10.1186/1471-2105-9-493
EMAAS is another environment for handling and analysis of gene expression data. The authors have set about the development of a distributed e-support system for the management and analysis of microarray data; to provide access to complex methods and to apply (from a biologist’s POV) non-trivial technologies to handle large multi-variate datasets.
Whilst other solutions have missed the point and taken an easy approach to solving the problem, the EMAAS approach is rather more complicated and relies instead on integration of internet accessible tools, standard statistical packages (R/Bioconductor) and web-resources (CELSIUS, GEO). The decision to aim for a modular and flexible framework is excellent and makes this in my opinion a very much more interesting project. The completeness with which tools and environments have been included is breathtaking; the depth of IT and analytical platforms required is rather daunting.
In contrast to the manuscript reviewed in the last post, this resource’s source is available under a suitable GPL license, and some of the demo server also works. I have some problems with the resource (Flash for a start), but this is one smooth implementation and is packaged in such a way that I could take it for a spin if I so wished!
This manuscript is heavy to read, but a damned fine resource is described underneath the technical fluff. This is a great resource and this earns a great recommendation from the bioinformaticsblog.
Posted in Implementing C methods in R, R/bioconductor, best working practices, data aggregation, paper-of-the-week, public data, web applications | No Comments »
March 23rd, 2009
Adriane Menßen et al., BMC Genomics 2009, 10:98 doi:10.1186/1471-2164-10-98

This manuscript describes a new database, data warehouse and analytical platform for the handling of Affymetrix based gene expression data. The authors identify the need for a database that is convenient, facilitates online analysis and provides user-specific sharing options, and further qualify their understanding of an unmet database need with the statement that “… existing tools do not use the whole range of statistical power provided by the MAS5.0/GCOS algorithms”.
I agree with the authors that there is such a gap within the database arena for a MIAME compliant database that provides both data warehousing and data analytical capabilities, and the addition of user-specific access rights is great. The MAS5 and GCOS methods undoubtedly have their place, but is their usage alone not perhaps naive?
The authors fill a number of quite heavy pages with their description of a refreshingly heavyweight database infrastructure (Java, ancient Oracle) that is currently biased towards their local research environment’s interests in immunology, inflammation, regeneration and cancer. Such a lengthily described database is then populated with only 1000 arrays.
This manuscript is of interest, the approach is nice; a combined warehouse and analysis environment. I have some problems with the database though. “Non-academic commercial use is restricted” is a waste; I would never consider paying for this resource when fantastic solutions from SAS JMP Genomics / GeneData / … with full support, testing and scalability are available with a lower TCO. To see what has been done, how well it performs and to play with a resource is nice.
I suspect that this is another fail – the online demo will not even work.

So, nice try, but no cigar. The manuscript is nice, convincingly written and more professional than some solutions out there. The web presentation looks fugly, and is also broken. The politics of code availability is plainly stupid – those who can pay will not because the implementation is not sufficiently good – Charite, please make the code a little more available!
Posted in data aggregation, paper-of-the-week, public data | No Comments »
March 23rd, 2009

I have been distracted – after a rather busy week on the road, and a week of catching up with paperwork and a few rather critical tasks within the corporate bioinformatics environment, the bioinformaticsblog is left feeling a little blue and rather unloved. There appear to be pretty good numbers of visitors; but the numbers have crashed over the last couple of weeks – there is a dearth of new content.
A quick check of the server logs shows that “aroma.affymetrix” is again one of the top search strings that bring people to this place – my planned “aroma.affymetrix tutorials for bioinformatics dummies” is still in progress, and awaiting release. It is pretty crazy, but the next most popular theme for the bioinformaticsblog search is “bioinformatics iPhone”. I am not sure what I should be reading into this; but it seems that this is a hot topic for those of you out there writing BSc and MSc projects in bioinformatics at the moment.
I am really worried by some of the terms that bring people here – who got here through a search for “I love bioinformatics”? A good sentiment, but one that you should be careful in announcing – I will not name and shame (yet)! An equally good query is the “what to do with my life bioinformatics”. I guess that many of us have thought about this, but someone has actually used Google to solve their problem! Related queries include “what to do bioinformatics in industry” and “best working practice bioinformatics”.
Yikes – is this a good thing or something I should be very afraid of?
Bioinformatics and best working practices are somehow interlinked within industrial bioinformatics; and this is likely to include enterprise biocomputing solutions, but will probably not involve any aroma.affymetrix or iPhone. There does appear to be a collective angst in bioinformatics; but it really is a great place to work in-between the corporate reorganisations.
Posted in Uncategorized | No Comments »
March 16th, 2009

Blogs, webpages and rants are out there to be read, to inspire and to establish dialogue. This blog page at the rather bluer than necessary Torygraph has an unnecessarily harsh dig at the obese poor. As someone who has lived with the indignity of X(n) sized trousers I can read this article with a mix of mirth and anger.
While the proletariat with TV dinners may show susceptibility to obesity, is there not also a correlation between BMI and career? At bioinformatics meetings there is typically a Gaussian distribution of phenoforms, and I would argue that sitting at a computer as a productive “middle-class” bio-IT professional, with background consumption of coffee (with full-fat milk), donuts and other fat and carbohydrate enriched snacks and a slightly more sedentary than absolutely necessary work style, can lead to more issues.
The article in the Telegraph has a cheap dig at a consequence, not a cause. Why is obesity such an issue – it seems to be the easy availability of pre-processed foods; easily digested and stored by the body. The lazy are more susceptible to the easy gratification from these well (synthetically) flavoured foods, and a vicious cycle is born. I am uncomfortable with the politicisation of this problem – let us consider the stereotypical gentleman of 150 years ago; a comfortable diet of meats and fortified wines and the corresponding problems with gout, diabetes and girth …
Posted in absolutely nothing at all to do with bioinformatics, biomarkers, life experience | No Comments »
March 13th, 2009

I have had a pretty amazing week in Munich and Freising and have learned things that I needed to know (and unfortunately a few things I wish I didn’t need to know – qPCR really is a bizarre technology). Some former colleagues working with a pharmacogenomics CRO on the edge of Starnberger See recommended the failblog as a good place to waste some time! So winding knowledge acquisition down and exploring their suggestion has yielded a little understanding of the site – and some pretty hearty laughs! Excellent!
I’ll be back in Finland tomorrow – I can start blogging again then and we should start having a look at the bioinformatics of quantitative PCR – a horrible subject! This has at least given me some enthusiasm to implement an R-based RDML parser – another contribution to head to Bioconductor!
Posted in Uncategorized | No Comments »
March 10th, 2009

A crash course for bioinformaticians presented by the Deutsches Museum in Munich.

Posted in Uncategorized | No Comments »
March 10th, 2009

I have been out of Germany for too long! As a postdoc this meal was a favourite, but after 5 years in Finland it now seems excessive! I am on the road in Munich for the next few days at a bioinformatics solution provider, receiving training for an enterprise system the corporation licensed, and will attend a couple of sessions from qPCR 2009 whilst here. How do we find those biomarkers eh?

Posted in Uncategorized | No Comments »