Archive for the ‘translational science’ Category

Bugblatter – bug tracking software for bioinformaticians

Wednesday, April 8th, 2009


I have spent rather too many hours over the last couple of days looking at and reviewing software for tracking ideas, plans and bugs, and for assigning meaning to what is supposed to be fairly straightforward software development. I am looking for a simple piece of software that can run in a single-user environment and can provide a list of projects, plans and bugs. Trac and Bugzilla are server-side and heavy, Excel is awful, and there doesn’t seem to be anything that I can run from a memory stick.

I have placed a software design brief with my contacts at Mnemosyne BioSciences and have asked for the development of a simple, OS-agnostic solution that can either run for a single user from local files or interact with an SVN server (or even run as something more embedded). They have approved my design brief and have promised to develop a Java tool for Windows and OSX that will provide bug tracking, reporting and management capabilities as a standalone tool. They have charged a pretty reasonable start-up fee for the project, and their understanding of the task is pretty much what I had envisioned from the start.
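
Purely to illustrate how small the core of such a tool could be, here is a minimal sketch in Python (not Java, and not Bugblatter itself, which hasn't been delivered yet). It assumes a single user, projects that each hold a flat list of issues tagged as bugs, plans or features, and one JSON file that can live on a memory stick; every class and field name here is hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class Issue:
    id: int
    title: str
    kind: str = "bug"     # hypothetical tags: "bug", "plan" or "feature"
    status: str = "open"  # "open" or "closed"


@dataclass
class Project:
    name: str
    issues: list = field(default_factory=list)


class Tracker:
    """Single-user tracker persisted to one JSON file (hypothetical design)."""

    def __init__(self, path):
        self.path = Path(path)
        self.projects = {}
        # Load existing projects if the file is already on the stick.
        if self.path.exists():
            raw = json.loads(self.path.read_text(encoding="utf-8"))
            for name, issues in raw.items():
                self.projects[name] = Project(name, [Issue(**i) for i in issues])

    def add_issue(self, project, title, kind="bug"):
        proj = self.projects.setdefault(project, Project(project))
        # Issue ids are simply sequential within each project.
        issue = Issue(id=len(proj.issues) + 1, title=title, kind=kind)
        proj.issues.append(issue)
        return issue

    def close(self, project, issue_id):
        for issue in self.projects[project].issues:
            if issue.id == issue_id:
                issue.status = "closed"
                return
        raise KeyError(issue_id)

    def open_issues(self, project):
        return [i for i in self.projects[project].issues if i.status == "open"]

    def save(self):
        # Write all projects back as {project name: [issue dicts]}.
        data = {name: [asdict(i) for i in p.issues] for name, p in self.projects.items()}
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")
```

Keeping everything in a single plain-text file also means the tracker file could be committed to an SVN repository alongside the code it tracks, which is roughly the local-or-SVN split the brief asks for.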

The name for their planned tool is “Mnemosyne Bugblatter”. Cool name; let’s see how the software looks when it is delivered. If anyone else is interested in a simple, portable tool for tracking projects and bugs and managing feature creep, please send a mail to bugblatter@mnemosyne.co.uk

Does the Biomarker Search Paradigm Need Re-Booting?

Wednesday, March 4th, 2009

(A nice logo from a deprecated database, and absolutely nothing to do with this review…)

Robert Hurst,

BMC Urology 2009, 9:1

Published in BMC Urology is a wonderful, well-written and thought-provoking commentary on the development of biomarkers. The author describes the state of the nation in biomarker development for the characterisation and classification of bladder cancer, and argues that enough is enough: now is the time for the biomarker development field to wake up and start developing useful biomarkers. While the article has absolutely nothing to do with bioinformatics (apart from a brief reference to algorithms in the final sentences), I know that many bioinformaticians are working in the biomarker development field.

Bladder cancer is currently monitored most effectively using cystoscopy – an invasive method, but one which is suggested to have 95% sensitivity. One issue with bladder cancer is its insidious recurrence: patients treated for often superficial cancers go on to develop aggressive invasive disease, and this kills 50% of patients… The need is clear for a biomarker with >95% sensitivity from a non-invasively collected biosample, and patients would likely be more compliant with post-treatment follow-ups. The author reiterates several times that the sensitivity of prognostic markers of disease progression is key.

Stick-based protein assays have been developed for the analysis of urine samples, but suffer from <70% sensitivity. The author describes this as “betting lives on a test with worse sensitivity than the gold standard”, and further questions the value of the tests given the fatal consequences of false negatives and the cost of following up false positives.

I am really happy to read the author’s dissection of microarray- and proteomics-based biomarker discovery. The author acknowledges the naivety of expecting magically robust, sensitive and specific biomarkers to emerge from such results, and describes the unpredictable nature of the homeostatic ripples that move outwards from a perturbation within an interconnected network of cooperating proteins. The promise of single biomarkers is therefore dismissed with the statement that “the probability of finding a single biomarker with the requisite sensitivity and specificity is vanishingly small”. Does this mean that we can pack our bags and go home?

Fortunately not! Hurst argues instead that combining biomarkers from existing studies into practical panels is the way ahead, rather than yet more studies searching for the elusive individual biomarker. Given that all cancers are largely unique, and that thousands of samples would be required for robustness, he suggests that the emphasis should be placed on selecting biomarker panels from small numbers of assays that are largely independent but reflective of the overall phenotype, and that the historical approach of modelling causality within the system should be abandoned; hence the “re-boot” in the title! Most encouragingly, the author also states that “the search for candidate biomarkers needs to be divorced from the validation in clinical populations”, and advocates developing biomarker panels in surrogate model systems, with cancer patient specimens used as a validation tool rather than a discovery tool.
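
To see why panels of largely independent markers are attractive, consider a simple worked example. Assume (my assumption, not Hurst's) that the markers are conditionally independent and that the panel calls a sample positive if any marker is positive. A cancer is then missed only when every marker misses it, so panel sensitivity = 1 - (1 - s1)(1 - s2)…, while panel specificity = p1 × p2 × …. The 70% per-marker sensitivity echoes the stick assays above; the 95% per-marker specificity is invented for illustration.

```python
from math import prod


def panel_or_rule(sensitivities, specificities):
    """Sensitivity and specificity of a panel that calls 'positive' if ANY
    marker is positive, assuming the markers are conditionally independent."""
    # A cancer is missed only if every marker misses it.
    sens = 1 - prod(1 - s for s in sensitivities)
    # A healthy subject is called negative only if every marker is negative.
    spec = prod(specificities)
    return sens, spec


# Illustrative numbers only: three markers, each 70% sensitive and 95% specific.
sens, spec = panel_or_rule([0.70, 0.70, 0.70], [0.95, 0.95, 0.95])
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
# prints: sensitivity=0.973 specificity=0.857
```

Three mediocre but independent markers reach roughly 97% sensitivity, at the price of specificity falling to roughly 86%, which is exactly the false-positive follow-up cost raised above. Correlated markers would gain far less sensitivity, which is why independence matters.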

This stuff is common sense, obvious and clear to bioinformaticians, but not always to the scientists and clinicians closer to the patient. This is a well-written article that should be distributed widely; the final sentence summarises it well: “the intelligent development of biomarkers truly is a problem in systems biology.”

ArrayPlex: distributed, interactive and programmatic access to genome sequence, annotation, ontology, and analytical toolsets.

Tuesday, February 10th, 2009


Genome Biol. 2008;9(11):R159. Epub 2008

 

Killion PJ, Iyer VR.

PMID: 19014503

 
Another quick manuscript review, this time of something that I hope most bioinformaticians (at least those working in or around core facilities) have already read. ArrayPlex is an orgy of my favourite bioinformatics themes: distributed data, Tomcat, PostgreSQL, expression data, OSX – you name it, it’s probably already in this paper.
 
This manuscript describes a system that aims to meet an unmet need within the field of applied bioinformatics: an integrated and centralised system for the storage and maintenance of microarray data. The resource aims to balance the primitive raw data (gene expression content) with the associated annotative context (gene names, gene identifiers and functional annotations). The system is designed for sensible operating systems (it will not run on Windows ;-) ) and is deployed as a Tomcat service. ArrayPlex looks after itself (or so the authors suggest), builds an operating environment using data trawled from the public domain, and appears extensible through the provision of an API.
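
ArrayPlex's actual schema isn't described in this review, so the following is only a hypothetical sketch of the general idea of keeping raw measurements separate from their annotative context. Python's built-in sqlite3 module stands in for PostgreSQL, and the table and column names are my own inventions, not ArrayPlex's.

```python
import sqlite3

# Hypothetical schema: annotation and raw expression live in separate tables,
# linked only by the probe identifier.
SCHEMA = """
CREATE TABLE probe_annotation (
    probe_id    TEXT PRIMARY KEY,
    gene_symbol TEXT,
    go_terms    TEXT
);
CREATE TABLE expression (
    sample_id TEXT,
    probe_id  TEXT REFERENCES probe_annotation(probe_id),
    value     REAL,
    PRIMARY KEY (sample_id, probe_id)
);
"""


def build_db():
    """Create an in-memory database with the hypothetical schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def expression_with_context(con, gene_symbol):
    """Return (sample_id, probe_id, value) rows for all probes annotated to a gene."""
    return con.execute(
        """SELECT e.sample_id, e.probe_id, e.value
           FROM expression e
           JOIN probe_annotation a ON e.probe_id = a.probe_id
           WHERE a.gene_symbol = ?
           ORDER BY e.sample_id, e.probe_id""",
        (gene_symbol,),
    ).fetchall()
```

Because the annotation sits in its own table keyed by probe ID, it could be refreshed from public sources without touching the stored expression values, which is the kind of self-maintenance the authors describe.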
 
While I haven’t yet installed or deployed ArrayPlex for a formal evaluation, much of the functionality it provides is already available from other resources. The authors stress that it isn’t intended as a substitute for e.g. the BASE database, but rather suggest that it may be an alternative for some would-be Bioconductor users… I am not sure what comment to make here, but I really don’t see too much competition in ArrayPlex! The screenshots provided within the manuscript are beautiful and make the system look like an extremely attractive tool; if it has the data aggregation and integration capabilities promised, then this will be a must-have tool in the future, especially if there is any scope for R/Bioconductor integration.
 
My feeling: this paper is a smörgåsbord of great bioinformatics themes, and is something that really should be investigated further! It leaves me a little concerned, however; the authors suggest that Bioconductor is difficult to use because of its lack of a GUI and its need for the shell. Quite how an inexperienced user will cope with installing Tomcat, PostgreSQL and the other dependencies on a UNIX or OSX box (without the shell) is quite beyond me. The descriptions of the pipelines are attractive, and I am at least convinced that the system is worth a look and should perhaps be earmarked for inclusion within the BioRAM Linux distribution.
 
I should also note that several paradigms and intents are shared between ArrayPlex and my very own Mnemosyne LabManager application. LabManager, though, aims to provide an abstraction layer over the underlying R/Bioconductor, and provides mechanisms for an R-proficient user to benefit from the server side and the encompassing APIs at the same time… Time will tell?

Bladder Cancer-Associated Gene Expression Signatures Identified by Profiling of Exfoliated Urothelia

Tuesday, February 10th, 2009


I am having a push to understand how we could more effectively select candidate biomarkers from both our own proprietary datasets and the wealth of public data that has been deposited within e.g. ArrayExpress or GEO. I have identified the following manuscript as being worthy of review and hope that you might agree with some of my feelings.

Cancer Epidemiol Biomarkers Prev. 2009 Feb 3. [Epub ahead of print]

Rosser CJ, Liu L, Sun Y, Villicana P, McCullers M, Porvasnik S, Young PR, Parker AS, Goodison S.

PMID: 19190164 [PubMed - as supplied by publisher]

The authors argue that bladder cancer, when detected early, is largely treatable, with a 5-year survival rate of ~94%; this outlook is of course greatly worsened by papillary tumours that invade the surrounding muscle tissue. Following surgery to remove the tumour, regular check-ups are required to ensure that there is no tumour recurrence. These check-ups are performed using cystoscopy – an invasive and rather unpleasant-sounding procedure. There is thus substantial opportunity to improve quality of life through the early detection of tumour recurrence using less invasive methods, with urine representing an ideal biosource.

Urine cytology can be used for the diagnosis of new malignancy, albeit with rather low sensitivity and imperfect specificity. Existing protein-based biomarkers have high false-positive rates, and other development methods suffer from insufficient predictive power. In this manuscript, the authors rise to the challenge of developing a gene-expression-based biomarker panel for the diagnosis of bladder cancer through the profiling of urothelial cells from bladder washes. Amplified RNA from a panel of confirmed bladder cancer patients and apparently healthy (with respect to bladder cancer) controls was profiled on the U133 Plus 2.0 platform.

The resulting expression data were analysed within a framework of feature selection algorithms; the authors have previously described a machine learning approach that has been successfully applied to both breast and prostate cancers. The structure of the experimental data looks good: only one cancer patient profile clustered with the control patients in hierarchical cluster analysis. Pathway analysis software was used to map observations onto relevant biological context, and a 14-gene model was refined to build a classifier. The resulting classifier yielded 76% accuracy in cancer class prediction, apparently a reasonable feat considering that cytologic classification managed only 35%.
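
The authors' own machine learning approach isn't detailed here, so as a stand-in, here is a minimal sketch of one of the simplest feature selection filters: ranking genes by the absolute Welch t-statistic between cancer and control samples and keeping the top k. This is a generic illustration, not the method used in the paper; the function names and the dict-of-lists input format are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups of expression values.
    Each group needs at least two values; if both groups have zero
    variance this raises ZeroDivisionError."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def top_k_genes(expr, is_cancer, k):
    """Rank genes by |t| between cancer and control samples; return the top k.

    expr: dict mapping gene -> list of expression values, one per sample
    is_cancer: list of booleans, one per sample, in the same order
    """
    scores = {}
    for gene, values in expr.items():
        cancer = [v for v, c in zip(values, is_cancer) if c]
        control = [v for v, c in zip(values, is_cancer) if not c]
        scores[gene] = abs(welch_t(cancer, control))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With only around twenty patients per class, any such selection step should be repeated inside each cross-validation fold; selecting genes on the full dataset and then estimating accuracy on the same samples gives optimistically biased results.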

The manuscript is certainly not presenting anything amazingly new, but it does show the application of existing technologies to demonstrate a proof of principle in meeting an as-yet unmet medical need. The logical workflow within the manuscript is clear, the arguments are well presented and the bioinformatics methodology is certainly acceptable. The paper, in my opinion, is noteworthy because it is not an open-and-shut case. There is plenty of room for improvement (18/20 cancer patients identified): the derived panel suffers from both false negatives (at a low rate) and false positives. The starting panel is heterogeneous, with patients of mixed ages, sexes, ethnic backgrounds and clinical characteristics. I see much more to be done here, and this has been real food for thought!
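
The 18/20 figure translates directly into sensitivity, which is not the same quantity as the 76% accuracy quoted earlier. A minimal sketch of the standard confusion-matrix definitions (the helper names are my own) makes the distinction explicit:

```python
def sensitivity(tp, fn):
    """Fraction of true cancers that the test calls positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true controls that the test calls negative."""
    return tn / (tn + fp)


def accuracy(tp, fn, tn, fp):
    """Fraction of all samples, cancers and controls together, called correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# 18 of 20 cancer patients identified: 18 true positives, 2 false negatives.
print(sensitivity(18, 2))  # 0.9
```

Specificity can't be computed without the control counts, which aren't quoted here, and accuracy mixes both classes, so its value depends on how many controls are in the test set.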

Target discovery from data mining approaches

Monday, February 2nd, 2009


Yongliang Yang et al., Drug Discovery Today 2009, Vol 14, p147-154.

Target discovery is a key area within drug development: you can’t develop a drug without a target (anymore), and I am sure that many bioinformaticians working within pharma and biotech spend a not inconsiderable amount of their time compiling portfolios of information relating to a development drug’s target protein.

It is therefore great to see a review article in Drug Discovery Today that highlights, describes and outlines the informatics workflows that may be used to discover a meaningful target. This is not a review article for a seasoned corporate bioinformatician, but rather a good illustration of much of the bioinformatics on the commercial side of the academic/commercial divide. It is rather obvious, however, that the authors are academics, and that the article focuses more on the academic characterisation of a target than on the approach that a more enterprise-oriented bioinformatician would take! The article is not weakened by this, since the authors place considerable stress on the fact that there are huge volumes of data out there, and that there is considerable benefit to be reaped by making sense of, and integrating, these data.

The article is well written and provides a meaningful review of integrative data mining. For bioinformaticians considering a career in corporate bioinformatics, this provides a robust view of what we spend much of our time doing, although the tools summarised may not be the optimal tools within a corporate setting.

This is a good manuscript, not a great one. It highlights the issues within target characterisation and understanding and is a worthy read for all.

Next generation tools for the annotation of human SNPs

Monday, February 2nd, 2009


Rachel Karchin, Briefings in Bioinformatics, 2009, Vol 10, 35-52

This is another timely review article published in Briefings in Bioinformatics and, as my own job duties become a little more translational, something of immediate relevance and interest.

This is a typical review article, but I feel it is noteworthy for the depth of its exploration of human SNP-based data resources. The author collates a very comprehensive set of web-based databases (21 in total) and evaluates their potential for both usability (from a bioinformatician’s perspective) and utility (whether the resource might be of real use). The results from this near-exhaustive analysis are provided as a meaningful set of supplementary data. This evaluation is further supplemented by meaningful case studies of the kind that might be encountered within a typical drug-discovery or translational-medicine campaign.

The case studies on intronic SNPs in schizophrenia, novel amyotrophic lateral sclerosis SNPs and mixed esophageal cancer SNPs truly highlight the potential roles of the different resources in SNP characterisation.

The manuscript is certainly of benefit to researchers and bioinformaticians aiming to fast-forward their knowledge of web-based polymorphism databases and resources. There are also very welcome and relevant statements about the non-synchronous nature of data between reference (and de facto up-to-date) resources such as dbSNP and the plethora of derived databases whose data are largely anchored to older (and in some cases ancient) releases of dbSNP.
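
One concrete symptom of this drift is that dbSNP merges refSNP clusters between builds, retiring the older rs numbers, so a derived database anchored to an old build can report identifiers that have since been merged. The sketch below follows a chain of merges to the surviving identifier; the merge table is a hypothetical in-memory dict, not a parser for dbSNP's actual merge-history files, and the rs numbers in any usage are made up.

```python
def resolve_rsid(rsid, merged_into):
    """Follow a chain of rsID merges until reaching an identifier that has
    not itself been merged.

    merged_into: dict mapping old rsID -> the rsID it was merged into
                 (a hypothetical, pre-loaded merge table).
    """
    seen = set()
    while rsid in merged_into:
        # Guard against a malformed table that loops back on itself.
        if rsid in seen:
            raise ValueError(f"merge cycle involving {rsid}")
        seen.add(rsid)
        rsid = merged_into[rsid]
    return rsid
```

Running every identifier from a derived resource through a step like this before comparing it with current dbSNP data avoids silently treating a merged rsID as a missing SNP.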

The outlook and summary statements are of great relevance to the authors of the next generation of web-enabled bioinformatics resources built on derived data. The author is not naive and does comment on the development, support and maintenance limitations of academic SNP web servers. Bioinformatics in academia is driven by the need for publications, and the maintenance of existing resources is largely not very publishable. It should be noted here (in bold even) that Nucleic Acids Research publishes the annual Database and Web Server issues, perhaps the only journal of impact to support the systematic republication of, in many cases, rather old resources.

Overall, much of the content within this review is old hat; the evaluation of the resources and the comparison of added value within typical SNP characterisation workflows are, however, elegant and of real value to many bioinformatics, pharmacogenetics and translational systems biology researchers. I am not sure that I would flag this manuscript as a must-read, but it is certainly worth packing into your bag if you’re travelling by train or plane, or have a few minutes to spare at home tonight.