Archive for the ‘biomarkers’ Category

Martini: using literature keywords to compare gene sets

Monday, January 4th, 2010

I have been having a big push over the last couple of months to consider the needs of biocontextualisation of gene list data within a commercial setting. A bioinformatician is not always capable of making a rational decision based on gene names alone, and even dedicated biologists are swamped by genes that fall outside of their specialist domains – does a collection of differentially expressed genes relate to something good, to something bad, or to something unexpected?

Martini is a pretty good looking solution that was published in Nucleic Acids Research (NAR 2010, Vol 38, p26-38) by researchers from EMBL in Heidelberg. The solution sits behind a somewhat clumsy web interface, but the data behind it are drawn from Medline abstracts and, more importantly, Medline keywords. Some bioinformatics magic, a little creativity and clearly a lot of hard work are then used to condense a set of differentially expressed genes into something that appears (at least superficially thus far) pretty useful, and that clearly demonstrates superiority over other solutions that are pretty tightly bound to the Gene Ontology.
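To make the idea of comparing gene sets by literature keywords concrete, here is a minimal sketch. It is not Martini's actual statistic, which is not reproduced in this post: it assumes a hypothetical `gene_keywords` mapping from gene names to keywords, and swaps in a plain one-sided hypergeometric test to ask which keywords are over-represented in one gene set relative to the pooled genes. The gene names and keywords are invented toy data.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the keyword."""
    upper = min(n, K)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)

def keyword_enrichment(set_a, set_b, gene_keywords):
    """Score every keyword for over-representation in set_a versus set_a + set_b."""
    pooled = set_a | set_b
    keywords = {kw for g in pooled for kw in gene_keywords.get(g, ())}
    results = {}
    for kw in keywords:
        # K: pooled genes annotated with the keyword; k: those that fall in set_a
        K = sum(1 for g in pooled if kw in gene_keywords.get(g, ()))
        k = sum(1 for g in set_a if kw in gene_keywords.get(g, ()))
        results[kw] = hypergeom_tail(k, len(set_a), K, len(pooled))
    # Smallest p-value first
    return dict(sorted(results.items(), key=lambda kv: kv[1]))

# Hypothetical toy annotation (not taken from Medline)
gene_keywords = {
    "GENE1": {"apoptosis", "neoplasms"},
    "GENE2": {"apoptosis"},
    "GENE3": {"apoptosis", "cell cycle"},
    "GENE4": {"lipid metabolism"},
    "GENE5": {"lipid metabolism", "cell cycle"},
    "GENE6": {"lipid metabolism"},
}
up = {"GENE1", "GENE2", "GENE3"}
down = {"GENE4", "GENE5", "GENE6"}

scores = keyword_enrichment(up, down, gene_keywords)
for keyword, p in scores.items():
    print(f"{keyword}: p = {p:.3f}")
```

With real data the p-values would also need correcting for the number of keywords tested, which this sketch leaves out.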

What about commercial usage? The authors state pretty clearly that the solution is free-for-all! This comes of course with a caveat: is it really wise to upload proprietary gene list data to an external server? While we wouldn’t lose knowledge on molecular structures, there is certainly a risk element here and I imagine that most commercial bioinformaticians will avoid this tool. I feel that the tool also suffers from the need to work with the somewhat loveless web interface. A cleaner R-based API would be lovely – perhaps a task for the weekend?

In conclusion, Martini looks very appealing, has a clear reason for existence and is certainly something that I will evaluate. I am already tempted to see how I might use the system within an R integration project – straight to paper reporting is the way to go! Have a look at http://martini.embl.de to see what can be done.

Phenoforms, social classes and sitting in front of a computer with cookies?

Monday, March 16th, 2009

Blogs, webpages and rants are out there to be read, to inspire and to establish dialogue. This blog page at the rather bluer than necessary Torygraph has an unnecessarily harsh dig at the obese poor. As someone who has lived with the indignity of X(n) sized trousers I can read this article with a mix of mirth and anger.

While the proletariat with TV dinners may show susceptibility to obesity, is there not also a correlation between BMI and career? At bioinformatics meetings there is typically a Gaussian distribution of phenoforms, and I would argue that for a productive “middle-class” bio-IT professional sitting at a computer, the background consumption of coffee (with full-fat milk), donuts and other fat- and carbohydrate-enriched snacks, together with a slightly more sedentary than absolutely necessary work style, can lead to more issues.

The article in the Telegraph takes a cheap dig at a consequence, not a cause. Why is obesity such an issue? It seems to be the easy availability of pre-processed foods, which are easily digested and stored by the body. The lazy are more susceptible to the easy gratification from these well (synthetically) flavoured foods, and a vicious cycle is born. I am uncomfortable with the politicisation of this problem – let us consider the stereotypical gentleman of 150 years ago; a comfortable diet of meats and fortified wines and the corresponding problems with gout, diabetes and girth …

Does the Biomarker Search Paradigm Need Re-Booting?

Wednesday, March 4th, 2009

plantmarkers.png (a nice logo from a deprecated database, and absolutely nothing to do with this review …)

Robert Hurst,

BMC Urology 2009, 9:1

Published in BMC Urology is a wonderful, well written and thought-provoking commentary on the development of biomarkers. The author describes the state-of-the-nation in biomarker development for the characterisation and classification of bladder cancer, and argues that enough-is-enough and now is the time for the biomarker development field to wake up and start developing useful biomarkers. While the article has absolutely nothing to do with bioinformatics (apart from a little reference towards algorithms in the final sentences), I know that many bioinformaticians are working in the biomarker development field.

Bladder cancer is currently monitored most effectively using cystoscopy – an invasive method, but one which is suggested to have a 95% sensitivity. One issue with bladder cancer is its insidious recurrence; patients treated for what are often superficial cancers can go on to develop aggressive invasive disease, and this kills 50% of patients… The need is clear for a biomarker with >95% sensitivity from a non-invasively sampled biosample, and patients would likely be more compliant with post-treatment follow-ups. The author reiterates several times that the sensitivity of prognostic markers of disease progression is key.

Stick-based protein assays have been developed for analysis of urine samples, but suffer from <70% sensitivity – the author describes “betting lives on a test with worse sensitivity than the gold standard”, and further questions the value of the tests based on the fatal consequence of false-negatives and the cost of follow-up on the false-positives.

I am really happy to read the author’s dissection of microarray and proteomic-based biomarker discovery. The author acknowledges the naivety of expecting magically robust, sensitive and specific biomarkers to emerge from the results, and points to the unpredictable nature of the homeostatic ripples that move outwards from a perturbation within an interconnected network of cooperating proteins. The promise of biomarkers is therefore dismissed with the statement that “the probability of finding a single biomarker with the requisite sensitivity and specificity is vanishingly small”. Does this mean that we can pack our bags and go home?

Fortunately not! Hurst instead argues that combining biomarkers from existing studies into practical panels is the way ahead, rather than yet more studies searching for the elusive individual biomarker. With the acknowledgement that all cancers are largely unique, and that thousands of samples would be required to obtain robust results, the emphasis should be placed on the selection of biomarker panels from small numbers of assays that are largely independent but reflective of the overall phenotype, and the historical approach of modelling causality within the system should be abandoned; hence the re-boot within the title! Most encouragingly, the author also states that “the search for candidate biomarkers needs to be divorced from the validation in clinical populations” and advocates the development of biomarker panels in surrogate model systems, with cancer patient specimens as a validative tool rather than a discovery tool.
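A little arithmetic shows why a panel of largely independent assays is attractive. The sketch below is my own illustration rather than anything taken from the article: it assumes the markers are conditionally independent given disease status, that the panel is called positive if any single marker is positive, and it uses invented sensitivities and specificities.

```python
from math import prod

def or_panel(sensitivities, specificities):
    """Combine independent markers with an 'any marker positive' rule.

    A case is missed only if every marker misses it, and a control is
    cleared only if every marker clears it.
    """
    sensitivity = 1 - prod(1 - s for s in sensitivities)
    specificity = prod(specificities)
    return sensitivity, specificity

# Hypothetical panel: three markers, each 70% sensitive and 95% specific
sens, spec = or_panel([0.70, 0.70, 0.70], [0.95, 0.95, 0.95])
print(f"panel sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Under these assumptions three mediocre markers together pass the 95% sensitivity mark, but the specificity drops with every marker added, so the false-positive follow-up cost grows. Real markers are rarely fully independent, so the gain in practice would be smaller.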

This stuff is common sense, obvious and clear to bioinformaticians, but not always to the scientists and clinicians closer to the patient. This is a well written article and should be distributed widely; the final sentence really summarises it well: “the intelligent development of biomarkers truly is a problem in systems biology.”

A general modular framework for gene set enrichment analysis

Tuesday, February 10th, 2009

Gene Set Enrichment Analysis or GSEA is one of those tasty methods that has been out there in the public domain for a number of years now. I guess that when most people see GSEA they immediately think of the original Gene Set Enrichment Analysis publication that was written by scientists from the Broad Institute. Having earlier investigated the contents of the BioinformaticsBlogLogs, I found that GSEA is one of the technologies that still piques a certain amount of interest. Gene Set Enrichment is one of the two most frequently searched terms (and only slightly ahead of “bioinformatics future 2009-”). Perhaps, to kill two birds with one stone, I should state that GSEA and related techniques are one of the futures of bioinformatics. GSEA is already a stand-alone tool, and enrichment algorithms are widely used in informatics solutions from the likes of Ingenuity Systems etc.

It is therefore wonderful to find a well written article that compares and contrasts different enrichment methods, and proposes a framework for the further benchmarking of the available methods. As an applied bioinformatician it is all too easy to deploy a method without considering whether the statistic adopted really is best of breed.

Anyhow,

BMC Bioinformatics. 2009 Feb 3;10(1):47.

Ackermann M, Strimmer K.

PMID: 19192285

This article is well worth a read. The application of GSEA technologies within the field of expression profiling is discussed, and the authors make a clear point about the issue of multiple methods achieving the same task and the need for standardisation of methods and evaluation of the standardised methods. The authors perform a meta-analysis of the existing GSEA methods, analyse these methods within various simulations and evaluate the results. The overall finding is perhaps that GSEA itself may be an inferior method to a simpler univariate procedure, and that workflows relying on enrichment analysis may be simplified.
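For a feel of what a simple univariate gene set procedure can look like, here is a minimal sketch. It is an assumption-laden illustration and not the specific procedure evaluated in the paper: the gene-level statistic is a plain difference of group means, the gene set statistic is the average of those differences, and significance is assessed by permuting the sample labels. The expression values are invented toy data.

```python
import random
from statistics import mean

def gene_stat(values, labels):
    """Gene-level statistic: mean of cases (label 1) minus mean of controls (label 0)."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(cases) - mean(controls)

def set_stat(expr, gene_set, labels):
    """Gene set statistic: average of the gene-level statistics over the set."""
    return mean(gene_stat(expr[g], labels) for g in gene_set)

def permutation_pvalue(expr, gene_set, labels, n_perm=1000, seed=0):
    """Two-sided p-value obtained by shuffling the sample labels."""
    rng = random.Random(seed)
    observed = abs(set_stat(expr, gene_set, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(set_stat(expr, gene_set, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: four cases followed by four controls
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = {
    "GENE_A": [5.1, 5.3, 4.9, 5.2, 3.0, 3.1, 2.9, 3.2],
    "GENE_B": [7.0, 7.2, 6.8, 7.1, 5.0, 5.2, 4.9, 5.1],
    "GENE_C": [4.0, 4.1, 3.9, 4.2, 4.1, 4.0, 3.8, 4.2],
}

p_value = permutation_pvalue(expr, ["GENE_A", "GENE_B"], labels)
print(f"gene set p-value: {p_value:.3f}")
```

With only eight samples there are just 70 distinct ways to split the labels, so the p-value cannot get very small however strong the signal, which is itself a useful reminder when benchmarking methods on small studies.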

Bladder Cancer-Associated Gene Expression Signatures Identified by Profiling of Exfoliated Urothelia

Tuesday, February 10th, 2009

I am having a push to understand how we could more effectively select candidate biomarkers from both our own proprietary datasets and from the wealth of public data that has been deposited within e.g. ArrayExpress or GEO. I have identified the following manuscript as being worthy of review and hope that you might agree with some of my feelings.

Cancer Epidemiol Biomarkers Prev. 2009 Feb 3. [Epub ahead of print]

Rosser CJ, Liu L, Sun Y, Villicana P, McCullers M, Porvasnik S, Young PR, Parker AS, Goodison S.

PMID: 19190164 [PubMed - as supplied by publisher]

The authors argue that bladder cancer, when detected early, is largely treatable with a 5-year survival rate of ~94%; this is of course greatly hindered by the papillary tumours that invade the surrounding muscle tissues. Following surgery to remove the tumour, regular checkups are required to ensure that there is no tumour recurrence. These checkups are performed using cystoscopy – an invasive and rather unpleasant sounding procedure. There is thus substantial opportunity for the improvement in quality of life through the early detection of tumour recurrence using less invasive methods, with urine representing an ideal biosource type.

Urine cytology can be used for the diagnosis of new malignancy, albeit with rather low sensitivity and imperfect specificity. Existing protein-based biomarkers have high false-positive rates and other development methods suffer from insufficient predictive power. In this manuscript, the authors rise to the challenge of developing a gene-expression based biomarker panel for the diagnosis of bladder cancer through the profiling of urothelial cells from bladder washes. Amplified RNA from a panel of confirmed bladder cancer patients and apparently healthy (with respect to bladder cancer) controls was profiled using the U133 plus 2 platform.

The resulting expression data was analysed within a framework of feature selection algorithms; the authors have previously described a machine learning approach that has been successfully applied within both breast and prostate cancers. The structure of the experimental data looks good: only one cancer patient profile clustered with the control patients within hierarchical cluster analysis. Pathway analysis software was used to map observations onto relevant biological context, and a 14-gene model was refined to build a classifier. The resulting classifier yielded 76% accuracy in cancer class prediction; apparently a reasonable feat considering that the cytologic classification achieved only 35%.

The manuscript is certainly not presenting anything amazingly new, but is showing the application of existing technologies to demonstrate a proof-of-principle in meeting an as yet unmet medical need. The logical workflow within the manuscript is clear, the arguments are well presented and the bioinformatics methodology is certainly acceptable. The paper, in my opinion, is noteworthy because it is not an open-and-shut case. There is plenty of room for improvement (18/20 cancer patients identified), as the derived panel suffers from both false negatives (a low rate) and false positives. The starting panel is heterogeneous, with patients of mixed ages, sexes, ethnic backgrounds and clinical characteristics. I see much more to be done here, and this has been real food-for-thought!
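As a rough worked example of how much uncertainty sits behind a figure like 18/20, the sketch below reads it as a sensitivity (18 of 20 cancer patients correctly identified, which is my interpretation of the number) and puts a 95% Wilson score interval around it. The interval is my own addition and is not reported in this form in the manuscript.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

detected, patients = 18, 20  # figures quoted above
sensitivity = detected / patients
low, high = wilson_interval(detected, patients)
print(f"sensitivity {sensitivity:.2f}, 95% interval {low:.2f} to {high:.2f}")
```

With only 20 patients the interval is wide, which fits the view that this is a proof of principle that needs a larger and better characterised cohort.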