
I am making a push to understand how we could more effectively select candidate biomarkers from both our own proprietary datasets and the wealth of public data deposited in repositories such as ArrayExpress or GEO. I have identified the following manuscript as worthy of review and hope that you might agree with some of my impressions.
Cancer Epidemiol Biomarkers Prev. 2009 Feb 3. [Epub ahead of print]
Rosser CJ, Liu L, Sun Y, Villicana P, McCullers M, Porvasnik S, Young PR, Parker AS, Goodison S.
PMID: 19190164 [PubMed - as supplied by publisher]
The authors argue that bladder cancer, when detected early, is largely treatable, with a 5-year survival rate of ~94%; this outlook is of course greatly hindered by the papillary tumours that invade the surrounding muscle tissue. Following surgery to remove the tumour, regular checkups are required to ensure that there is no recurrence. These checkups are performed using cystoscopy, an invasive and rather unpleasant-sounding procedure. There is thus substantial opportunity to improve quality of life through the early detection of tumour recurrence using less invasive methods, with urine representing an ideal biosource type.
Urine cytology can be used for the diagnosis of new malignancy, albeit with rather low sensitivity and imperfect specificity. Existing protein-based biomarkers have high false-positive rates, and other methods under development suffer from insufficient predictive power. In this manuscript, the authors rise to the challenge of developing a gene-expression-based biomarker panel for the diagnosis of bladder cancer through the profiling of urothelial cells from bladder washes. From a panel of confirmed bladder cancer patients and apparently healthy (with respect to bladder cancer) controls, amplified RNA was profiled using the U133 plus 2 platform.
The resulting expression data was analysed within a framework of feature selection algorithms; the authors have previously described a machine learning approach that has been successfully applied within both breast and prostate cancers. The structure of the experimental data looks good; only one cancer patient profile clustered with the control patients in hierarchical cluster analysis. Pathway analysis software was used to map observations onto relevant biological context, and a 14-gene model was refined to build a classifier. The resulting classifier yielded 76% accuracy in cancer class prediction, apparently a reasonable feat considering that cytologic classification achieved only 35%.
The manuscript is certainly not presenting anything amazingly new, but it shows the application of existing technologies to demonstrate a proof-of-principle in meeting an as yet unmet medical need. The logical workflow within the manuscript is clear, the arguments are well presented and the bioinformatics methodology is certainly acceptable. The paper, in my opinion, is noteworthy because it is not an open-and-shut case. There is plenty of room for improvement (18/20 cancer patients identified), so the derived panel suffers from both false negatives (a low rate) and false positives. The starting panel is heterogeneous, with patients of mixed ages, sexes, ethnic backgrounds and clinical characteristics. I see much more to be done here, and this has been real food-for-thought!
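For anyone wanting to put a number on the "18/20 cancer patients identified" figure, here is a minimal sketch of the arithmetic in Python. The function and variable names are my own and purely illustrative; only the 18 and the 20 come from the figures quoted above. Specificity (the false-positive side) is deliberately left out, because the control counts are not quoted in this review.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of true cancer cases that the classifier flags as cancer."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("at least one cancer case is required")
    return true_positives / total


# Figures quoted in the review: 18 of 20 cancer patients identified.
detected = 18
total_cancer = 20
missed = total_cancer - detected  # false negatives

sens = sensitivity(detected, missed)
fnr = 1 - sens  # false-negative rate

print(f"sensitivity = {sens:.2f}, false-negative rate = {fnr:.2f}")
```

On these figures the sensitivity works out at 0.90 with a false-negative rate of 0.10, which is what I mean by a low (but non-zero) rate of false negatives.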