Archive for the ‘Implementing C methods in R’ Category

EMAAS: An extensible grid-based Rich Internet Application for microarray data analysis and management

Monday, March 23rd, 2009

G Barton et al., BMC Bioinformatics 2008, 9:493. doi:10.1186/1471-2105-9-493

EMAAS is another environment for handling and analysing gene expression data. The authors have set about developing a distributed e-support system for the management and analysis of microarray data, to provide access to complex methods and to apply (from a biologist's POV) non-trivial technologies to the handling of large multivariate datasets.

Whilst other solutions have missed the point and taken an easy approach to solving the problem, the EMAAS approach is rather more complicated and relies instead on the integration of internet-accessible tools, standard statistical packages (R/Bioconductor) and web resources (CELSIUS, GEO). The decision to aim for a modular and flexible framework is excellent and, in my opinion, makes this a much more interesting project. The completeness with which tools and environments have been included is breathtaking; the depth of IT and analytical platforms required is rather daunting.

In contrast to the manuscript reviewed in the last post, this resource's source code is available under a suitable GPL licence, and parts of the demo server also work. I have some problems with the resource (Flash, for a start), but this is one smooth implementation, and it is packaged in such a way that I could take it for a spin if I so wished!

This manuscript is heavy going, but a damned fine resource is described underneath the technical fluff, and it earns a strong recommendation from the bioinformaticsblog.

Bioinformatics, C, R and a learning curve

Thursday, February 5th, 2009


I often tell people that one of the greatest things about bioinformatics is that there isn't much dogma or established process to hinder us. Computational biology is a rather new discipline, and technologies such as gene expression profiling or high-throughput proteomics and metabolomics are really less than a decade old. Industry likes SOPs, best working practices and a well-documented, robust approach to getting things done, so bioinformatics is often a breath of fresh air within a controlled working environment.

At the moment I am enjoying a great lungful of newness (that looks wrong). I have established processes for the computational analysis of certain key data types, and am now more involved in formalising those processes and in client::server transitions. It is therefore wonderful to go back to some of the rate-limiting steps and re-evaluate them, implementing more efficient C methods in place of often clumsy and inefficient native R routines.

What is a bioinformatician? As someone who has spent the last decade working only in bioinformatics, I feel that I have many of the necessary traits: I have written BLAST parsers, conquered object-oriented programming, written production code in Perl, Python, Ruby and Java, and am quite happy to implement packages in R. I am now delighted that I can add some basic C to the mix. C has been a challenge in the past, and informatics naysayers have suggested that it is too hardcore for a biologist and that you need to write too much to get something done, but hell, it works, it is fast, and for anything simpler than a full-blown OS it really rocks.

At the moment I have my NCBI taxonomy parsing code in development. We are past yesterday morning's crisis, when 1+2 != 3, and can now use .Call methods reproducibly and pass R matrix objects (in one direction only) to C code. While not quite ready for the cigar, we are at least heading in the right direction, and I feel that this has been pretty good use of otherwise dead train time. I guess that 9 hours a week on the train is good for innovation.
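A note on what "passing an R matrix to C" means in practice: R stores a numeric matrix as one contiguous vector of doubles in column-major order. Inside a .Call routine the data pointer comes from REAL(x) and the dimensions from nrows(x) and ncols(x) (declared in Rinternals.h), and element [i, j] (0-based) sits at offset i + j * nrow. The sketch below deliberately leaves out the R headers so that it compiles on its own; the function name col_sums and the choice of column sums as the operation are illustrative assumptions, not part of the taxonomy code described above.

```c
#include <stddef.h>

/* Hypothetical helper: sums each column of an nrow x ncol matrix that is
   stored column-major, the way R stores matrices. In a real .Call routine,
   `x` would come from REAL(sexp) and the sizes from nrows()/ncols().
   Results are written into `out`, which must hold ncol doubles. */
void col_sums(const double *x, size_t nrow, size_t ncol, double *out)
{
    for (size_t j = 0; j < ncol; j++) {
        double s = 0.0;
        for (size_t i = 0; i < nrow; i++) {
            /* element [i, j] lives at offset i + j * nrow */
            s += x[i + j * nrow];
        }
        out[j] = s;
    }
}
```

For example, matrix(c(1, 2, 3, 4, 5, 6), nrow = 2) in R is laid out in memory as 1, 2, 3, 4, 5, 6, so col_sums on that buffer with nrow = 2 and ncol = 3 gives 3, 7 and 11, the same as colSums() in R.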

I would now argue that you newer and younger bioinformaticians, when fighting performance issues in R, should consider diving into some C code. It is really not that painful!

R/bioconductor and methods implemented in C

Tuesday, February 3rd, 2009


This is not the easiest thing for a C novice to do, and I am discovering a lot as I go, but I feel that we have made excellent progress over the last 24 hours and the method is looking almost feasible. We can compile the C code from within the R package, load the shared library when the package is loaded, and call the method properly. I seem to be failing at a point that may be pretty close to the final frontier …

I am dead pleased – an academic itch has been scratched and something useful has come of it. The next challenge is to get this final problem solved and then to document the complete workflow within a tutorial. Let’s hope that we can manage some or all of this on the train tomorrow morning!

Adding numbers in R – the hard (-est possible) way

Tuesday, February 3rd, 2009


I have had a good think about my NCBI-taxonomy-in-R problem, and there is only one (creative) way to go. I have also read around to see what tutorials exist for R/C integration. Unfortunately, at the moment the answer is not a lot, apart from some PDFs and packages from Dirk Eddelbuettel and the "Writing R Extensions" manual.

To get the ball rolling, and to start a neat tutorial to inspire both myself and some of my more loyal readers, I am going to create a new function in R, through a package dedicated to the task of adding two numbers. That is trivial in R itself; packaging a C method within R is a little more complicated, but it should teach me enough to look at the taxonomy problem in a new and more creative way.

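#include <stdio.h>

/* Add two integers and return their sum. */
int adder(int a, int b) {
    int c;
    c = a + b;
    return c;
}

/* Standard entry point: add 10 and 5 and print the result (15). */
int main(void)
{
    int a, b, r;
    a = 10;
    b = 5;
    r = adder(a, b);
    printf("\nHello World\n%d\n", r);
    return 0;
}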

This is my first C code for the project and, apart from fixes to other people's software, my first C code at all; hopefully a beautiful challenge lies ahead. I should probably be looking forward to tomorrow's commute to the office with added enthusiasm! I don't think that anything here needs documenting (yet), but the next step is to get this code compiled and integrated within an R package …
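One thing worth knowing before packaging adder: R's .C interface passes every argument to the compiled routine as a pointer to the start of an R vector and expects the routine to return void, handing results back by writing into one of those arguments. The adder above takes its ints by value and returns the sum, so .C cannot call it directly. A .C-compatible version might look like the sketch below; the name add_c is a hypothetical choice, and doubles are used because plain R numbers such as 10 and 5 are stored as doubles (an int version would need as.integer() on the R side).

```c
/* Hypothetical .C-compatible version of adder: each argument arrives as a
   pointer to the first element of an R vector, and the sum is written back
   through `result` instead of being returned. */
void add_c(double *a, double *b, double *result)
{
    *result = *a + *b;
}
```

Once the package's shared library is loaded (for example through a useDynLib() directive in the NAMESPACE file), the R side could call it as .C("add_c", as.double(10), as.double(5), result = double(1))$result, which returns 15, because .C returns a list of the (copied and possibly modified) arguments under the names given.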