Archive for the ‘blah blah’ Category

Bioinformatics, backups and disk disasters …

Thursday, March 5th, 2009

broken_mac.jpg

I guess that, as a bioinformatician and as someone who works hard to stress a computer, I should accept that failures are part of the deal and something that we can deal with. I guess that hardware failure and software failure are part of the rich cycle of life? Within the last year I have had a completely failed RAID system (thanks LaCie, there went 1.5TB of disk space and several hundred GB of data that needed to be recovered), a Lenovo laptop that no longer communicates with projectors, batteries or disks, and a failed disk on my wife’s ancient Vaio. Yesterday on the train the disk on my *new* MacBook Pro gave up the ghost; some C code was compiling (that Taxonomy project again ;-) ) and it just sort-of waited and nothing happened.

Last night I tried all possible routes of disk disaster recovery; I cannot mount the disk using target disk mode on other Macs, DiskWarrior refuses to even look at it, and with some overseas travel coming up a week without a fit-for-purpose computer is looking inevitable. I know that hardware fails, so why don’t I keep backups? Sure, all of my code is kept in an SVN repository and datasets are typically mirrored across different computers, but a load of stuff like photos and iTunes lived only on the laptop.

Apple computers are pretty good, pretty smart and make life rather easy. I think that I really should get a TimeCapsule or an external disk so at least I can start routinely copying the valuable parts of my computational existence. We have the information management organisations in industry who make sure that we can’t waste our time or lose our data and establish meaningful processes. Why can’t I learn from their example?
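Until a TimeCapsule turns up, routinely copying the valuable parts to an external disk can be sketched in a few lines of Python. This is only a rough illustration, not a replacement for a real backup tool: the function name `backup_tree` and the example paths are made up for the sketch, and it copies a file only when the backup copy is missing or older than the original.

```python
import shutil
from pathlib import Path


def backup_tree(source: Path, dest: Path) -> list:
    """Copy files from source into dest when missing or out of date.

    Returns the relative paths of the files that were copied.
    (Hypothetical helper for illustration only.)
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = dest / rel
        # Skip files whose backup copy is at least as recent as the original
        if dst_file.exists() and dst_file.stat().st_mtime >= src_file.stat().st_mtime:
            continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

Something like `backup_tree(Path.home() / "Pictures", Path("/Volumes/Backup/Pictures"))` (both paths are assumptions) could then be run from a scheduled job; a second run straight afterwards copies nothing, because the backup copies are already up to date.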

Now to find the time to buy a new disk, a backup disk and start the slow process of recovering what may or may not be recoverable!

I’m a Steve, Trust me!

Wednesday, March 4th, 2009

steves.jpg

I have been accepted as a Steve by the National Center for Science Education. I am now an official card-carrying member of the group of people who accept that evolution is real! Looking down the list there are several familiar bioinformaticians, molecular biologists and geneticists – a comfortable place for me to be!

Bioinformaticians and vacations …

Tuesday, March 3rd, 2009

ski_vactaion.jpg

I am back in town, alive, relaxed and fit after a pretty good winter vacation. Two weeks with minimal office intervention has been pretty good for mental health, and I do feel quite a lot more relaxed than I have for quite a while. Sure, there is now an even more massive backlog of work, reports to complete and data analyses to push, but hey, I’m feeling good. Over the next couple of days I have some pretty good literature reviews to push, I have good updates and a working prototype of the R-taxonomy package and will soon prepare for some online and dynamic meeting reports …

We are now into March and I have scrutinized the server logs from the blog. We have made pretty impressive leaps in terms of readership during the last month, and are getting closer and closer to the threshold imposed for establishing a new skin for the pages. We should also have a look at some of the product developments from our current sponsor, Mnemosyne BioSciences.

Windows / applications open on your computer

Tuesday, February 10th, 2009

windows-31a-screen-shot.jpg

As an avid reader of Slashdot, I am amused to read this morning their comments on the forthcoming iteration of the horror that is called Windows. Apparently the next round of Windows-lite will allow for 2 open applications, but according to different sources the average windows user has 8-15 windows open. Check out the Slashdot post here!

As a bioinformatician I feel that I am a little more talented than the average windows user; perhaps not a windows power user (since I choose not to use Windows when possible), but a quick look at my Windows desktop shows that I have 9 applications open (Excel, R, Terminal, Outlook, IE (yuk), Tectia SSH, Acrobat, Endnote and Firefox) and a total of 27 open windows. This is all within the limitations of Windows and I hope is acceptable usage.

This begs questions as to how we all work, how we access and collate information. On the Unix box that I can work on (almost fit for purpose corporate approved RH install) I have fewer running applications (Eclipse, Firefox, Netbeans, shell and R), but many more windows with open R, bash, python and remote screen connections. Some of these connections are to remote MySQL databases, internal PostgreSQL databases and distributed resources such as Biomart.

The thought of having to work with just 2 windows or applications is a little scary – do you close applications when you are not actively using them? I guess that there is some advantage to the corporate view – at least we’ll eventually get something shiny, polished and largely stable, and certainly well tested. I still like my MacBook Pro though. With Spaces I have 4 desktops (this is sufficient) and each has a different theme (java / R / mail and web / documents) – plenty of running applications and a flurry of windows.

How do you all work? What are your practices? Perhaps we should even ask how your desk appears, and how your desktop looks?

Bioinformatics, C, R and a learning curve

Thursday, February 5th, 2009

ansiccard.png

I often tell people that one of the greatest things about bioinformatics is that there isn’t too much dogma or established process to hinder us. Computational biology is a rather new discipline, and when we consider technologies such as gene expression profiling, high throughput proteomics or metabolomics, the technologies are really less than a decade old. Industry likes SOPs, best working practices and a rather documented and robust approach to getting things done, so bioinformatics is often a breath-of-fresh-air within a controlled working environment.

At the moment I am having a great lungful-of-newness (that looks wrong). I have established processes for computational analysis of certain key data types, and am now more involved in the formalisation of those processes and in client::server transitions. It is therefore wonderful to go back to some of the rate limiting steps and to reevaluate the implementation of more optimal C methods in place of often clumsy and inefficient native R routines.

What is a bioinformatician? As someone who has spent the last decade working only in bioinformatics I feel that I have many of the necessary traits – I have written BLAST parsers, conquered object oriented programming, written production code in perl, python, ruby and java, and am quite happy to implement packages in R. I am now delighted that I can add some basic C to the mix… C has been a challenge in the past – informatics naysayers have suggested that it is too hardcore for a biologist, that you need to write too much to get something done, but Hell, it works, is fast and for something simpler than a full blown OS really rocks.

At the moment I have my NCBI taxonomy parsing code in development. We are beyond the crisis from yesterday morning when 1+2 != 3, and can now use .Call methods reproducibly and can pass R matrix objects (unidirectionally) to C code. While not quite ready for the cigar, we are at least heading in the right direction and I feel that this has been pretty good usage of otherwise dead train time. I guess that 9 hours a week on the train is good for innovation.
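One detail worth knowing when handing an R matrix to C: R stores a matrix as a single flat vector in column-major order, so the C side sees one block of values and has to do the index arithmetic itself. The sketch below shows only that arithmetic, and in Python rather than C so that it runs without the R headers; the function names are invented for the illustration and this is not the taxonomy code.

```python
def column_major_index(row: int, col: int, nrow: int) -> int:
    """Offset of element [row, col] (0-based) in a column-major flat array."""
    return row + nrow * col


def column_sums(flat, nrow: int, ncol: int) -> list:
    """Sum each column of a matrix stored as a flat column-major sequence."""
    if len(flat) != nrow * ncol:
        raise ValueError("flat array length does not match nrow * ncol")
    return [
        sum(flat[column_major_index(r, c, nrow)] for r in range(nrow))
        for c in range(ncol)
    ]


# In R, matrix(1:6, nrow = 2) has columns (1, 2), (3, 4), (5, 6),
# so its underlying storage is simply 1, 2, 3, 4, 5, 6.
flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(column_sums(flat, nrow=2, ncol=3))  # prints [3.0, 7.0, 11.0]
```

The same `row + nrow * col` expression is what a C routine would use on the pointer it receives; mixing up rows and columns here gives plausible-looking but wrong numbers.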

I would now start to argue that you newer and younger bioinformaticians, when fighting with performance issues in R, should consider diving into some C code – it is really not that painful!

Monday mornings – a druggability target?

Monday, February 2nd, 2009

officestressed2-main_full.jpg

Monday is one of the important landmarks within the working week. Typically I love Mondays, and at morning coffee I have a little more enthusiasm for the tasks at hand than some of my co-workers and colleagues. The heterogeneity within the Monday morning response is obvious and there appears to be a reasonable distribution; some people love Mondays, some people hate them and there appears to be another set of indifferents. I would imagine that a quick population survey and some genetic analysis would identify a good collection of not completely unexpected candidate genes for the Monday morning phenomenon.

My favoured genes for the Monday morning analysis would certainly include 5-HT receptors and dopamine receptors; both 5-HT and dopamine feel rather lacking in my own grey matter this morning. Whilst composing this message I am further reminded of the more complex markers of a “bad” Monday morning – not all Mondays are created equal!

The train this morning was rather quiet, but a Finnish muppet was sitting in my seat asleep. I was posed with the question of what to do – kicking her out of my assigned seat would lead to a flood of catecholamines and undoubtedly some bad karma; leaving her alone would not really hurt (there are plenty of other seats) but I would forfeit the coffee comfort that requires 2 seats (one table for the coffee and one for the laptop). I went down the passive route and watched in abject horror at the lack of finesse that this traveller possesses on a commuter train! I arrive at work and the coffee machine has been emptied of coffee – there really is no substitute for the urgently required trimethylxanthine, and I can only now begin to query as to whether the fugly on the train should have been relocated…

Mailbox is looking busy, must go on a hunt for some methyltheobromine containing beverage first! Let’s see how the day progresses and if any “Monday morning recovery” markers appear.

A month of the new BioinformaticsBlog

Sunday, February 1st, 2009

progressbar1.png

A month into the new project and we have some pretty good data as to site usage and growth. The site is naturally growing (it really shouldn’t be shrinking) and there have been some surprisingly good days. 261 unique visitors dropped by during the month and that is a good start. 404 not found has been a pretty well visited page, and I have accounted for most bandwidth usage.

I feel that the blog is worth it; it is largely cathartic for some of the open bioinformatics software development projects that I run and has provided a little inspiration for the tasks at hand. As with much of bioinformatics, community involvement has thus far been a little disappointing. No new users have joined the site and all comments deposited have been through engines rather than real bioinformaticians in the field.

Bioinformatics is my life – what could be better than computers, IT, biosciences and biostatistics? The BioinformaticsBlog is here to stay, though we are still a little off the next development milestone that will demonstrate to me that the BioinformaticsBlog is a viable project.

Anyhow – BioinformaticsBlog – one month new, lots of posts and sufficient community apathy. I’m pleased. Happy New Month!

I need development help …

Thursday, January 29th, 2009

linuxbanner.jpg

I love rPath and rBuilder, I love bioinformatics and computer clustering is something that just rocks. I am now stuck in a rather uncomfortable place where I can move neither forwards nor backwards. The problem is, I think, quite simple. I wish to have native Linux clustering within bioram-linux. I have looked at the Condor solution from the University of Wisconsin. I have used this for a long time and it can be made to install.

Building (Condor) from source is however quite beyond me, and there are a load of compile errors in my environment that I just can’t get beyond. A pre-compiled binary is therefore perhaps an easier option, but I have run into a slew of broken dependencies … I guess that using a standard Linux distribution would be the easier path … In reconsidering the logic of using Condor I have gone back to the (binary distribution of) Sun GridEngine. This looks very much easier to deploy, but has some problems of its own … how can I establish a meaningful and working environment at build time, or do I do something truly ugly (and hard deploy a de-facto working environment)?

Running a bioinformatics development unit as a hobby is sometimes more than I’m paid for (oh right, I’m not paid …), and sometimes it would be good to speak with some people who have an idea about what to do and how to do it. Are there any volunteers out there? I am certain that we can have a working GridEngine 6.2u1 in the BioRAM by the end of the day – it’ll be subject to certain requirements (second network card hard coded to address of 10.0.0.1), but I guess that I can live with it, and I guess that you will all have to live with it too…

What should the focus of the blog be for the next four weeks?

Friday, January 23rd, 2009

question-mark.jpg

The bioinformaticsblog is being read, and looking at the logs shows me what is popular and what is not. Loads of hits are coming from Google and it is pretty interesting what you all seem to be hunting for! At the moment the main search themes surround my comments on NetOffice (I still love it), iPhone (still thinking about feasibility) and public datasets (especially the GSK cell line data).

I have put a huge effort into the bioram-linux plan and have a pretty neat system running inside a virtual machine. It has the latest R with a load of Bioconductor and CRAN packages. It has software such as trace2dbest, phred, cross_match (not publicly available from rPath.org yet) for an EST project running through the lab at the moment and I am working on certain “biomarker” packages that will demonstrate my control of destiny rather than the other way round! This is where I wish to concentrate for the time being.

My blog focus until the end of Feb 2009 will be as follows:

  • BioRAM-linux, rPath, rBuilder and conary; towards a functional customized bioinformatics cluster OS
  • bioinformatics and iPhone – towards XML integration of iPhone and Mnemosyne LabManager server
  • tutorials in comparative genomics using ‘R’ and ‘bioconductor’
  • An XML framework for Agilent array analysis using ‘R/bioconductor’
  • Papers-of-the-week

This is my plan, but I really need to have your input here (mail me, or preferably leave a comment)

Luck clusters

Monday, January 19th, 2009

clover.jpg

Back into the start of the week. Monday morning is one of my favourite times of the week, but I am feeling exhausted today!

Is there such a thing as a bad luck cluster? Superstition suggests that bad luck tends to come in 3s – is this correct? My wife and I have been in our silly little luck-&%$# cluster over the last few days. Some silly paperwork problems with the new car led to extra work (for her), pointless trips to bureaucratic offices, missed work and rather more unexpected chaos, and unplanned modifications to the front of her car (and to an unfortunate third party). Karma associated with accidents is always bad, but then I run into chaos with my own paper affairs, and the roof on the shed has started disintegrating in the cold (and wet) (the shed is new!). Other silly things are happening, and it all somewhat dampens the mood.

Anyhow, the weekend is over and I spent a little time thinking about some of the issues I have with distributed ‘R’ and standardization of packages. I am back into virtual machines and rPath is back in action and consideration. I will endeavour to write the first review of rPath for Bioinformatics later in the day (on the train home perhaps).