A computational biologist’s personal toolbox: What a scientist will really do with programming
Here’s a great piece to read when wondering, “Do scientists really all need to learn to program? Surely they’re not going to program, are they? What would they do?” What they’ll do is patch together pieces of other people’s code, with lots of data transformation in between. What do they need to know? They need a robust mental model of how the modules work and what data each one needs. This goes beyond computational thinking.
In my past 20 years as a programmer, I’ve seen the rise of object-oriented programming, and ‘modularity’ is something that was hammered onto my forehead. These days, I organise my entire life as a computational biologist around little modules that I re-use in almost every workflow. Yes, sure, you may call me a one-trick pony, but in terms of productivity, call me a plough horse.
My core modules are ACQUISITION, COMPUTATION and VISUALISATION, and usually I glue those together with a few lines of Perl or the Unix command line. Here come the constraints again: to overcome the limitations of the software that I’m often “misusing”, I use my own scripts to shove data from one format into the next, and back again. I think every biologist who deals with lots of data, not only us computational folk, should know a few handy lines to quickly turn comma-separated files into tab-delimited ones, strip a table of empty quotes or grep some essential info.
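To make those “few handy lines” concrete, here is a minimal sketch of the three chores mentioned above: comma-separated to tab-delimited, stripping empty quotes, and a grep-style filter. The author reaches for Perl or the Unix command line; this version uses Python purely for illustration. The function names and the sample table are invented for the example and are not taken from the original piece.

```python
import csv
import io
import re
from typing import List


def csv_to_tsv(csv_text: str) -> str:
    """Convert comma-separated text to tab-delimited text.

    The csv module handles quoted fields, so a comma inside
    quotes is not mistaken for a column separator.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


def strip_empty_quotes(tsv_text: str) -> str:
    """Blank out tab-delimited fields that are just an empty pair of quotes."""
    cleaned = []
    for line in tsv_text.splitlines():
        fields = ["" if field == '""' else field for field in line.split("\t")]
        cleaned.append("\t".join(fields))
    return "\n".join(cleaned) + "\n"


def grep_lines(text: str, pattern: str) -> List[str]:
    """Return the lines that match a regular expression, like a basic grep."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


if __name__ == "__main__":
    # Made-up sample table; the values are illustrative only.
    sample = (
        "gene,description,score\n"
        'BRCA1,"DNA repair, tumour suppressor",0.93\n'
        "TP53,,0.88\n"
    )
    table = csv_to_tsv(sample)
    print(table, end="")
    print(grep_lines(table, r"^TP53"))
```

None of this is sophisticated, which is rather the point: each function is a small module that takes text in and hands text back, so it can be glued between whatever tools sit on either side of it.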