There are 2 types of Data Scientist.
Those with grey hairs and those without.
If you're one of those with greys (like me) then you didn't do a specialised Data Science course (they didn't exist before I got the greys) and you probably fell into Data Science by accident.
Interestingly, today's highly sought-after Data Scientists were yesterday's unloved academic 'jacks-of-all-trades', with the unfortunate epithet of 'master of none'.
No more - Data Science is finally getting the recognition as a specialist subject in its own right and Data Scientists are finally being seen as valuable commodities.
Anyway, here's my story - see if it rings any bells with you...
Unless you’ve recently graduated from one of the new Data Science courses that have been popping up online and in various universities around the world, becoming a Data Scientist was most likely slightly accidental and was more about the journey than the destination.
Here’s my journey. See if you recognise any of it in your own:
I started out as a physicist and had a strong mathematical background, but I had a passion for medicine. After completing my bachelor’s degree I took a master’s degree in medical physics. This is where I gained an appreciation for the importance of image analysis and the role that data plays in medicine. I created a virtual model of a human torso by segmenting images from the Visible Human Project. Each slice had dimensions of 2048 x 1216, each in 24 bit colour, which is approximately 7.5 megabytes. Not too large, but when you put all the slices together, the full dataset is around 40 gigabytes. This may not be in Big Data territory, but it’s pretty big for a desktop PC and you get quite familiar with handling large amounts of data.
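For anyone who wants to check the arithmetic on those image sizes, here is a quick sketch in Python. It assumes uncompressed images and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes). The function name is just for illustration, and the slice count is only what the quoted 40 gigabytes would imply, not an official figure for the dataset.

```python
def slice_bytes(width, height, bits_per_pixel):
    """Uncompressed size of one image slice, in bytes."""
    return width * height * bits_per_pixel // 8


# One 2048 x 1216 slice in 24 bit colour (3 bytes per pixel).
one_slice = slice_bytes(2048, 1216, 24)
print(f"{one_slice / 1e6:.2f} MB per slice")

# Number of slices implied by a total of about 40 GB (an estimate,
# derived only from the two figures quoted in the text).
implied_slices = round(40e9 / one_slice)
print(f"roughly {implied_slices} slices in 40 GB")
```

So a single slice is small, but a few thousand of them add up quickly.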
Incidentally, there is no shortage of blog posts talking about the necessary skills of Data Scientists, but very rarely does anyone mention image analysis. I predict that image analysis and video analysis will shortly become very useful skills for a Data Scientist to have, not just in medical data analysis, but in many other areas of data analysis too.
Anyway, I digress.
After my master’s degree in medical physics I then did another master’s degree in bioinformatics. During this time, the results of the Human Genome Project were published and I was honoured to be able to do some analysis of the resultant data. The Human Genome Project produced huge amounts of data, so my newly-discovered data handling skills came in very handy. Here I learned about artificial intelligence and created a number of predictive models for a variety of purposes.
At the end of my master’s research I did a PhD in artificial intelligence where I created a predictive system that prevented a terrorist attack on a public water supply. Well, actually, that part isn’t strictly true. I wrote an article that was published in New Scientist about how an artificial neural network system could be created that would prevent a terrorist attack on public water supplies…
Now here’s where my journey comes full circle. At the conclusion of my PhD I left bioinformatics and returned to medicine where I was offered the role of medical statistician to one of the world’s best breast cancer research departments. I wasn’t appointed because I was a statistician, but rather because I wasn’t a statistician. Although I had a working background in stats, they were more interested in using my skills as a bridge between disciplines. I was not a specialist in microbiology, pathology, cancer, surgery or stats, but I had sufficient working knowledge of each to be able to communicate and translate effectively between all of them.
It was a really interesting time, but I realised that I didn’t actually like stats. What I did like was programming stats. Most of my time as a medical statistician involved creating programs to automate data analysis, stats and predictive systems that helped researchers reach the story of their data in a fraction of the time that it would take to analyse the data manually.
And that sort of brings me to where I am today. A few years ago I left my job to form a start-up company, Chi-Squared Innovations, that creates automated data analysis programs, but that’s a story for another day.
OK, so that is my story, but there wasn’t really a destination. I didn’t actually plan all of that out, it just sort of happened. I think the journey is an important one, because it tells you a lot about what Data Scientists are all about, and the skills they use every day.
I started out as a scientist, and have worked in many different scientific fields, but I’m not a specialist in any of them. I learned a lot about computer programming, data handling and image analysis, but I don’t specialise in any of these either. I guess my strongest areas (at the moment) are in artificial intelligence and statistics, but I don’t claim to be an expert. Right now I’m working on improving my skills in business development, data visualisations, shell scripting, Python and GUIs, but – yes, you guessed it – I’m not an expert.
For me, this journey typifies the life of the Data Scientist. Most of us aren’t experts in more than one or two disciplines (or any, in my case), and to the traditional academic we are ‘jacks of all trades’. Our skills are neither those of the expert nor those of the novice, but somewhere in between. Neither black nor white, but varying shades of grey.
What we need to recognise, though, is that Data Science – as broad a subject as it may be – is a specialist subject of its own. To me, Data Science is the glue that binds together distinct areas of specialisation. It is the ultimate multi-discipline.
Here’s an unfunny joke I used to tell when I was still an academic:
Q: What do you get if you put together the best physicist, mathematician, biologist, surgeon, programmer, statistician and AI guy in the world into one room?
A: An unholy mess, the potential for 1000 arguments and the waste of $50 million.
Of course, this is exactly the type of multi-disciplinary dream team that universities, government bodies and companies regularly set up and call a ‘think tank’, so why does it often fall apart?
The answer is that there is no glue. Each specialist is trained to see the problem from their own perspective and has little knowledge or understanding of other points of view.
This is why Data Scientists are becoming so important. They are the glue that pulls together disparate disciplines.
Oh yes, and to those who say that all you need to do to be a Data Scientist is take an online course to learn Hadoop, MapReduce, R, Python and D3, I say this: it’s about the skills, not the tools. To learn the skills of a Data Scientist takes years, if not decades. If you don’t have any grey hairs yet, then you’re not a Data Scientist (but don’t give up – you’ll get there eventually)!
So to all Data Scientists the world over: stop using the Grecian 2000 and celebrate the grey.
All 50 shades of them…
So what was your journey to becoming a Data Scientist? I’d love to hear your story. Just lie back on the couch and tell me all about it…