The Five Elements Of The Data Science Unicorn

By Alan Hylands

You pushing your Generalist agenda again?

I’ve written before that I believe that generalists are the real data science unicorns.

Especially when you are running a small company or early stage startup. Their mix of skills, and ability to roll with the punches, are invaluable as we get a never-ending tidal wave of rapidly changing data demands crashing over us.

A quick glance at job listings for data scientists immediately makes me balk though.

  • The never-ending shopping list of technologies.
  • The academic list of honours that would take anyone until they were in their thirties to complete.
  • The required list of side projects.
  • The open source contributions.
  • The burgeoning GitHub repository of polished data science portfolio projects.

Yes, we prefer generalists but how much do you really need to know, and have done, to qualify? Which skills are really the most important for our small company Data Science Unicorns and how do you know if you have what it takes?

The Five Elements Theory Of Small Team Data Science.

These are the five main elements I look for when chasing my own particular data science unicorn:

The Plumber

Having knowledge of setting up and looking after relational databases is vital. Doesn’t particularly matter whether it’s SQL Server, Oracle or PostgreSQL. The underlying concepts and skills can be ported from one particular technology to another and this is a strand you will see coming up in some of the subsequent skill-sets.

Knowledge of SQL is therefore #1 on my list of must-have skills regardless of the seniority of the role I am recruiting for.

Knowing how to take data out of a database, change it and put it back in somewhere else is so fundamental as to be a given. Long term if you find that the arena of ETL, data management, data cleansing and aggregation is your chosen home then you’ll be prime material for becoming a Data Engineer.

Without those basic skills, however, you’d be no use to me. I need people who can get the data, tidy it up and make it presentable: for themselves, for the other analysts on the team and for our customers.
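As a rough illustration of that “take it out, change it, put it back somewhere else” loop, here is a minimal sketch in Python using the standard library’s sqlite3 module. The table names (raw_orders, customer_totals), the sample rows and the tidying rules are all made up for the example; a real pipeline would look different depending on your database and your data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, messy, hypothetical source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [
            (" Acme ", 100.0),   # stray whitespace and mixed case
            ("acme", 50.0),
            ("Globex", 75.5),
            ("Globex", None),    # missing amount
            (None, 20.0),        # missing customer
        ],
    )
    conn.commit()
    return conn


def run_etl(conn):
    """Extract raw orders, tidy them, and load a per-customer summary table."""
    cur = conn.cursor()

    # Extract: take the data out of the source table.
    rows = cur.execute("SELECT customer, amount FROM raw_orders").fetchall()

    # Transform: drop incomplete rows, normalise names, total per customer.
    totals = {}
    for customer, amount in rows:
        if customer is None or amount is None:
            continue
        key = customer.strip().lower()
        totals[key] = totals.get(key, 0.0) + float(amount)

    # Load: put the tidy result back in somewhere else.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    cur.execute("DELETE FROM customer_totals")
    cur.executemany(
        "INSERT INTO customer_totals VALUES (?, ?)", sorted(totals.items())
    )
    conn.commit()
    return totals


if __name__ == "__main__":
    conn = build_demo_db()
    run_etl(conn)
    for row in conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ):
        print(row)
```

The same three steps carry over to SQL Server, Oracle or PostgreSQL; only the connection library and the SQL dialect details change.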

If you find that you’re a “Senior Data Analyst” and you still have to sit around waiting for other people to hand you clean datasets to work on, you’re virtually useless to me.

The Detective

Inspector Clouseau. Lieutenant Columbo. Hercule Poirot. Sherlock Holmes. Jessica Fletcher. Nancy Drew.

What do they all have in common? A penchant for sticking their nose into things that often shouldn’t concern them which leads to them finding out the necessary details to bring a case (read: project) to a satisfactory conclusion.

We all need a touch of the data detective about us, especially those who err more on the analyst side of the business.

It’s not enough to just blindly trot out numbers in a presentation or results from a machine learning script. You have to want to question them and get under their skin.

Play good cop, bad cop. Even if it’s just by yourself.

  • Does this tell me what I want to hear?
  • What should I check to make sure that’s not a false positive?
  • Is the data quality good or just about good enough?

Wrangle the source data a bit more and see what the impact is. Take a data dump and run some analysis on it. Look for patterns.

This is often the unglamorous side of the job. But it’s absolutely essential to have some of these skills in your locker.

If you run a report that tells me we have an average Customer Lifetime Value ten times what I actually know it is, you won’t last long on my team. Knowing when something looks off is more important than having a photographic memory for every number that ever crosses your screen.
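One way to build that “does this look off?” reflex into a report is a simple sanity check against a figure you already trust. The sketch below is only an illustration: the function names, the 50% tolerance and the sample numbers are assumptions made up for the example, not a standard method.

```python
def average_clv(customer_totals):
    """Average Customer Lifetime Value from a list of per-customer totals."""
    if not customer_totals:
        raise ValueError("no customers to average over")
    return sum(customer_totals) / len(customer_totals)


def looks_off(reported, expected, tolerance=0.5):
    """Return True when `reported` differs from `expected` by more than
    `tolerance`, expressed as a fraction of `expected`.

    `expected` is whatever ballpark figure you already know and trust;
    the default tolerance of 0.5 (50%) is an arbitrary choice for the demo.
    """
    if expected == 0:
        return reported != 0
    return abs(reported - expected) / abs(expected) > tolerance


if __name__ == "__main__":
    known_clv = 100.0  # hypothetical figure the business already trusts

    clean = average_clv([120.0, 80.0, 100.0])
    inflated = average_clv([1200.0, 800.0, 1000.0])  # e.g. a bad join upstream

    print(clean, looks_off(clean, known_clv))
    print(inflated, looks_off(inflated, known_clv))
```

A check like this won’t tell you why the number is wrong, only that it is worth playing detective before it reaches a slide.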

(Feed your inner Sherlock but if you find yourself in a series of American coastal towns and your close friends seem to keep getting bumped off just as you arrive, maybe lay off the Jessica Fletcher part a little.)

Mr. Robot

You don’t have to be a cyber-vigilante or underground ethical hacktivist (whatever that is) but you really should be coming in with some kind of coding background. Too many aspiring data scientists get caught up on the specific language they should know to make them more hire-able.

“Should I learn Python or R for data science?” Yes.

It doesn’t really matter which you do first though.

It’s more important to pick up the basics of programming and get those concepts into your head than worry about getting too deep into one particular language or another. If you know just enough Python, odds are you’ll pick up the main concepts of R quickly enough too. Or SAS. Or VBA. Or {insert archaic language here}.

Big corporations in particular are slow to move off their legacy tech stacks. You might want to be working on the bleeding edge all the time but sometimes you’ve just got to roll with the punches.

I know one very senior analyst with twenty-odd years of VB development who didn’t let his lack of SAS knowledge stop him moving to that language when the call came. He just used his SQL knowledge to run PROC SQL queries until he learned more about DATA steps.

Be a coding MacGyver. Use whatever you have to hand and keep morphing when the situation demands it.

The Storyteller

This is how you deliver your story to senior management and really unlock the power of the analysis your team has produced. Mixing the personal presentation skills of a seasoned TED Talker with the data visualisation chops of a staff writer on The Pudding.

It’s these skills that knock the old “if a data analysis project falls over in an empty boardroom, does anyone hear it?” question for six.

You might be old school and go with PowerPoint decks and Excel-generated charts. (Just not pie charts please; we all have our limits and standards.) Maybe you’re more into matplotlib in Python or ggplot2 in R. Or even D3.

Most importantly, you’ll understand what is going on in the numbers behind the visuals and be able to communicate that to what is likely to be a non-technical audience. You have to get the story out and winning over the room is often the most difficult part.

Conquer that and you’ll have a string to your bow that translates VERY well to progressing up the old career ladder to management, and the Executive bathrooms above that.

The Double Agent

Maybe the most elusive element of our unicorn’s overall skill-set is the ability to play both sides.

Being on top of the technical side of the data science work, be it the data engineering, analysis or predictive modelling, is one thing.

Having enough of a finger on the business’s pulse to make it more than just another academic exercise with no real-world impact is quite another.

You’ll have to make both sides respect you as one of their own, while also appreciating that you carry enough kudos over from the “other side” to formulate and deliver actionable plans and projects.

So it’s not for the faint-hearted.

You can find yourself becoming the ping pong ball in the middle of inter-department politics. Batted from one side to the next and then back again with no real feeling of making any progress.

Dale Carnegie’s How To Win Friends And Influence People will be your best bedtime reading in that regard. You will need allies across the organisation to truly fulfil your mission while still maintaining enough deep knowledge of the technical side to not get left behind.

Speaking the dual languages of data-driven analyst insight and wily old gut-feel business experience, the Double Agent will be the linchpin on which all of the analytical hopes and dreams depend.

Bringing it all together.

And there we have it. Five elements. Five distinct skill-sets. Ideally spread in varying amounts around a small, tightly knit data team.

Finding all of them to a high level in one person is obviously nigh on impossible. If you do, you really will have found the mythical data science unicorn. Pay them extortionate amounts of money and ask them to recommend some like-minded friends.

If you are looking at it yourself as an aspiring, or current, member of the wider data science fraternity:

  • How do you think you personally measure up in each of those areas?
  • What could you do to beef up the areas you are weaker in?
  • How can you make the most of your strong points to get a bump up in your current job or a move to a new one?

And if you’re running a startup and are thinking it’s time to hire a data scientist but aren’t really sure what you are going to do with them - drop me a mail.

I’ll be glad to help point you in the right general direction.

(Main photo by Franck V. on Unsplash)