How To Eliminate Stupid Mistakes In Your Data Reports
By Alan Hylands
Mistakes. I’ve made a few but then again, too few to mention.
If only the second part was true…
Big mistakes, small mistakes
I don’t mean the huge life-altering mistakes of taking the wrong job or not jumping at a great opportunity when it comes along. More the day-to-day mistakes in a data analyst’s life: the incorrect reporting, the poorly cleansed data, the completely inaccurate analysis.
It’s unlikely that one incident of rushing a job out the door will completely derail your career. Over time though, the death by a thousand cuts of making small stupid mistakes can seriously dent your own confidence in your ability. And more damaging to your career prospects, your manager’s belief in your skills.
The blossoming cold sweat of realisation that you have f*cked up
We’ve all felt the heart-stopping terror of realising our report to senior management is as useful as a soiled paper napkin full of randomly generated nonsense. And even worse: IT’S ALL YOUR FAULT.
If that doesn’t call for an immediate change of trousers and a blast on the defibrillator then I don’t know what does.
You mightn’t have done anything as reckless as deleting an entire production database. (Been there, viewed that.) But there will be plenty of opportunities in the course of your analytics career to feel the sheer panic of the Fuck Up Moment. Even worse is the sphincter-tightening realisation that you’ll have to ‘fess up before you get found out the hard way.
Over the years I’ve worked through some techniques and strategies for alleviating this potential situation as much as possible.
Check Your Own Homework
We all know what a wink-and-a-nod process it was at school if we got to mark our own homework. Last minute “editing” of answers. Judicious use of discretion on what you meant to say rather than what was actually written on the page. Always finishing with a lot of top marks for teenage kids who didn’t really understand the whole concept of cheating themselves by fiddling answers.
Hopefully you’ve moved beyond that stage by now. The only way to cut 80% of errors out of your reporting is to be your own homework checker from hell.
Plan out your process
Set out a workflow process before you begin. Document it on paper or in a text file and plan your high-level approach. Write down your data sources. Set out what you understand the problem statement to be. Figure out where you want to get to in the analysis and what you will expect to see at the end.
Keep a data journal
I’d recommend keeping notes as you work through each part of the process. If you found a new data field to use, document what it is, what the possible values are and where you found it. Memory is not reliable and you are not Rain Man. I’m certainly not, especially as I get a little older and try to cram more and more into my head. Write it down. You’ll thank yourself 1 month/6 months/3 years from now when you need to revisit the same thing again and have only the vaguest memory of what you did.
Write tests into your code
For example, check if the output after a SQL join statement has more or fewer rows than you expected from your base tables. Are there duplicates that need to be handled early on rather than with a full-scale battering at the end?
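Here is a rough sketch of what that kind of check could look like, using Python’s built-in sqlite3 module. The tables, column names and the helper functions `count_rows` and `duplicate_keys` are all made up for illustration, so treat it as a starting point under those assumptions rather than a finished test suite.

```python
import sqlite3

# Hypothetical tables, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE accounts (customer_id INTEGER, product TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    -- Customer 2 appears twice in accounts, which fans out the join.
    INSERT INTO accounts VALUES (1, 'Savings'), (2, 'Loan'), (2, 'Loan');
""")

JOIN_SQL = """
    SELECT c.customer_id, c.name, a.product
    FROM customers c
    LEFT JOIN accounts a ON a.customer_id = c.customer_id
"""


def count_rows(conn, sql):
    """Return the number of rows a query produces."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]


def duplicate_keys(conn, sql, key):
    """Return the key values that appear more than once in a query's output."""
    rows = conn.execute(
        f"SELECT {key} FROM ({sql}) GROUP BY {key} "
        f"HAVING COUNT(*) > 1 ORDER BY {key}"
    ).fetchall()
    return [row[0] for row in rows]


base_rows = count_rows(conn, "SELECT * FROM customers")
joined_rows = count_rows(conn, JOIN_SQL)
dupes = duplicate_keys(conn, JOIN_SQL, "customer_id")

# Flag anything unexpected straight after the join, not at the end.
if joined_rows != base_rows:
    print(f"Row count changed: {base_rows} base rows, {joined_rows} after join")
if dupes:
    print(f"Duplicate customer_id values after join: {dupes}")
```

In real work you would probably stop the job (an `assert` or a raised exception) rather than just print a warning, but that depends on how much noise your process can tolerate.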
Write comments in your code
Again, you’ll thank yourself. I’ve lost count of the times I’ve re-visited old code and thought “what the hell have you done here?” Your colleagues will thank you even more when you jump ship and they get handed your code to run. Pay it forward, even from a selfish perspective.
Create a validation tab on your output spreadsheet
Run some counts and summaries. See how these match up with other numbers your team produces that you know are gospel to the senior management. Top KPIs like number of customers by product, average holding, product penetration. If you don’t have a regular cheat sheet to check these off against then make it a priority to put one together.
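The same idea works in code if your output never touches a spreadsheet. The sketch below compares customers per product in a report against a cheat sheet of trusted figures. The data, the reference numbers and the `validate_counts` helper are hypothetical, and it assumes one row per customer per product.

```python
from collections import Counter

# Hypothetical report output: one row per customer per product.
report_rows = [
    {"customer_id": 1, "product": "Savings"},
    {"customer_id": 2, "product": "Savings"},
    {"customer_id": 3, "product": "Loan"},
]

# The "cheat sheet": figures already accepted by senior management.
# These numbers are invented for the example.
reference_customers_by_product = {"Savings": 2, "Loan": 2}


def validate_counts(rows, reference, tolerance=0.0):
    """Compare rows per product with the reference figures.

    Returns a list of (product, expected, actual) tuples for every product
    whose count differs from the reference by more than the tolerance,
    where tolerance is a fraction of the expected value.
    """
    actual = Counter(row["product"] for row in rows)
    mismatches = []
    for product in sorted(set(actual) | set(reference)):
        got = actual.get(product, 0)
        expected = reference.get(product, 0)
        if abs(got - expected) > abs(expected) * tolerance:
            mismatches.append((product, expected, got))
    return mismatches


mismatches = validate_counts(report_rows, reference_customers_by_product)
for product, expected, got in mismatches:
    print(f"{product}: expected {expected}, report has {got}")
```

A small tolerance (say `tolerance=0.01`) can be useful when the trusted figures were taken at a slightly different date to your own extract.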
Eyeball the raw data
Know your business. Know what “good” looks like in terms of how the data should have been entered. If something looks wrong at the data sourcing stage then get it sorted there and then. Shit In, Shit Out will remain the data analyst’s mantra until the end of time.
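Eyeballing gets a lot easier with a quick profile of each field in front of you. This is a minimal sketch, assuming the raw extract has been loaded as a list of dictionaries of strings. The sample rows and the `profile_field` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical raw extract, invented for the example.
raw_rows = [
    {"customer_id": "1", "country": "UK", "age": "34"},
    {"customer_id": "2", "country": "uk", "age": ""},
    {"customer_id": "3", "country": "UK", "age": "340"},
]


def profile_field(rows, field):
    """Summarise one field: blank count and frequency of each non-blank value."""
    values = [row.get(field, "") for row in rows]
    blanks = sum(1 for value in values if value.strip() == "")
    counts = Counter(value for value in values if value.strip() != "")
    return {"blanks": blanks, "values": dict(counts)}


for field in ("country", "age"):
    print(field, profile_field(raw_rows, field))
```

The code won’t tell you that “uk” should be “UK” or that nobody is 340 years old. That part is still down to you knowing what “good” looks like.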
Ultimately though, it’s all down to you, yes you
I understand that time pressure and people banging on doors often leads to corners being cut. This can lead to catastrophic consequences. What you need is a set process that YOU follow to ensure the analyst making the mistake isn’t you.
If your manager doesn’t formalise the personal QA process then just do it yourself. Sometimes structure is good. You don’t have to be the freewheeling, creative, seat of your pants genius of the analytics world on every job. Sometimes being slower and steadier is ok too. Just not too slow…
We all need some guidance from time to time
Your manager should have more formal processes to follow to ensure that they are comfortable with the output you are shepherding out into the world. This will include formal Peer Review, Subject Matter Expert Quality Assurance and Senior Manager Oversight. I’ll cover these off in a separate post for the managers amongst us.
As an analyst you should want to be responsible for the quality of the code you write and the output you deliver. The steps above should become second nature. Even if you don’t have to use all of them every time, having them up your sleeve helps close the door on avoidable mistakes and that’s what matters.
What If I Did Make A Mistake?
We all have. Lots of them. Years in the job do not act as a repellent to stupid mistakes. Processes to help avoid them do not always catch 100% of them either.
Accept this. You are not alone.
What I will suggest is to learn quickly the most important takeaways from the whole situation:
- Own your mistakes.
- Be upfront with your manager.
- Work out quickly how to fix them.
- Learn from the mistake for next time.
Confess your sins
Integrity is everything. Do not go for the short-term option of hiding the mistake and hoping it will go away.
It won’t.
What will go away is your team and manager’s confidence in your ability and character if you cover it up.
As a manager, I want to be told about mistakes quickly and completely.
Learn from your mistakes
I want to see learning so even if a mistake happens again in future, it’s not the same old mistake over and over and over again. There is no excuse for that.
If you are paddled every time the shit hits the fan though, maybe it’s just a bad environment to work in. The main learning you need to take there is that you should start looking elsewhere.
Otherwise, pick yourself up, fix the problem and learn from it.
As a manager, that’s all I could ask of you.