So there I was, sitting in my office, stuffing my face with food and coding my heart out, when a colleague walks in. “Ok Mr. Data Scientist, can you tell me the difference between a Data Analyst, a Data Architect, a Data Engineer and a Data Scientist? My client doesn’t understand what they are and frankly, I don’t know either.”
Yes, please come hither to my whiteboard.
These Data People…
I’ve found that, like any other profession, data people are on a spectrum of skillsets. A Data Analyst is not the same as a Data Engineer, and a Data Engineer performs a completely different role than a Data Scientist. I’ve played all three roles thus far in my career and I know which ones I really don’t ever want to go back to… so… before you start slamming my comment section with your heavily-marketed word-vomit… lean back in your chair, read thoroughly and think critically. Really, the client goes through “data phases” that depend on their “data maturity”.
You Might Be a Data Analyst If…
Ah yes, the Data Analyst. Often confused with the Business Intelligence Analyst, this role is the origin of any data professional’s hard-earned experience. Be careful not to insult a Data Analyst with your software experience; they have soft egos and love their macros. Don’t get me wrong, I’m not hating on Data Analysts because I was there once. They are the front lines of the data profession. They’re where the client’s data story starts but hopefully not where the story ends.
Data Analyst Tools and Phrases to Listen for:
- Microsoft Excel formula connoisseur – “Index/Match is better than VLookup”
- Writes VBA on the daily – “Oh I just wrote a MACRO for that”
- Tells you the size of their data in rows – “I’ve got 200,000 ROWS of data”
- “I need a bigger computer because my macro takes 8 hours to run. It must be Big Data!” (probably not)
- Tableau – “Have you heard of this thing called Tableau?”
- May dabble with Access DB – be careful good sir, this is graduation into Data Engineer territory
A Data Analyst who can wizard her way around Excel may be just what the client needs right now. However, benefit us all and don’t refer to yourself as a Data Scientist when the only coding you do is VBA and the only thing you create is descriptive statistics. Play the long game and plan for your client’s data growth trajectory. Give yourself the opportunity to grow to the next level by asserting that your client needs the next level. That’s a win for all of us.
You Might Be a Data Engineer/Architect If…
Once you have reached the point where your macros constantly hit the “Out of Memory” error in Excel, it’s time to become a Data Engineer. If you’re discussing with your team the need for Access DB instead of your VLOOKUP formulas, then a Data Engineer (or Architect) is right for you. Data Architects are the people adept at storing data, writing queries and massaging ugly data into beautiful formats. Oh, by the way, they tend to be more Computer Science-y than your Data Analyst. One thing you start considering is the true SIZE of your data, not the total count of rows. The size of your data will determine what software stack the Data Engineer will need.
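To make the rows-versus-size point concrete, here is a minimal sketch in Python using only the standard library. The two tables and the helper `csv_size_bytes` are made up for illustration; the only idea taken from the text is that two datasets with the same row count can differ wildly in actual size.

```python
import csv
import io


def csv_size_bytes(rows):
    """Return the number of bytes the rows occupy when written as UTF-8 CSV."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return len(buffer.getvalue().encode("utf-8"))


# Two hypothetical tables with exactly the same number of rows.
narrow = [(i, i % 2) for i in range(200_000)]      # an id and a flag
wide = [(i, "x" * 100) for i in range(200_000)]    # an id and a text field

narrow_bytes = csv_size_bytes(narrow)
wide_bytes = csv_size_bytes(wide)

print(f"rows:   {len(narrow):,} vs {len(wide):,}")
print(f"narrow: {narrow_bytes / 1e6:.1f} MB")
print(f"wide:   {wide_bytes / 1e6:.1f} MB")
```

Both tables are “200,000 ROWS of data”, yet the second is roughly an order of magnitude larger on disk. That gap, not the row count, is what should drive the choice of tooling.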
Data Engineer/Architect Tools and Phrases to Listen for:
- Queries, queries and more queries
- “Oh the back-end guy can take care of that”
- Probably has done software engineering for a number of years
- The buzz-word “ETL” makes its first appearance because it really is a process in itself at this point
- SQL, MongoDB, Postgres, Hadoop – all decisions driven by data size, formats and optimization choices
- Typically have lost all their hair or have massive glasses… due to the queries
- These are the IT infrastructure people that will make or break you – nothing happens online without a Data Engineer/Architect
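For anyone who has only heard “ETL” as a buzz-word, here is a rough sketch of what extract, transform and load can look like, written in Python with the standard-library `csv` and `sqlite3` modules. The messy `RAW_CSV` export, the `sales` table and the cleaning rules are all invented for illustration. A real pipeline would be driven by the data size, formats and optimization choices mentioned in the list above.

```python
import csv
import io
import sqlite3

# Hypothetical messy export, the kind that arrives from a spreadsheet.
RAW_CSV = """name, amount ,region
 alice ,100,east
BOB,,west
carol,250,EAST
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: trim whitespace, normalize case, drop rows with no amount."""
    cleaned = []
    for record in records:
        row = {key.strip(): value.strip() for key, value in record.items()}
        if not row["amount"]:
            continue  # assumption for this sketch: rows without an amount are unusable
        cleaned.append((row["name"].title(), int(row["amount"]), row["region"].lower()))
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into a SQL table."""
    connection.execute("CREATE TABLE sales (name TEXT, amount INTEGER, region TEXT)")
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)

# Once the data is clean and loaded, the "data pull" is a single query.
totals = connection.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('east', 350)]
```

The in-memory SQLite database is only a stand-in here; the same three steps apply whether the target is Postgres, MongoDB or Hadoop.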
These data savants should be highly paid in your organization. Try not to piss them off, or DROP TABLE will be the last query they write. Depending on how your organization uses their Data Analysts and Data Scientists, both of those team members will be begging Data Engineers for data pulls. I spent a very short time as a Data Engineer; I have the bald spot and glasses to prove it.
You Might Be a Data Scientist If…
When all you care about is whether or not there is labeled data, your best friends at work are the “back-end” guys and everyone calls you R^2, then you have become a Data Scientist. If your mission in life is to gather more data to test, you don’t care about its size or format, and you can still communicate business decisions for the future, then you have finally arrived. If your client is typically high-level in the organization because they “want to see into the future with their data”, then you are worth your salary.
Data Scientist Tools and Phrases to Listen for:
- Pandas, pandas and more pandas
- The target variable for this experiment was…
- Confusion Matrix, R^2, recall, etc.
- Python or R
- Uses Spark for distributed computing – “Um, my computer ran out of memory, need more power!”
- Deep Learning – this is usually where people’s eyes glaze over
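If the phrases “Confusion Matrix”, “recall” and “R^2” are where your own eyes start to glaze over, the sketch below shows what they boil down to, in plain Python with no libraries. The labels, guesses and regression numbers are made up for illustration, and in practice a Data Scientist would likely reach for a library rather than hand-roll these.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return tp, fp, fn, tn


def recall(actual, predicted):
    """Share of actual positives the model caught: TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(actual, predicted)
    return tp / (tp + fn)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up classification results: the target variable is 1 or 0.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
guesses = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(labels, guesses))  # (3, 1, 1, 3)
print(recall(labels, guesses))            # 0.75

# Made-up regression results: the target variable is a number.
observed = [3, 5, 7, 9]
forecast = [2.5, 5.5, 7, 9.5]
print(round(r_squared(observed, forecast), 4))  # 0.9625
```

The confusion matrix is just those four counts laid out in a grid. Recall asks how many of the real positives were caught, and R^2 asks how much of the variation in a numeric target the predictions explain.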
There are many buzz-words out there about Data Scientists, but don’t let them fool you. We all have the same open-source arrows in our quivers when it comes to machine intelligence. Supervised or unsupervised, in truth it is really about the proper framing of the problem. It’s about acquiring the right data that describes the problem rather than about which algorithm you use. You will find Data Scientists a bit frustrating if they don’t have proper data to use, and you won’t see their ROI unless your organization had a data strategy to begin with.
Collateral Damage – People Will Always Fear for their Job Security
Buzz-word alert: “Automation will replace my job”. Sure, it can happen, but it will not always happen. If your job can be automated, you should ask yourself what tangible value you’re bringing to the organization in the first place. Most people want to keep their data growth at the Data Analyst phase because that ensures they are always needed to provide their input, often via Excel. “Hey Henry, will you send me the spreadsheet with next year’s budget numbers?” See, job security is more important than efficiency and progress.