PS. There will be a lot of memes in this post.
First things first, I love my job. I think data science is a fantastic field, especially for those of us who came from STEM fields. Nevertheless, there are some very confusing, very awkward moments in the field that need to be addressed, and I haven't seen enough discussion about them.
Why don't we start from the beginning. What is a data scientist? In simple terms, a data scientist is someone who has knowledge of (1) statistics, (2) programming and (3) business analytics, and who can use these skills to develop sophisticated models that solve complex problems. It really is an incredibly cool field: I've worked on retention models and analyzed employees' and customers' open-text data to try to understand what they feel and what they want from their employers. I mean, it's fascinating. However, the reality is that things are not always like that, and a lot of data scientists end up switching jobs quite often because they feel they don't get to do what they were hired for. Below I describe some of the reasons why in more detail.
(1) Misconception about what a data scientist is/does. To-may-to, To-mah-to, right?
Nowadays most businesses want data scientists, and while I wholeheartedly agree that data scientists are very valuable, businesses don't always know what to do with them, or they aren't even sure what data scientists really do. As a data scientist I have had to wear many hats, including front-end development, back-end development and database management. None of these should be the responsibility of the data scientist. If you want someone who can do front-end or back-end development, you should hire a web developer. Just because data scientists can code doesn't mean they are qualified for web development. Data Scientist != Web Developer. Otherwise you are setting the data scientist and your team up for failure.
(2) Often data scientists are a one person team
You go for an interview, and they tell you they have all this data and they are ready for you. They're not. More often than not, baby data scientists are thrown to the sharks, in startups and corporations alike, as one-person teams and told to just figure it out. On some occasions I have started a role only to find there was no data for me to work with, so I had to figure out how to survive in that environment, e.g. by convincing the business to track data and set up data standards.
Not only that: when the data science "team" is isolated, it is hard to push data-driven insights because no one is really looking at what this one person is doing. Data-driven insights are great as long as there is an actual effort to move the project forward, build grassroots support for it and have the data science team collaborate directly with the business and development teams.
(3) Us vs. them dilemma
Imagine this scenario:
- Client: I need a forecast of what turnover will be in 5 years, and I need to know why our customers aren't using our service.
- Data Scientist: Sure, that makes sense, but I just have one problem.
- Client: What’s the problem?
- Data Scientist: You’ve only provided me with one month of data. I can run a model just fine, but it will not give you an accurate forecast.
- Client: Well, that’s all we have and it has to be a good model, so work with it…
Data scientists are not magicians; they can't make results happen when there's nothing to produce results from. This creates tension between the client and the data scientist, which I think is rooted in the fact that one side is not aware of how data science works and what is needed to do the job. Both sides need to keep an open dialog throughout the process.
(4a) Most of the time a regression will do just fine. Chill.
A lot of the time we as data scientists OBSESS over the cool, new, hot model. We spend hours coding our neural networks and our TensorFlow models, but do we need to do that all the time? No. Absolutely not. I have been guilty of this myself, many times #noshame.
The reality is that a lot of the problems we are trying to solve can be perfectly solved with a regression, yet we spend an insane amount of time applying complicated models to simple questions.
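To make the point concrete, here is a minimal sketch of how little code a simple linear regression needs: an ordinary least squares fit of y = intercept + slope * x in plain Python, with no framework at all. The function name `fit_simple_regression` and the sample numbers are hypothetical, made up for illustration only.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x (illustrative sketch)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Spread of x around its mean; a constant x has no slope to estimate.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    # Co-variation of x and y around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data points, for illustration only.
intercept, slope = fit_simple_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(intercept, slope)  # 1.0 2.0
```

In practice you would probably reach for an established statistics library, which also gives you standard errors and diagnostics, but the core model is this small.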
There's a time and place for neural networks, and counting the word frequencies in your text data is not it. Choose your battles.
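For example, counting word frequencies in open-text survey comments takes a few lines of standard-library Python. This is a minimal sketch that assumes a deliberately naive tokenizer (lowercase letters and apostrophes only); the function name `word_frequencies` and the sample comments are hypothetical.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count lowercase word tokens across a list of texts (naive tokenizer)."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters and apostrophes; punctuation and digits are dropped.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Hypothetical open-text comments, for illustration only.
comments = ["The app is slow", "Slow support, slow app", "Love the app"]
counts = word_frequencies(comments)
print(counts.most_common(3))
```

Real text data usually also needs stop-word removal and some normalization, but none of that calls for a neural network.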
(4b) Thinking anyone can do data science.
No, no, no. Just because someone learned how to write 5 lines of code does not mean they can do data science. You need someone who understands not only the code, but also the math behind the code, the output of the model and the data itself.
(5) Leave behind your “academia” mentality. That train is gone. Bye Felicia.
This one was a big one for me. When I first transitioned, I really thought it would be just like academia. Oh boy, was I wrong. When you're in academia, everyone is like you and everyone knows what you're talking about. The corporate world is another beast.
You need to learn to slow down and realize that a lot of your clients may not understand data. It is crucial to learn how to break down the complexity of your models and explain them to your client. Also, get really damn good at PowerPoint. You'll be using that shit on the daily. And stop putting complicated equations and plots on your slides; nobody wants to see that. I promise you.
(6) The Unicorn
With all this said, sometimes you find a job that properly balances all of this, and that's great. In fact, I've had about as many good experiences as bad ones, so all in all it's a great field to be in, kiddo!
I hope this post helps employers and data scientists alike understand what goes on in each other's minds.