How Do I Start My Data Science Career?

The scope of data science as a career is growing day by day. People love this career and its tremendous opportunities. Even people from non-technical backgrounds are making data science their primary career. This article focuses more on how to learn than on where to learn.

Lesson 1: Break it down

When I first started learning data science, I was overwhelmed by the size of the field. I had to learn programming languages and concepts from statistics, linear algebra, calculus, etc. Confronted with this many options, I didn’t know where to start.

Fortunately for me, I had coursework to guide my studies. My degree programs broke many of the concepts down into smaller chunks (classes) so they were digestible. While this worked for me, I find that schools take a one-size-fits-all approach. They also include many extraneous classes that you don’t actually need. If I could go back, I would definitely break my data science learning journey into chunks better suited to me.

Before diving into data science, it makes sense to understand the components that make up the field. Rather than breaking things into “courses”, you can break data science into even smaller and more digestible chunks.

I generally break data science into programming and math.

Programming — familiarity with Python and/or R

Variables
Loops
Functions
Objects
Packages (pandas, NumPy, matplotlib, sklearn, TensorFlow, PyTorch, etc.)

Math & Statistics

Probability theory
Regression (linear, multiple linear, ridge, lasso, random forest, SVM, etc.)
Classification (naive Bayes, k-NN, decision tree, random forest, SVM, etc.)
Clustering (k-means, hierarchical)

By breaking data science down into its components, you transform it from being an abstract concept into concrete steps.

Lesson 2: Start somewhere

When I was starting out, I was obsessed with learning things in the “correct” sequence. After entering the field, I found that many data scientists learned their skills in drastically different orders. I met Ph.D.s who had studied math first and only learned the programming concepts after taking a bootcamp. I also met software engineers who were incredible programmers and learned math later through self-study and application.

I now realize that it is most important to start somewhere, preferably with a topic you are interested in. I found that learning is additive. If you learn one thing, you are not forgoing learning another concept.

If I had to go back, I would start with the concepts that were most interesting to me at the time. Once you learn a single concept, you can build on that knowledge to understand others. For example, if you learn simple linear regression, multiple linear regression is a fairly easy next step.
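To make that step concrete, here is a minimal sketch in plain Python. The helper name fit_linear and the toy data are my own assumptions for illustration, not part of any library. It fits a line with one input, then reuses exactly the same function with two inputs.

```python
def fit_linear(rows, targets):
    """Least-squares fit of targets ~ intercept + weights * inputs.

    Hypothetical helper for illustration: solves the normal equations
    (X^T X) w = X^T y with Gauss-Jordan elimination.
    """
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Build the augmented system [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Simple linear regression: one input, toy data generated from y = 1 + 2*x.
simple = fit_linear([[0], [1], [2], [3]], [1, 3, 5, 7])

# Multiple linear regression: two inputs, toy data from y = 1 + 2*x1 + 3*x2.
multiple = fit_linear([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]],
                      [1, 3, 4, 6, 8])

print(simple)    # intercept and one slope
print(multiple)  # intercept and two slopes
```

The only thing that changes between the two calls is the number of columns in the input, which is why the second concept feels like a small step once you have the first.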

Still, I probably wouldn’t jump right in and start with deep learning. It helps to start small and simple and build on that foundation.

Lesson 3: Build Minimum Viable Knowledge (MVK)

Over time, I’ve had a change of opinion about how much foundational knowledge you need. After experiencing many different types of learning myself, I believe that learning by doing real-world projects is the most effective way to grasp a field. I think that you should understand just enough of these concepts to be able to start exploring your own projects.

This is where minimum viable knowledge comes into play. You should start by learning just enough to be able to learn through doing. This stage is fairly hard to identify. Generally, you will feel like you aren’t ready when you first get here. This is a good thing, though: it means that you are pushing yourself out of your comfort zone.

You can reach this stage fairly easily. I think you can get to this level of knowledge with almost any introductory online course.

To get to this step, all you really need is an understanding of the basics of Python or R and some familiarity with the packages used. You can start learning the math later by applying some of the algorithms to real-world data.

Lesson 4: Get your hands dirty

With your basic knowledge, I recommend getting into projects as quickly as possible. Again, this sounds scary, but a project is all about how you define it.

At the early stages, a project could be something as simple as experimenting with a for loop. As you progress, you can graduate to projects using data on Kaggle, and eventually to projects using data that you have collected yourself.
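As a rough idea of how small a first project can be, here is a sketch that uses a single loop to compute an average. The temperature readings are made up for illustration.

```python
# Made-up daily temperature readings (illustrative data only).
temperatures = [18.5, 21.0, 19.5, 23.0]

# Accumulate a running total with a for loop, then divide by the count.
total = 0.0
for reading in temperatures:
    total += reading

mean = total / len(temperatures)
print(mean)
```

From here, natural experiments are tracking the largest reading inside the same loop, or comparing your result with the built-in sum function.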

I am a HUGE believer that the best way to learn data science is to do data science. I think that the theory is VERY important, but no one says that you have to understand it all before you start applying it. The theory is something you can go back to after you have a functional understanding of the algorithms. For me, real-world examples were always what made things click. If you start with real-world examples through projects, I think the theory has a far higher chance of “clicking” when you start learning it.

Projects also have the power to make data science smaller. One of the biggest challenges I see for new learners is that the field of data science can be overwhelming. Confining the things you are learning to the size of a small project allows you to break things down even further than you did in Lesson 1.

Projects offer one additional benefit. They give you immediate feedback on where you need to improve. If you are working on a project and you run into a roadblock about what package, algorithm, or visual to use, you now know that you should probably study that area of the field further.

Lesson 5: Learn from other people’s code

While doing your own projects is great, sometimes you don’t know what you don’t know. I highly recommend going through the code of more experienced data scientists to get ideas about what to learn next and to better understand logic or syntax.

I recommend making a list of the packages, algorithms, and visuals that you see being used. You should go to the documentation for the packages and expand your knowledge there. They almost always have examples in the docs for how they should be used. Again, this list can be used to help you think of new project ideas and experiments.

Lesson 6: Build algorithms from scratch

This is a rite of passage for most data scientists. After you have applied an algorithm and understand how it works in practice, I recommend trying to code it from scratch. This helps you to better understand the underlying math and other mechanisms that make it work. When doing this, you will undoubtedly have to learn the theory behind it as well.

I personally think that learning in this direction is far more intuitive than trying to master the theory and then apply it. This is the approach that fastai has taken with its free MOOC. I highly recommend it if you are interested in deep learning.

For this, I generally recommend starting with linear regression. This will help you to better understand gradient descent, which is an extremely important concept to build on.
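Here is a minimal sketch of what that exercise might look like in plain Python, with no libraries. The function name, learning rate, step count, and toy data are my own assumptions, chosen so that this small example converges. They are not recommended settings for real data.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ slope * x + intercept by minimizing mean squared error.

    Illustrative sketch: batch gradient descent on a single input.
    """
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point with the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2 * x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = gradient_descent(xs, ys)
print(round(slope, 4), round(intercept, 4))
```

Try changing the learning rate to see what happens. Too large and the parameters diverge, too small and the fit takes far more steps. That kind of tinkering is where the theory starts to make sense.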

As you advance your data science career further, I think theory becomes increasingly important. You bring value by matching the correct algorithm to the problem. The theory associated with the algorithm greatly facilitates this process.

Lesson 7: Never stop learning

The beauty of the data science journey is that it never ends. You will need to keep learning to stay on top of new packages and advancements in the field. I recommend doing this through (you guessed it) more projects. I also recommend continuing to review other people’s code and reading new research as it is published.

This is more of a mindset recommendation than anything practical. If you think that there is a pinnacle, you are in for a surprise!

Insideaiml is one of the best platforms where you can learn Python, Data Science, Machine Learning, Artificial Intelligence & showcase your knowledge to the outside world.