Beginner’s Guide to Machine Learning with Python
- 2019-04-10 01:19 AM
Machine Learning, a prominent topic in the Artificial Intelligence domain, has been in the spotlight for quite some time now. This area may offer an attractive opportunity, and starting a career in it is not as difficult as it may seem at first glance. Even if you have zero experience in math or programming, that is not a problem. The most important ingredients of your success are your own interest and motivation to learn all those things.
If you are a newcomer who does not know where to start studying, why you need Machine Learning, or why it has been gaining popularity lately, you have come to the right place! I have gathered the information and resources you need to gain new knowledge and accomplish your first projects.
Why Start With Python?
If your aim is to become a successful coder, you need to know a lot of things. But for Machine Learning and Data Science, it is enough to master at least one coding language and use it confidently. So calm down: you do not have to be a programming genius.
For a successful Machine Learning journey, it is important to choose the appropriate coding language from the very beginning, as this choice will shape your path. At this step, you must think strategically, set your priorities correctly, and avoid spending time on unnecessary things.
In my opinion, Python is a perfect choice for beginners who want to jump into the field of machine learning and data science. It is a minimalistic and intuitive language with a full-featured set of libraries (also called frameworks) that significantly reduces the time required to get your first results.
Step 0. A Brief Overview of the ML Process You Need to Know
Machine learning is learning based on experience. For example, it is like a person who learns to play chess by watching others play. In the same way, computers can be trained on the data they are given, acquiring the ability to identify elements or their characteristics with high probability.
First of all, you need to know that machine learning involves several stages:
- data collection
- data sorting
- data analysis
- algorithm development
- checking the generated algorithm
- using the algorithm to draw further conclusions
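As a rough illustration, the stages above can be mapped onto a few lines of scikit-learn. This is a minimal sketch, assuming scikit-learn is installed; the bundled Iris dataset stands in for "data collection", and the choice of a decision tree and of the sample flower is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Data collection: the Iris dataset ships with scikit-learn
X, y = load_iris(return_X_y=True)

# 2. Data sorting: shuffle and split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 3. Data analysis: a quick look at the per-feature means of the training data
feature_means = X_train.mean(axis=0)

# 4. Algorithm development: fit a model to the training data
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# 5. Checking the generated algorithm on data it has not seen before
accuracy = accuracy_score(y_test, model.predict(X_test))

# 6. Using the algorithm to draw conclusions about a new, hypothetical sample
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])
predicted_class = model.predict(new_flower)[0]
```

Printing `accuracy` and `predicted_class` shows how well the model generalizes and what it concludes about the new sample.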
The algorithms used to look for patterns are commonly divided into two groups:
- Unsupervised learning
- Supervised learning
With unsupervised learning, your machine receives only a set of input data, and it is up to the machine to determine the relationships within that data. Unlike supervised learning, where the machine is given labeled examples to learn from, unsupervised learning means that the computer itself finds patterns and relationships in the data. Unsupervised learning can be further divided into clustering and association.
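Clustering can be sketched with k-means from scikit-learn (association rule mining is not shown here). This is a minimal example on hypothetical, synthetic data: the algorithm receives only the points, never a label, and has to discover the two groups by itself.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: two well-separated blobs of 2-D points
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# The algorithm only sees X and assigns each point to one of two clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```

The cluster numbers themselves (0 or 1) are arbitrary; what matters is that points from the same blob end up with the same label.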
Supervised learning means that the computer learns to recognize elements from provided, labeled samples. The computer studies these samples and develops the ability to recognize new data. For example, you can train your computer to filter spam messages based on previously labeled messages.
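The spam example can be sketched with a Naive Bayes classifier from scikit-learn. This is a toy illustration, assuming a handful of hypothetical labeled messages; a real filter would need far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled training messages: 1 = spam, 0 = not spam
messages = [
    "win free money now",
    "free prize click now",
    "claim your free reward",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
    "please review the project report",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn text into word counts, then learn word probabilities for each class
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

# Classify new, unseen messages
predictions = spam_filter.predict(["free money prize", "team meeting tomorrow"])
```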
Some Supervised learning algorithms include:
- Decision trees
- Support-vector machine
- Naive Bayes classifier
- k-nearest neighbors
- linear regression
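To see the listed algorithms side by side, the sketch below trains scikit-learn's implementations of the four classifiers on the Iris dataset and fits a linear regression separately, since regression predicts a number rather than a class. The hyperparameters are mostly library defaults and the toy regression data is hypothetical; this is an illustration, not a benchmark.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The classifiers from the list, mostly with default settings
classifiers = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Support-vector machine": SVC(),
    "Naive Bayes": GaussianNB(),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data

# Linear regression: recover y = 2x + 1 from exact, noise-free points
x = np.arange(10, dtype=float).reshape(-1, 1)
target = 2 * x.ravel() + 1
reg = LinearRegression().fit(x, target)
```

Printing `scores` gives a quick feel for how each algorithm does on the same split; `reg.coef_` and `reg.intercept_` hold the learned slope and intercept.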
Step 1. Brush Up on the Math Skills Needed for Python's Mathematical Libraries
A person working in the field of AI and ML who does not know math is like a politician who does not know how to persuade. Both have an essential area to work on!
So yes, you cannot handle ML and Data Science projects without at least a basic foundation in math. However, you do not need a degree in Mathematics to succeed. In my personal experience, devoting at least 30–45 minutes every day will bear much fruit, and you will understand and learn advanced Python topics for Maths and Statistics faster.
You need to learn or refresh the underlying theory. There is no need to read a whole textbook; just focus on key concepts.
Here are 3 steps to learn the mathematics needed for analysis and machine learning:
1 — Linear algebra for data analysis: Scalars, Vectors, Matrices, and Tensors
For example, Principal Component Analysis (PCA) requires you to understand eigenvectors, and linear regression relies on matrix multiplication. In addition, machine learning often works with high-dimensional data (data with many variables), which is best represented by matrices.
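Both ideas fit in a few lines of NumPy. This sketch, on hypothetical synthetic data, finds the first principal component as the eigenvector of the covariance matrix with the largest eigenvalue, and then solves a linear regression through the normal equations using matrix multiplication. Library routines such as scikit-learn's `PCA` do this more robustly; the point here is only to show the linear algebra.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data stretched along the direction (1, 2)
t = rng.normal(size=500)
X = np.column_stack([t, 2 * t]) + rng.normal(scale=0.05, size=(500, 2))

# PCA: principal components are eigenvectors of the covariance matrix
X_centered = X - X.mean(axis=0)
cov = X_centered.T @ X_centered / (len(X) - 1)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order
first_component = eigenvectors[:, -1]            # direction of largest variance

# Linear regression via matrix multiplication (normal equations):
# solve (A^T A) w = A^T y, where A has a column of ones for the intercept
x = rng.uniform(0, 10, size=100)
y = 3 * x + 1 + rng.normal(scale=0.1, size=100)
A = np.column_stack([np.ones_like(x), x])
w = np.linalg.solve(A.T @ A, A.T @ y)  # w[0] = intercept, w[1] = slope
```

An eigenvector's sign is arbitrary, so `first_component` may point either way along (1, 2).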
2 — Mathematical Analysis: Derivatives and Gradients
Mathematical analysis underlies many machine learning algorithms. Derivatives and gradients will be needed for optimization problems. For example, one of the most common optimization methods is gradient descent.
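Here is a minimal sketch of how derivatives feed into gradient descent, using a hypothetical two-variable function whose minimum is known to be at (3, -1). The analytic gradient is checked against a finite-difference approximation, and the descent loop repeatedly steps against the gradient. The learning rate and step count are illustrative assumptions.

```python
import numpy as np

def f(p):
    """Hypothetical objective: a bowl with its minimum at (3, -1)."""
    x, y = p
    return (x - 3) ** 2 + 2 * (y + 1) ** 2

def grad_f(p):
    """Analytic gradient: the vector of partial derivatives of f."""
    x, y = p
    return np.array([2 * (x - 3), 4 * (y + 1)])

def numerical_grad(func, p, h=1e-6):
    """Central-difference approximation of the gradient, useful as a sanity check."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = h
        g[i] = (func(p + step) - func(p - step)) / (2 * h)
    return g

def gradient_descent(grad, start, lr=0.1, steps=200):
    """Take `steps` steps of size lr * gradient, moving downhill each time."""
    p = np.array(start, dtype=float)
    for _ in range(steps):
        p = p - lr * grad(p)
    return p

minimum = gradient_descent(grad_f, start=[0.0, 0.0])
```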
To learn Linear Algebra and Math Analysis quickly, I recommend these courses:
Khan Academy provides short practical lessons on linear algebra and math analysis. They cover the most important topics.
MIT OpenCourseWare offers great courses for learning math for ML. All video lectures and study materials are included.
3 — Gradient descent: building a simple Neural Network from scratch
One of the best ways to learn the mathematics used in analysis and machine learning is to build a simple neural network from scratch. You will use linear algebra to represent the network and mathematical analysis to optimize it. In particular, you will implement gradient descent from scratch. Do not worry too much about the nuances of neural networks; it is fine to just follow the instructions and write the code.
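A minimal sketch of such a network, assuming only NumPy: one hidden layer with tanh activations and a sigmoid output, trained with plain full-batch gradient descent on the XOR toy problem. Matrices represent the layers (linear algebra), and the chain rule gives the gradients (mathematical analysis). The layer size, learning rate, and epoch count are illustrative choices, not tuned values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_network(X, y, hidden=8, lr=0.5, epochs=5000, seed=0):
    """Train a one-hidden-layer network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    # The network is just two weight matrices and two bias rows
    W1 = rng.normal(size=(X.shape[1], hidden))
    b1 = np.zeros((1, hidden))
    W2 = rng.normal(size=(hidden, 1))
    b2 = np.zeros((1, 1))

    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, sigmoid output probability
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        losses.append(loss)

        # Backward pass: chain rule for the mean binary cross-entropy loss
        dz2 = (p - y) / len(X)
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0, keepdims=True)
        dz1 = (dz2 @ W2.T) * (1 - h ** 2)  # derivative of tanh is 1 - tanh^2
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0, keepdims=True)

        # Gradient descent: step each parameter against its gradient
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)

# XOR: a classic toy problem that a single linear layer cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params, losses = train_network(X, y)
```

Plotting `losses` shows the loss going down over the epochs, and `predict(params, X).round(2)` should move toward 0, 1, 1, 0 as training succeeds; a different seed or learning rate may need more epochs.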
Step 2. Learn the Basics of Python Syntax
Great news: you do not need a full programming course, because learning Python and learning data analysis are not the same thing.
Before diving into the syntax, I want to share one piece of advice that may help you avoid failures.
It is impossible to learn to swim by reading books on swimming technique, but reading them while training in the pool helps you gain skills much more effectively.
The same holds for learning to program. It is not worthwhile to focus solely on the syntax; if you do, you risk losing interest.
You do not need to memorize everything. Take small steps and do not be afraid to combine theoretical knowledge with practice. Focus on an intuitive understanding, for example, which function is appropriate in a particular case and how conditional statements work. You will gradually memorize the syntax by reading the documentation and writing code. Soon you will no longer have to google such things.
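Here is a small illustration of both ideas: a conditional statement inside a function, and picking the right built-in (`max` with a `key`) instead of writing a loop by hand. The helper name and the scores are hypothetical.

```python
def describe_score(score):
    """Hypothetical helper: map a model accuracy between 0 and 1 to a label."""
    if score >= 0.9:
        return "excellent"
    elif score >= 0.7:
        return "good"
    else:
        return "needs work"

# Hypothetical accuracies for three models
results = {"model_a": 0.95, "model_b": 0.72, "model_c": 0.40}

labels = {name: describe_score(s) for name, s in results.items()}
best_model = max(results, key=results.get)  # the key with the highest value
```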
If you do not have any programming background, I recommend reading Automate the Boring Stuff With Python. The book explains practical programming for total beginners and teaches it from scratch. Read Chapter 6, “String Manipulation,” and complete the practical tasks for this lesson. That will be enough.
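For a taste of the string methods that chapter covers, the sketch below cleans up a messy, hypothetical string using `strip`, `split`, `lower`, `join`, and `startswith`, all standard `str` methods.

```python
raw = "  Machine Learning, Data Science ,python  "

cleaned = raw.strip()                              # remove surrounding whitespace
topics = [t.strip() for t in cleaned.split(",")]   # split on commas, trim each piece
normalized = [t.lower() for t in topics]           # consistent lowercase
joined = " | ".join(normalized)                    # glue the pieces back together
starts_with_machine = cleaned.startswith("Machine")
```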
Here are some other great resources to explore:
Codecademy — teaches good general syntax
Learn Python the Hard Way — a brilliant manual-like book that explains both basics and more complex applications.
Dataquest — this resource teaches syntax while also teaching data science
**The Python Tutorial** — official documentation
And remember, the sooner you start working on real projects, the sooner you will learn. You can always go back to the syntax when you need it.
Step 3. Discover the Main Data Analysis Libraries
The next stage is to study the part of Python that applies to data science. And yes, it is time to learn libraries or frameworks. As pointed out before, Python has a vast number of libraries. Libraries are simply collections of ready-made functions and objects that you can import into your script to save time.
How to use libraries? Here are my recommendations:
- Open Jupyter Notebook (see below).
- Spend about half an hour going over the library documentation.
- Import the library into your Jupyter Notebook.
- Follow the step-by-step guide to see the library in action.
- Examine the documentation to see what else it is capable of.
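Once a library is imported, you can also explore it from inside Python. A minimal sketch using NumPy as the example library: `__version__`, `dir()`, and `inspect.getdoc()` are standard, and `help(np.mean)` prints the full documentation in a notebook cell.

```python
import inspect

import numpy as np

# Which version is installed? The online documentation is versioned too.
version = np.__version__

# A quick look at what the library offers: public names only
public_names = [name for name in dir(np) if not name.startswith("_")]

# Read the summary line of a function's documentation without leaving Python
summary = inspect.getdoc(np.mean).splitlines()[0]
```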
I do not recommend studying every library in depth right away, because you will probably forget most of what you learned by the time you start using them in projects. Instead, try to find out what each library is capable of.
Jupyter Notebook is a lightweight IDE that is a favorite among analysts. The Anaconda distribution of Python already includes Jupyter Notebook, and you can open a new project through Anaconda Navigator, which also ships with Anaconda.
Python libraries you will need:
NumPy, short for Numerical Python, is the most universal and versatile library for pros and beginners alike. With it, you can work with multi-dimensional arrays and matrices easily. Functions such as linear algebra operations and numerical conversions are also available.
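A short sketch of those features on a small, hypothetical matrix: per-column statistics, broadcasting, matrix multiplication, type conversion, and a linear algebra routine.

```python
import numpy as np

# A 2-D array (matrix) of hypothetical measurements: 3 samples x 2 features
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

column_means = data.mean(axis=0)  # mean of each feature: [3., 4.]
centered = data - column_means    # broadcasting subtracts the means from every row
gram = data.T @ data              # matrix multiplication: (2x3) @ (3x2) -> 2x2
as_ints = data.astype(int)        # numerical type conversion
inverse = np.linalg.inv(gram)     # linear algebra: matrix inverse
```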
Pandas is a well-known, high-performance tool for working with data frames. With it, you can load data from almost any source, compute new columns from existing ones, and query data with SQL-like aggregate functions. What is more, it offers various table transformation functions, sliding-window (rolling) methods, and other ways to extract information from data. So it is an indispensable tool in the arsenal of any good specialist.
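Here is a small sketch of those capabilities on a hypothetical sales table: a derived column, an SQL-like `GROUP BY` aggregation, and a per-group sliding window. In practice the data would usually come from a loader such as `pd.read_csv`.

```python
import pandas as pd

# Hypothetical daily sales records
df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5, 6],
    "store": ["A", "B", "A", "B", "A", "B"],
    "units": [10, 4, 12, 6, 14, 8],
    "price": [2.0, 3.0, 2.0, 3.0, 2.0, 3.0],
})

# Create a new column from existing ones
df["revenue"] = df["units"] * df["price"]

# SQL-like aggregation: SUM(revenue) and AVG(units) ... GROUP BY store
summary = df.groupby("store").agg(
    total_revenue=("revenue", "sum"),
    mean_units=("units", "mean"),
)

# Sliding window: 2-day rolling average of units within each store
df["units_rolling"] = df.groupby("store")["units"].transform(
    lambda s: s.rolling(window=2).mean()
)
```

The first row of each store has no previous day, so its rolling value is `NaN`.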
Matplotlib is a flexible library for creating graphs and visualizations. It is powerful but somewhat heavyweight. At this point, you can skip Matplotlib and get started with Seaborn, a higher-level library built on top of Matplotlib.
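A minimal Seaborn sketch, assuming seaborn, pandas, and matplotlib are installed. It draws a scatter plot of hypothetical data in one call. The `matplotlib.use("Agg")` line only lets the script run without a display (for example, on a server) and is not needed in Jupyter, where the plot appears inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render without a display; unnecessary inside Jupyter
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with a roughly linear relationship
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=100)})
df["y"] = 2 * df["x"] + rng.normal(scale=0.5, size=100)

# One call produces a styled scatter plot; it returns a Matplotlib Axes
ax = sns.scatterplot(data=df, x="x", y="y")
ax.set_title("Hypothetical data: y versus x")

# Save the figure to an in-memory PNG (use a filename to save to disk instead)
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
```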
Scikit-learn is, I can say, the most well-designed ML package I have come across so far. It implements a wide range of machine learning algorithms and makes it easy to plug them into real applications. It offers a whole slew of functionality, including regression, clustering, model selection, preprocessing, classification, and more. So it is well worth learning and using. Another great advantage is its speed. It is not surprising that leading platforms such as Spotify, Booking.com, and J.P. Morgan use scikit-learn.
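Several of those pieces combine naturally in scikit-learn. This sketch chains preprocessing (`StandardScaler`) and classification (`LogisticRegression`) into one pipeline and evaluates it with 5-fold cross-validation, a basic model selection tool. The dataset and model choice are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Preprocessing and classification chained into a single estimator
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model selection: accuracy on each of 5 cross-validation folds
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()
```

Because the scaler sits inside the pipeline, it is refit on each training fold, which keeps information from the test fold out of preprocessing.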
Step 4. Develop Structured Projects
Once you master the basic syntax and explore the basics of the libraries, you can begin building projects yourself. Projects will help you learn new things and build a portfolio for your future job search.
There are plenty of resources that offer structured project ideas.
**Dataquest** — interactively teaches Python and data science. You analyze a series of interesting data sets, ranging from Central Intelligence Agency documents to National Basketball Association game statistics, and develop algorithms that include neural networks and decision trees.
Python for Data Analysis — a book by Wes McKinney, the creator of pandas.
Scikit-learn documentation — the main machine learning library for Python.
CS109 — Harvard University's Data Science course.
Step 5. Work on Your Own Projects
There are many things to explore, but it is important to find the projects that truly spark your interest. However, before the happy moment of landing your dream job, you should learn to handle errors in your programs well. The most popular and useful resources for this purpose are:
StackOverflow — a Q&A site where people discuss almost every possible problem. It is also the most popular site of its kind, so you can ask about your errors and get answers from a huge audience.
Python Documentation — one more good place to search for reference material
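Handling an error well usually starts with catching it and reading its message, which is also the part worth pasting into a search. A minimal sketch with a hypothetical helper: `try`/`except` catches the `ValueError` that `float()` raises for bad input, and `traceback.format_exc()` returns the traceback text Python would otherwise print.

```python
import traceback

def parse_number(text):
    """Hypothetical helper: convert text to float, reporting failures instead of crashing."""
    try:
        return float(text)
    except ValueError:
        # The last traceback line holds the exception type and message,
        # which is usually what to search for on StackOverflow
        last_line = traceback.format_exc().strip().splitlines()[-1]
        print(f"Could not parse {text!r}: {last_line}")
        return None

values = [parse_number(t) for t in ["3.5", "42", "four"]]
```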
It goes without saying that you should not neglect any opportunity for collaboration. Participate in as many Python-related events as you can and find people who work on interesting projects. Explore projects made by other people; GitHub is an excellent place for this. Keep learning and stay up to date in the field; all of this will definitely help you level up your game!
Final Word and a Bit of Motivation
You may ask, ‘Why should I plunge into the machine learning realm? There are probably already lots of good specialists.’
You know what? I also fell into this trap, and now I can boldly say that such thinking will not bring you anything good. It is an immense barrier to your success.
According to Moore’s Law, the number of transistors on an integrated circuit doubles roughly every 24 months. This means that the performance of our computers keeps growing, and the previously inaccessible boundaries of knowledge are again “shifted to the right”: there is room for studying big data and machine learning algorithms!
Who knows what awaits us in the future? Perhaps these numbers will grow even more and machine learning will become even more important. Most likely, yes!
Originally published by Oleksii Kharkovyna at https://towardsdatascience.com