Machine Learning Classification Basics

Machine learning (ML) is commonly divided into four types:

Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Reinforcement Learning

Supervised Learning is further divided into:

Regression
Classification

Unsupervised Learning is further divided into:

Clustering
Dimensionality Reduction
Anomaly Detection
Association Rule Learning

Supervised Learning

When a dataset contains both inputs and outputs, and a model is trained to find the relationship between them so that it can predict the output for unseen inputs, the approach is called Supervised Learning.

Example

IQ    CGPA   PLACEMENT (Y/N)
80    7.5    N
90    8      Y
111   7.9    Y
70    6      N

Suppose we have 5000 rows of data like the ones above. Here "IQ" and "CGPA" are the inputs and "PLACEMENT" is the output, so the data contains both inputs and output. The job of a supervised ML algorithm is to learn a mathematical relationship between the inputs and the output. Once it has learned that relationship, it can predict the output for an unseen input such as {IQ = 100, CGPA = 9}. This is called Supervised Machine Learning.
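As a rough illustration of "learn from labeled rows, then predict an unseen row", here is a minimal sketch using a k-nearest-neighbors vote on the four rows shown above. The choice of k-nearest neighbors, the function name `predict`, and `k=3` are assumptions made for this example, not something the article prescribes. Note that the raw distance is dominated by IQ because of its larger scale; real code would usually rescale the features first.

```python
import math
from collections import Counter

# The four example rows: (IQ, CGPA) -> PLACEMENT
training = [
    ((80, 7.5), "N"),
    ((90, 8.0), "Y"),
    ((111, 7.9), "Y"),
    ((70, 6.0), "N"),
]

def predict(query, data, k=3):
    """Return the majority label among the k training rows closest to query."""
    nearest = sorted(data, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Predict the output for the unseen input {IQ = 100, CGPA = 9}
prediction = predict((100, 9.0), training)
print(prediction)  # Y
```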

Supervised ML has two parts: Regression and Classification.

Before we go any further, we need to know the two basic data types. The first is Numerical (e.g., 1, 5, 7.5). The second is Categorical: data that takes one of a fixed set of labels and is used to group items with similar characteristics (e.g., Y/N, or red/green/blue).
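To make the distinction concrete, the sketch below checks which columns of the placement example are numerical and which are categorical, then maps the Y/N labels to 1/0 so they can be used in arithmetic. The helper name `column_type` and the 1/0 encoding are illustrative assumptions.

```python
# Rows from the placement example, stored as dictionaries
rows = [
    {"IQ": 80, "CGPA": 7.5, "PLACEMENT": "N"},
    {"IQ": 90, "CGPA": 8.0, "PLACEMENT": "Y"},
    {"IQ": 111, "CGPA": 7.9, "PLACEMENT": "Y"},
    {"IQ": 70, "CGPA": 6.0, "PLACEMENT": "N"},
]

def column_type(values):
    """Call a column numerical if every value is a number, else categorical."""
    if all(isinstance(v, (int, float)) for v in values):
        return "numerical"
    return "categorical"

types = {name: column_type([r[name] for r in rows]) for name in rows[0]}
print(types)

# Encode the categorical output as numbers: Y -> 1, N -> 0
encoded = [1 if r["PLACEMENT"] == "Y" else 0 for r in rows]
print(encoded)  # [0, 1, 1, 0]
```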

Regression

If you apply a Supervised ML Algorithm to a dataset whose output is numerical, the task is called Regression.

Example:

IQ    CGPA   PACKAGE (LPA)
80    7.5    8
90    8      10
111   7.9    15
70    6      7.5

In the above dataset, the inputs are IQ and CGPA, and the output is PACKAGE, which is a number. So the Supervised ML Algorithm applied to this dataset performs Regression.
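A minimal sketch of regression on the four rows above: ordinary least squares with a single input. To keep the arithmetic short it uses only CGPA as the input and ignores IQ, which is a simplification made for this example; the function name `fit_line` is also an assumption.

```python
# CGPA (input) and PACKAGE in LPA (numerical output) from the example rows
cgpa = [7.5, 8.0, 7.9, 6.0]
package = [8.0, 10.0, 15.0, 7.5]

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(cgpa, package)

# Predict a numerical output for an unseen CGPA of 9
predicted_package = slope * 9.0 + intercept
print(round(slope, 2), round(intercept, 2), round(predicted_package, 2))
```

The prediction is a number on a continuous scale, which is what distinguishes regression from classification.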

Classification

In contrast, if the output is categorical, the task is called Classification.

Example

IQ    CGPA   PLACEMENT (Y/N)
80    7.5    N
90    8      Y
111   7.9    Y
70    6      N

In the above dataset, the inputs are IQ and CGPA, and the output is PLACEMENT, which is a category (Y or N). So the Supervised ML Algorithm applied to this dataset performs Classification.
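One very small classifier that can be learned from this table is a single-threshold rule (often called a decision stump). The sketch below tries every midpoint between neighboring values of each input and keeps the rule with the best training accuracy. The rule form "predict Y when the value is above the threshold" and the names `fit_stump` and `predict` are assumptions for illustration.

```python
# Example rows: (IQ, CGPA, PLACEMENT)
rows = [(80, 7.5, "N"), (90, 8.0, "Y"), (111, 7.9, "Y"), (70, 6.0, "N")]

def fit_stump(rows):
    """Find the (accuracy, feature_index, threshold) rule that fits best."""
    best = None
    for feature in range(2):  # 0 = IQ, 1 = CGPA
        values = sorted({r[feature] for r in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            # A row is correct when "above threshold" matches "placed"
            correct = sum((r[feature] > threshold) == (r[2] == "Y") for r in rows)
            accuracy = correct / len(rows)
            if best is None or accuracy > best[0]:
                best = (accuracy, feature, threshold)
    return best

def predict(stump, iq, cgpa):
    """Apply the learned rule to one unseen row and return a category."""
    _, feature, threshold = stump
    return "Y" if (iq, cgpa)[feature] > threshold else "N"

stump = fit_stump(rows)
print(stump)                         # (1.0, 0, 85.0): IQ > 85 fits all four rows
print(predict(stump, 100, 9.0))      # Y
```

The output is one of a fixed set of labels rather than a number, which is what makes this classification.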

Unsupervised Learning

Unlike supervised learning, the dataset for unsupervised learning has only inputs and no output. Because there is no output to learn from, the ML algorithm cannot be trained to predict one. Instead, it can do the following:

Clustering
Dimensionality Reduction
Anomaly Detection
Association Rule Learning

Clustering

Grouping data points with similar characteristics into separate groups is called Clustering. Let's understand it with an example.

Example:

In the figure above, if we look at the plot on the left, we can see that three groups can be formed. The clustering algorithm finds these groups (some algorithms determine the number of groups themselves, while others, such as k-means, need it as an input) and creates a clustering region for each group.
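As one possible illustration of forming three groups, here is a small k-means sketch. The nine 2-D points are made up for this example, the number of groups `k=3` is supplied by hand, and the farthest-point initialization is a choice made here to keep the result deterministic.

```python
import math

# Hypothetical 2-D points forming three visible groups
points = [(1, 1), (1.5, 2), (1, 1.5),
          (8, 8), (8.5, 9), (9, 8),
          (1, 8), (1.5, 9), (2, 8.5)]

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]

def kmeans(points, k, iterations=10):
    # Initialization: start from the first point, then repeatedly add the
    # point that is farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # move the centroid to the mean of its members
                centroids[i] = tuple(sum(v) / len(members) for v in zip(*members))
    return assign(points, centroids), centroids

labels, centroids = kmeans(points, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```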

Dimensionality Reduction

Dimensionality Reduction is a powerful technique. A dataset may have thousands of input columns. Having so many input columns can slow down an ML algorithm because it has to process a large amount of data, and after a certain point, adding more input columns doesn't improve the prediction. Dimensionality Reduction decreases the number of input columns while trying to keep most of the useful information. It is also useful when your data has too many dimensions to plot: Dimensionality Reduction can reduce them to 2 or 3 dimensions, making the data easier to visualize.
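A minimal sketch of the idea, assuming principal component analysis (PCA) as the technique: three input columns are projected onto the single direction along which the data varies most, found here by power iteration on the covariance matrix. The data values and the function name are made up for illustration; libraries such as scikit-learn provide full implementations.

```python
import math

# Hypothetical data: column 2 is roughly twice column 1, column 3 is noise
rows = [(1, 2.0, 0.5), (2, 4.1, 0.4), (3, 6.0, 0.6), (4, 8.2, 0.5), (5, 9.9, 0.4)]

def first_principal_component(rows, iterations=100):
    """Return the top principal direction and the 1-D projection of each row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projected

direction, projected = first_principal_component(rows)
print([round(x, 3) for x in direction])
print([round(x, 2) for x in projected])  # 3 columns reduced to 1 value per row
```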

Anomaly Detection

It is a technique used to identify certain pieces of data that differ from the majority of data and don't fit the normal behavioral pattern. It is also called Outlier Detection.
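One simple way to flag values that differ from the majority is a robust score based on the median and the median absolute deviation (MAD). The sketch below is only one possible approach; the sample values, the function name `find_outliers`, and the cutoff of 3.5 (a commonly used rule of thumb) are assumptions for this example.

```python
import statistics

def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # all values (nearly) identical; nothing to flag in this sketch
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]

# Hypothetical readings: most are close to 11 or 12, one is far away
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(readings)
print(outliers)  # [95]
```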

Association Rule Learning

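Association Rule Learning acts like a detective in the data world. It finds surprising connections between the things people buy, the websites they visit, or the articles they read, for example that customers who buy one product often buy another.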
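Such connections are usually measured with two numbers: support (how often a set of items appears together) and confidence (how often the rule "if A then B" holds when A is present). The sketch below computes both for a made-up list of shopping baskets; the basket contents are purely illustrative.

```python
# Hypothetical shopping baskets
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Of the baskets containing antecedent, the fraction also containing consequent."""
    return support(antecedent | consequent) / support(antecedent)

pair_support = support({"bread", "butter"})          # 3 of 5 baskets
rule_confidence = confidence({"bread"}, {"butter"})  # 3 of the 4 bread baskets
print(round(pair_support, 2), round(rule_confidence, 2))  # 0.6 0.75
```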

Semi-Supervised Learning

As the name suggests, this approach is partially supervised and partially unsupervised. Creating labels (outputs) for a dataset is called Labeling. It is a costly task because it requires manual work. The core idea behind semi-supervised learning is to label a small amount of data by hand and then let the system use it to automatically label the rest.
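A toy sketch of this idea, assuming a simple self-training scheme: start from two hand-labeled values, then repeatedly give the unlabeled value that is closest to any labeled one the same label as that neighbor. The one-dimensional data, the label names, and the function `self_train` are hypothetical; real methods are considerably more careful about which automatic labels to trust.

```python
def self_train(labeled, unlabeled):
    """Propagate labels from a few labeled values to the unlabeled ones."""
    labeled = dict(labeled)      # value -> label
    remaining = list(unlabeled)
    while remaining:
        # Find the (unlabeled value, labeled neighbor) pair that is closest
        value, neighbor = min(
            ((u, known) for u in remaining for known in labeled),
            key=lambda pair: abs(pair[0] - pair[1]),
        )
        labeled[value] = labeled[neighbor]  # copy the neighbor's label
        remaining.remove(value)
    return labeled

# Two manually labeled values and five unlabeled ones
result = self_train({1.0: "low", 9.0: "high"}, [2.0, 3.0, 8.0, 7.0, 4.0])
print(result)
```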

Reinforcement Learning

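Think of it as training a dog with rewards. An AI agent learns by trying actions, getting "treats" (rewards) for good choices and penalties for bad ones, and gradually favoring the actions that earn the most reward.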
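To make the reward idea concrete, here is a minimal sketch using tabular Q-learning, one common reinforcement learning algorithm chosen here as an assumption. The environment is a made-up corridor of five positions where the only "treat" is a reward of 1 for reaching the last position; the parameter values are illustrative.

```python
import random

N_STATES = 5        # positions 0..4; position 4 is the goal
MOVES = (-1, +1)    # action 0 = step left, action 1 = step right

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Learn a table q[state][action] of expected future reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _step in range(1000):  # safety cap on episode length
            # Explore sometimes (or when undecided), otherwise act greedily
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + MOVES[action], 0), N_STATES - 1)
            done = next_state == N_STATES - 1
            reward = 1.0 if done else 0.0          # the "treat"
            best_next = 0.0 if done else max(q[next_state])
            # Q-learning update: move the estimate toward reward + discounted future
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if done:
                break
    return q

q = train()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)  # ['right', 'right', 'right', 'right']
```

After training, the agent prefers stepping right in every position because that is the choice that leads to the reward.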

Conclusion

In conclusion, machine learning is divided into various methodologies tailored to different types of data and objectives. Supervised learning, with its regression and classification techniques, is ideal for datasets with clear input-output relationships. Unsupervised learning excels in discovering hidden patterns through clustering, dimensionality reduction, anomaly detection, and association rule learning. Semi-supervised learning strikes a balance by leveraging both labeled and unlabeled data, while reinforcement learning mimics the process of learning through rewards and penalties. Understanding these classifications and their applications is crucial for effectively leveraging machine learning to solve complex problems.
