
Face Identification

PRML project 2025 - IIT Jodhpur


Team Members

Project Details

Project Report
GitHub Repository
Short talk on YouTube

Face identification is a computer technology, used in a variety of applications, that identifies human faces in digital images. Face detection also refers to the psychological process by which humans locate and attend to faces in a visual scene. This project aims to identify faces in images using basic classifier techniques.
Our models are trained on the LFW dataset, and we apply the algorithms listed under Content below.

Web demo GitHub Repo


We used our best model (ANN) in the backend and Flask to connect the backend and frontend. In the web demo, you can upload an image and see the result, which is a classification among 5 persons.
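
A minimal sketch of how such a Flask backend can be wired up; the model file name ann_model.pkl, the preprocess() helper, and the /predict route are illustrative assumptions, not the exact code in our repository.

```python
# Hypothetical minimal Flask backend for the web demo (names are placeholders).
import pickle

import numpy as np
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

# Load the trained ANN classifier saved during training (assumed file name).
with open("ann_model.pkl", "rb") as f:
    model = pickle.load(f)

def preprocess(img):
    """Convert the upload to grayscale, resize to the training resolution, flatten."""
    img = img.convert("L").resize((47, 62))          # width x height; assumed training size
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

@app.route("/predict", methods=["POST"])
def predict():
    file = request.files["image"]                    # image uploaded from the frontend
    features = preprocess(Image.open(file.stream))
    label = model.predict(features.reshape(1, -1))[0]
    return jsonify({"person": str(label)})           # one of the 5 identities

if __name__ == "__main__":
    app.run()
```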


Content

Data pre-processing


About processed dataset

Data preprocessing is a crucial step in machine learning, especially in face identification tasks. The raw images in the LFW dataset come in various sizes, color schemes, and lighting conditions. To prepare these for model training, we applied several preprocessing techniques such as resizing, filtering, and normalization.


How we processed the data

Resizing, filtering, and normalizing the raw images produced our final dataset, which was then used to train the various algorithms mentioned below.


Why we processed the data

Preprocessing was essential for reducing training time and improving classification accuracy: models trained on preprocessed images performed significantly better than those trained on raw RGB images. We also noticed that some classes had very few images, which made it difficult for the model to learn them. To address this, we filtered the dataset to include only those classes with a minimum of 80 images. This ensured that our model had enough data to learn from and improved its overall performance.
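
A minimal sketch of this filtering step using scikit-learn's LFW loader: min_faces_per_person=80 keeps only the classes with at least 80 images, and pixel values are scaled to [0, 1]. The resize factor and split settings are assumptions, not our exact configuration.

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split

# Keep only people with at least 80 images; the loader crops and resizes the faces.
lfw = fetch_lfw_people(min_faces_per_person=80, resize=0.5)
X = lfw.data / 255.0                                  # normalize pixel intensities
y = lfw.target                                        # one integer label per person

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, "training samples,", len(lfw.target_names), "classes")
```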

KNN algorithm Code


Introduction

The LFW dataset has a varied number of images per person: some people have very few images, while others have a large number.
KNN does not require any training; instead, it computes the distance of each test point from all the training points, and this simple approach still gave nearly accurate predictions most of the time.


How we applied it

We treated each pixel of the image as a feature and used the KNN algorithm to classify the LFW images into classes, one class per person.
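
A minimal sketch of this setup with scikit-learn, assuming X_train/X_test already hold the feature vectors (CNN + LBP for our best run) and y_train/y_test the person labels.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# LDA projects the features onto a class-separating subspace,
# then KNN with k = 5 votes among the nearest training faces.
knn = make_pipeline(
    LinearDiscriminantAnalysis(),
    KNeighborsClassifier(n_neighbors=5),
)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```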


Results

Best result: KNN on CNN + LBP features with LDA, keeping k = 5.

Naive Bayes algorithm Code

Introduction

Naive Bayes is a probabilistic classifier based on Bayes' theorem, which assumes that features are conditionally independent given the class label. It is simple, fast, and works well with high-dimensional data.


How we applied it

To use Naive Bayes on the dataset, we train the classifier on the extracted LBP and CNN features to predict identities. Each feature vector (LBP/CNN) is treated as input, and the model learns the probability distribution over identities.
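
A minimal sketch using Gaussian Naive Bayes, assuming X_train/X_test hold the LBP or CNN feature vectors and y_train/y_test the identity labels.

```python
from sklearn.naive_bayes import GaussianNB

# Models each feature as a per-class Gaussian and picks the most probable identity.
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print("test accuracy:", nb.score(X_test, y_test))
```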


Results

Best result: CNN features.

Decision Tree algorithm Code


Introduction

The Decision Tree algorithm belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, it can be used for solving both regression and classification problems.


How we applied it

To use a Decision Tree on the LFW dataset with LBP and CNN features, we begin by preprocessing the images to extract these features: LBP captures local texture patterns, while CNN features encode deeper, high-level representations. We then train the Decision Tree classifier on the training set by recursively partitioning the feature space to maximize class purity. The model learns patterns that differentiate between individual identities.
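
A minimal sketch of this step, assuming the feature matrices are already extracted; the max depth of 6 matches our best result, while the remaining settings are scikit-learn defaults.

```python
from sklearn.tree import DecisionTreeClassifier

# Recursively partitions the feature space, limited to depth 6 to avoid overfitting.
tree = DecisionTreeClassifier(max_depth=6, random_state=42)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```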


Results

Best result: CNN features with a max depth of 6.

Random Forest Code


Introduction

Random Forests, an ensemble learning method, combine multiple decision trees to enhance accuracy and robustness in face identification tasks. In face recognition, they handle variations in pose, illumination, and expression by analyzing features like LBP textures or CNN-derived semantic patterns.


How we applied it

First, we extract these features from the images to form structured input vectors. Random Forest, an ensemble of Decision Trees, is then trained on these features: each tree learns to classify identities using random subsets of the data and features, improving generalization and reducing overfitting. The final prediction is made by majority voting among the trees.
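
A minimal sketch assuming LDA-projected CNN features, as in our best result; the number of trees is an illustrative choice, not a tuned value.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Each tree sees random subsets of samples/features; the forest takes a majority vote.
forest = make_pipeline(
    LinearDiscriminantAnalysis(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```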


Results

Best result: CNN features with LDA.

Clustering Code


Introduction

Clustering in face identification groups unlabeled facial images into distinct clusters, each representing a unique individual, using unsupervised learning.


How we applied it

After extracting features and applying LDA, we apply a clustering algorithm such as K-Means to group similar faces based on feature similarity. Since clustering is unsupervised, no identity labels are used during training.
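
A minimal sketch of this step, assuming X holds the LDA-projected CNN features and n_classes is the number of identities kept after filtering; the true labels y are used only to evaluate the clusters afterwards.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Group the faces purely by feature similarity (no labels during fitting).
kmeans = KMeans(n_clusters=n_classes, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)
print("adjusted Rand index vs. true identities:", adjusted_rand_score(y, cluster_ids))
```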


Results

Best result: CNN features with LDA.

ANN Code


Introduction

An Artificial Neural Network (ANN) is a supervised learning model built from layers of interconnected neurons. A feedforward network can learn complex, non-linear decision boundaries, which makes it well suited to classifying high-dimensional face features into individual identities.


How we applied it

We extract and normalize the features to form input vectors, then build a feedforward neural network with an input size matching the feature dimension, hidden layers to learn complex patterns, and an output layer with one neuron per unique identity. The ANN is trained on the labeled dataset by optimizing a cross-entropy loss via backpropagation.
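
A minimal sketch using scikit-learn's MLPClassifier as a stand-in for the ANN; the hidden-layer sizes and iteration count are illustrative assumptions, not our tuned architecture.

```python
from sklearn.neural_network import MLPClassifier

# Feedforward network: input = feature dimension, two hidden layers,
# one output neuron per identity; trained with cross-entropy via backpropagation.
ann = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=500, random_state=42)
ann.fit(X_train, y_train)
print("test accuracy:", ann.score(X_test, y_test))
```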


Results

Best result: CNN features with LDA.

Results



Google Cloud

For hosting the web demo of our face identification project, we used Google Cloud Platform (GCP) to ensure reliable performance and scalability. By deploying the backend services and machine learning model on a Compute Engine virtual machine, we were able to handle real-time face recognition requests efficiently. GCP's secure infrastructure and global network helped maintain low-latency responses, keeping the user experience smooth and responsive.