
Face Identification

PRML project 2025 - IIT Jodhpur


Team Members

Project Details

Project Report
GitHub Repository
Short talk on YouTube

Face identification is a computer technology, used in a variety of applications, that identifies human faces in digital images. Face detection also refers to the psychological process by which humans locate and attend to faces in a visual scene. This project aims to identify faces in images using basic classifier techniques.
Our models are trained on the LFW dataset, and we apply the algorithms listed under Content below.

Web demo GitHub Repo


We used our best model (ANN) in the backend and Flask to connect the backend and frontend. In the web demo, you can upload an image and see the result, which is a classification among 5 persons.
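
A minimal sketch of how such a Flask backend can be wired up; the model file name ann_model.pkl, the preprocess() helper, and the /predict route are illustrative assumptions, not the exact code in our repository.

```python
# Hypothetical minimal Flask backend for the web demo (names are placeholders).
import pickle

import numpy as np
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

# Load the trained ANN classifier saved during training (assumed file name).
with open("ann_model.pkl", "rb") as f:
    model = pickle.load(f)

def preprocess(img):
    """Convert the upload to grayscale, resize to the training resolution, flatten."""
    img = img.convert("L").resize((47, 62))          # width x height; assumed training size
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

@app.route("/predict", methods=["POST"])
def predict():
    file = request.files["image"]                    # image uploaded from the frontend
    features = preprocess(Image.open(file.stream))
    label = model.predict(features.reshape(1, -1))[0]
    return jsonify({"person": str(label)})           # one of the 5 identities

if __name__ == "__main__":
    app.run()
```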


Content

Data pre-processing


About processed dataset

Data preprocessing is a crucial step in machine learning, especially in face identification tasks. The raw images in the LFW dataset come in various sizes, color schemes, and lighting conditions. To prepare these for model training, we applied several preprocessing techniques such as resizing, filtering, and normalization.


How we processed the data

Resizing, filtering, and normalizing the raw images produced our final dataset, which was then used to train the various algorithms mentioned below.


Why we processed the data

Preprocessing was essential for reducing training time and improving classification accuracy: models trained on preprocessed images performed significantly better than those trained on raw RGB images. We also noticed that some classes had very few images, which made it difficult for the model to learn them. To address this, we filtered the dataset to include only those classes with a minimum of 80 images. This ensured that our model had enough data to learn from and improved its overall performance.
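
A minimal sketch of this filtering step using scikit-learn's LFW loader: min_faces_per_person=80 keeps only the classes with at least 80 images, and pixel values are scaled to [0, 1]. The resize factor and split settings are assumptions, not our exact configuration.

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split

# Keep only people with at least 80 images; the loader crops and resizes the faces.
lfw = fetch_lfw_people(min_faces_per_person=80, resize=0.5)
X = lfw.data / 255.0                                  # normalize pixel intensities
y = lfw.target                                        # one integer label per person

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, "training samples,", len(lfw.target_names), "classes")
```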

KNN algorithm Code


Introduction

The LFW dataset has a varied number of images per person: some people have very few images, while others have a large number.
KNN does not require any training; instead, it computes the distance of each test point from all the training points, and this simple approach still gave nearly accurate predictions most of the time.


How we applied it

We treated each pixel of the image as a feature and used the KNN algorithm to classify the LFW images into classes, one class per person.
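
A minimal sketch of this setup with scikit-learn, assuming X_train/X_test already hold the feature vectors (CNN + LBP for our best run) and y_train/y_test the person labels.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# LDA projects the features onto a class-separating subspace,
# then KNN with k = 5 votes among the nearest training faces.
knn = make_pipeline(
    LinearDiscriminantAnalysis(),
    KNeighborsClassifier(n_neighbors=5),
)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```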


Results

Best result: KNN on CNN + LBP features with LDA, keeping k = 5.

Naive Bayes algorithm Code

Introduction

Naive Bayes is a probabilistic classifier based on Bayes' theorem, which assumes that features are conditionally independent given the class label. It is simple, fast, and works well with high-dimensional data.


How we applied it

To use Naive Bayes on the dataset, we train the classifier on the extracted LBP and CNN features to predict identities. Each feature vector (LBP/CNN) is treated as input, and the model learns the probability distribution over identities.
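
A minimal sketch using Gaussian Naive Bayes, assuming X_train/X_test hold the LBP or CNN feature vectors and y_train/y_test the identity labels.

```python
from sklearn.naive_bayes import GaussianNB

# Models each feature as a per-class Gaussian and picks the most probable identity.
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print("test accuracy:", nb.score(X_test, y_test))
```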


Results

Best result: CNN features.

Decision Tree algorithm Code


Introduction

The Decision Tree algorithm belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, it can be used for solving both regression and classification problems.


How we applied it

To use a Decision Tree on the LFW dataset with LBP and CNN features, we begin by preprocessing the images to extract these features: LBP captures local texture patterns, while CNN features encode deeper, high-level representations. We then train the Decision Tree classifier on the training set by recursively partitioning the feature space to maximize class purity. The model learns patterns that differentiate between individual identities.
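
A minimal sketch of this step, assuming the feature matrices are already extracted; the max depth of 6 matches our best result, while the remaining settings are scikit-learn defaults.

```python
from sklearn.tree import DecisionTreeClassifier

# Recursively partitions the feature space, limited to depth 6 to avoid overfitting.
tree = DecisionTreeClassifier(max_depth=6, random_state=42)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```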


Results

Best result: CNN features with a max depth of 6.

Random Forest Code


Introduction

Random Forests, an ensemble learning method, combine multiple decision trees to enhance accuracy and robustness in face identification tasks. In face recognition, they handle variations in pose, illumination, and expression by analyzing features like LBP textures or CNN-derived semantic patterns.


How we applied it

First, we extract these features from the images to form structured input vectors. Random Forest, an ensemble of Decision Trees, is then trained on these features: each tree learns to classify identities using random subsets of the data and features, improving generalization and reducing overfitting. The final prediction is made by majority voting among the trees.
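
A minimal sketch assuming LDA-projected CNN features, as in our best result; the number of trees is an illustrative choice, not a tuned value.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Each tree sees random subsets of samples/features; the forest takes a majority vote.
forest = make_pipeline(
    LinearDiscriminantAnalysis(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```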


Results

Best result: CNN features with LDA.

Clustering Code


Introduction

Clustering in face identification groups unlabeled facial images into distinct clusters, each representing a unique individual, using unsupervised learning.


How we applied it

After extracting features and applying LDA, we apply a clustering algorithm such as K-Means to group similar faces based on feature similarity. Since clustering is unsupervised, no identity labels are used during training.
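
A minimal sketch of this step, assuming X holds the LDA-projected CNN features and n_classes is the number of identities kept after filtering; the true labels y are used only to evaluate the clusters afterwards.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Group the faces purely by feature similarity (no labels during fitting).
kmeans = KMeans(n_clusters=n_classes, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)
print("adjusted Rand index vs. true identities:", adjusted_rand_score(y, cluster_ids))
```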


Results

Best result: CNN features with LDA.

ANN Code


Introduction

An Artificial Neural Network (ANN) is a supervised learning model built from layers of interconnected neurons. A feedforward network can learn complex, non-linear decision boundaries, which makes it well suited to classifying high-dimensional face features into individual identities.


How we applied it

We extract and normalize the features to form input vectors, then build a feedforward neural network with an input size matching the feature dimension, hidden layers to learn complex patterns, and an output layer with one neuron per unique identity. The ANN is trained on the labeled dataset by optimizing a cross-entropy loss via backpropagation.
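
A minimal sketch using scikit-learn's MLPClassifier as a stand-in for the ANN; the hidden-layer sizes and iteration count are illustrative assumptions, not our tuned architecture.

```python
from sklearn.neural_network import MLPClassifier

# Feedforward network: input = feature dimension, two hidden layers,
# one output neuron per identity; trained with cross-entropy via backpropagation.
ann = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=500, random_state=42)
ann.fit(X_train, y_train)
print("test accuracy:", ann.score(X_test, y_test))
```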


Results

Best result: CNN features with LDA.

Results



Google Cloud

For hosting the web demo of our face identification project, we used Google Cloud Platform (GCP) to ensure reliable performance and scalability. By deploying the backend services and machine learning model on a Compute Engine virtual machine, we were able to handle real-time face recognition requests efficiently. GCP's secure infrastructure and global network helped maintain low-latency responses, keeping the user experience smooth and responsive.