PRML project 2025 - IIT Jodhpur
Face identification is a computer technology, used in a variety of applications, that identifies human faces in digital images. Face detection also refers to the psychological process by which humans locate and attend to faces in a visual scene. This project aims to identify faces in images using basic classifier techniques.
In this project we apply several basic classifier techniques to identify faces in images. Our models are trained on the LFW (Labeled Faces in the Wild) dataset. We use the following algorithms:
We used our best model (an ANN) in the backend and used Flask to connect the backend and frontend. In the web demo you can upload an image and see the result: a classification among 5 persons.
Data preprocessing is a crucial step in machine learning, especially in face identification tasks. The raw images in the LFW dataset come in various sizes, color schemes, and lighting conditions. To prepare these for model training, we applied several preprocessing techniques such as resizing, filtering, and normalization.
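The preprocessing steps above (resizing, grayscale conversion, normalization) can be sketched as follows. This is a minimal illustration, not the project's exact pipeline; the 64x64 target size and the use of Pillow are assumptions for the example.

```python
# Sketch of the preprocessing described above: fixed size, grayscale,
# pixels scaled to [0, 1], flattened to one feature per pixel.
# The 64x64 size is an assumption for illustration.
import numpy as np
from PIL import Image

def preprocess(img, size=(64, 64)):
    """Resize, convert to grayscale, and scale pixel values to [0, 1]."""
    img = img.convert("L").resize(size)              # grayscale + fixed size
    arr = np.asarray(img, dtype=np.float32) / 255.0  # normalize to [0, 1]
    return arr.flatten()                             # one feature per pixel

# A random RGB array stands in for a raw LFW photo (LFW images are ~250x250).
raw = Image.fromarray(np.random.randint(0, 256, (250, 250, 3), dtype=np.uint8))
x = preprocess(raw)
print(x.shape)  # (4096,)
```

Flattening like this is what lets the classical models below treat each pixel (or extracted feature) as one input dimension.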
Preprocessing was essential for bringing these varied images to a consistent size and scale before training. The LFW dataset also has a varied number of images per person: some people have very few images, while others have a large number.
KNN has no explicit training phase; instead, it computes the distance from each test point to all training points at prediction time. In practice it gave accurate predictions most of the time.
We treated each pixel of an image as a feature and trained our model on the LFW dataset, with one class per person.
Best result: KNN on CNN + LBP features with LDA, keeping k = 5:
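A minimal sketch of this setup (LDA dimensionality reduction followed by KNN with k = 5), using scikit-learn and synthetic vectors in place of the project's CNN + LBP features:

```python
# Sketch of the KNN pipeline: LDA-reduced features, k = 5.
# The random vectors below are a stand-in for CNN + LBP feature vectors.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))    # 100 samples, 32-dim features (assumed)
y = rng.integers(0, 5, size=100)  # 5 identity labels, as in the web demo

knn = make_pipeline(LinearDiscriminantAnalysis(),
                    KNeighborsClassifier(n_neighbors=5))
knn.fit(X, y)
preds = knn.predict(X)
print(preds.shape)  # (100,)
```

With 5 classes, LDA reduces the features to at most 4 dimensions before the distance computation, which is what makes KNN fast here.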
Naive Bayes is a probabilistic classifier based on Bayes' theorem that assumes features are conditionally independent given the class label. It is simple, fast, and works well with high-dimensional data.
Best result: on CNN features:
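A hedged sketch of this classifier, assuming a Gaussian Naive Bayes variant (suited to continuous CNN feature values) and synthetic data in place of the project's features:

```python
# Gaussian Naive Bayes on continuous feature vectors (a stand-in for
# the CNN features used in the project).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))    # assumed feature dimension
y = rng.integers(0, 5, size=100)  # 5 identity labels

nb = GaussianNB().fit(X, y)
probs = nb.predict_proba(X[:1])   # per-class posterior probabilities
print(probs.shape)  # (1, 5)
```

The per-class posteriors sum to 1 for each sample, which follows directly from Bayes' theorem.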
The decision tree algorithm belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, it can be used for solving both regression and classification problems.
To use a decision tree on the LFW dataset with LBP and CNN features, begin by preprocessing the images to extract these features: LBP captures local texture patterns, while CNN features encode deeper, high-level representations. Train the decision tree classifier on the training set by recursively partitioning the feature space to maximize class purity. The model learns patterns that differentiate between individual identities.
Best result: on CNN features with a max depth of 6:
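The steps above can be sketched as follows, with synthetic vectors standing in for the CNN features and `max_depth=6` matching the best result reported:

```python
# Decision tree with max depth 6, recursively partitioning the feature
# space. Random vectors stand in for the project's CNN features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))    # assumed feature dimension
y = rng.integers(0, 5, size=100)  # 5 identity labels

tree = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X, y)
preds = tree.predict(X)
```

Capping the depth at 6 limits how finely the tree can partition the feature space, which helps against overfitting on high-dimensional features.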
Random Forests, an ensemble learning method, combine multiple decision trees to enhance accuracy and robustness in face identification tasks. In face recognition, they handle variations in pose, illumination, and expression by analyzing features like LBP textures or CNN-derived semantic patterns.
First, extract these features from the images to form structured input vectors. A Random Forest, an ensemble of decision trees, is then trained on these features: each tree learns to classify identities using random subsets of the data and features, improving generalization and reducing overfitting. The final prediction is made through majority voting among the trees.
Best result: on CNN features after applying LDA:
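A minimal sketch of this configuration (LDA reduction followed by a Random Forest whose trees vote on the identity), again with synthetic stand-ins for the CNN features:

```python
# LDA dimensionality reduction followed by a Random Forest; the forest's
# prediction is the majority vote of its trees. Random vectors stand in
# for CNN features; n_estimators=50 is an assumption for the example.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))
y = rng.integers(0, 5, size=100)

forest = make_pipeline(LinearDiscriminantAnalysis(),
                       RandomForestClassifier(n_estimators=50, random_state=0))
forest.fit(X, y)
preds = forest.predict(X)
```

Each tree sees a bootstrap sample of the data and a random subset of features at each split, which is what decorrelates the trees and improves generalization over a single decision tree.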
Clustering in face identification groups unlabeled facial images into distinct clusters, each representing a unique individual, using unsupervised learning.
After extracting features and applying LDA, apply a clustering algorithm such as K-Means to group similar faces based on feature similarity. Since clustering is unsupervised, no identity labels are used during the clustering step itself.
Best result: on CNN features with LDA:
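A sketch of this setup: LDA (which does use labels) reduces the features, then K-Means groups them without any labels. Synthetic vectors stand in for the project's CNN features, and 5 clusters matches the 5 identities in the demo.

```python
# Supervised LDA reduction of the features, then unsupervised K-Means
# clustering (no labels used in the clustering step itself).
# Random vectors stand in for CNN features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))
y = rng.integers(0, 5, size=100)

X_lda = LinearDiscriminantAnalysis().fit_transform(X, y)  # labels used here
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_lda)
```

To score such a clustering against the true identities, a label-invariant metric (e.g. adjusted Rand index) is needed, since K-Means cluster IDs are arbitrary.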
An Artificial Neural Network (ANN) is a supervised learning model composed of layers of interconnected neurons. It can learn complex, non-linear decision boundaries, which makes it well suited to distinguishing between facial identities; this was our best-performing model and the one served in the web demo.
Extract and normalize these features to form input vectors. Then build a feedforward neural network with an input size matching the feature dimension, hidden layers to learn complex patterns, and an output layer with one neuron per unique identity. Train the ANN on the labeled dataset, optimizing a loss function such as cross-entropy via backpropagation.
Best result: on CNN features with LDA:
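The feedforward network described above can be sketched with scikit-learn's `MLPClassifier`, which optimizes cross-entropy via backpropagation. The hidden-layer size is an assumption for illustration, and random vectors again stand in for the LDA-reduced CNN features.

```python
# Feedforward ANN: input size = feature dimension, one hidden layer,
# 5 output classes; trained with cross-entropy via backpropagation.
# Random vectors stand in for LDA-reduced CNN features; the (64,) hidden
# layer is an assumed size for illustration.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))
y = rng.integers(0, 5, size=100)

ann = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
ann.fit(X, y)
probs = ann.predict_proba(X)  # softmax scores over the 5 identities
```

In the deployed demo the equivalent of `probs.argmax(axis=1)` is what Flask returns as the predicted person for an uploaded image.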