Printed vs Handwritten Detection – Classify Text Type in Documents
Team Members
Suraj Gupta
Prathmesh Gaikwas
Yash Machhi
Use Case Importance
Detecting whether text is printed or handwritten helps automate document processing in real-world applications like exam paper evaluation, form verification, and digitization of records. This system reduces manual effort and improves accuracy when handling mixed-type documents.
Data Collection and Annotation
Data Collection:
Images were collected from multiple sources including handwritten notes, printed documents, and scanned papers. Data included variations in handwriting styles, fonts, lighting conditions, and backgrounds to ensure robustness.
Annotated Classes:
Two classes:
Printed
Handwritten
Annotation Tool:
Roboflow
Total Images:
The dataset was divided into training, validation, and test splits for effective model learning.
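A minimal sketch of how such a split could be produced. It assumes the labelled images sit in one folder per class (for example `raw/Printed` and `raw/Handwritten`) and that the output should follow the `train/val/test/<class>` folder layout that Ultralytics classification training reads. The 70/20/10 ratios, the folder names, and the helper names `split_counts` and `split_dataset` are illustrative assumptions, not values taken from this project.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def split_counts(n, ratios=(0.7, 0.2, 0.1)):
    """Return (n_train, n_val, n_test) for n items; rounding leftovers go to train."""
    n_val = int(n * ratios[1])
    n_test = int(n * ratios[2])
    return n - n_val - n_test, n_val, n_test


def split_dataset(src, dst, ratios=(0.7, 0.2, 0.1), seed=0):
    """Copy src/<class>/* into dst/{train,val,test}/<class>/ and return per-split totals."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    totals = {"train": 0, "val": 0, "test": 0}
    for class_dir in sorted(p for p in Path(src).iterdir() if p.is_dir()):
        images = sorted(p for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_EXTS)
        rng.shuffle(images)
        n_train, n_val, _ = split_counts(len(images), ratios)
        parts = {
            "train": images[:n_train],
            "val": images[n_train:n_train + n_val],
            "test": images[n_train + n_val:],
        }
        # Splitting per class keeps the class balance similar across splits.
        for split, files in parts.items():
            out = Path(dst) / split / class_dir.name
            out.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, out / f.name)
            totals[split] += len(files)
    return totals
```

Example call: `split_dataset("raw", "dataset")`, after which `dataset` can be passed as the training data directory.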
Model Training and Validation
Model & Version:
YOLOv8 classification model, trained on Google Colab
Training Details:
Epochs: 100
Batch Size: 16
Image Size: 224×224
Optimizer: Default (SGD/Adam depending on setup)
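The settings above map onto the Ultralytics training call roughly as follows. This is a sketch rather than the project's actual notebook: the checkpoint name `yolov8n-cls.pt` (the nano classification variant) and the dataset path are assumptions, since the report does not state which model size or directory was used. The optimizer is left at the library default, as in the list above.

```python
def build_train_args(data_dir):
    """Collect the hyperparameters listed in this report as Ultralytics train() arguments."""
    return {
        "data": str(data_dir),  # folder containing train/ and val/ class subfolders
        "epochs": 100,
        "batch": 16,
        "imgsz": 224,
    }


def train(data_dir, weights="yolov8n-cls.pt"):
    """Fine-tune a pretrained YOLOv8 classification checkpoint (model size is an assumption)."""
    # Imported here so the argument builder can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**build_train_args(data_dir))
```

On Colab this would be run as `train("dataset")` after `pip install ultralytics`.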
Augmentations:
Rotation
Flipping
Brightness & Contrast Adjustment
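For illustration, the three augmentations listed above could be applied offline with Pillow as sketched below. The report does not say whether augmentation was done in Roboflow or in code, and the ranges used here (rotation within ±10 degrees, brightness and contrast within ±20%) are assumed values, as are the helper names `sample_params` and `augment`.

```python
import random


def sample_params(rng, max_angle=10.0, jitter=0.2):
    """Draw one random augmentation setting (all ranges are illustrative assumptions)."""
    return {
        "angle": rng.uniform(-max_angle, max_angle),        # rotation in degrees
        "flip": rng.random() < 0.5,                         # horizontal flip half the time
        "brightness": rng.uniform(1 - jitter, 1 + jitter),  # 1.0 leaves brightness unchanged
        "contrast": rng.uniform(1 - jitter, 1 + jitter),    # 1.0 leaves contrast unchanged
    }


def augment(image, params):
    """Apply rotation, flipping, and brightness/contrast adjustment to a PIL image."""
    # Pillow is imported here so the sampler above works without it installed.
    from PIL import ImageEnhance, ImageOps

    out = image.rotate(params["angle"], fillcolor="white")  # fill exposed corners with white
    if params["flip"]:
        out = ImageOps.mirror(out)
    out = ImageEnhance.Brightness(out).enhance(params["brightness"])
    return ImageEnhance.Contrast(out).enhance(params["contrast"])
```

Typical use: open an RGB image with `PIL.Image.open(path).convert("RGB")`, call `augment(img, sample_params(random.Random()))`, and save the result next to the original in the training folder.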
Monitored Metrics:
Accuracy
Loss
Precision
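As a reference for how these metrics are defined, the sketch below computes them from plain Python lists. It assumes the loss is cross-entropy, the usual choice for classification, and that precision is reported per class; the function names are illustrative and this is not the code the training framework runs internally.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive):
    """Of all samples predicted as `positive`, the fraction that truly are."""
    truths = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths:
        return 0.0  # no predictions for this class
    return sum(t == positive for t in truths) / len(truths)


def cross_entropy(y_true, probs):
    """Mean negative log-probability assigned to the true class.

    probs[i] maps each class name to its predicted probability for sample i.
    """
    eps = 1e-12  # avoid log(0)
    return -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, probs)) / len(y_true)
```

For example, with true labels `["Printed", "Printed", "Handwritten", "Handwritten"]` and predictions `["Printed", "Handwritten", "Handwritten", "Handwritten"]`, accuracy is 3/4 and precision for `Handwritten` is 2/3, because one of the three `Handwritten` predictions is actually printed.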
Performance Improvement:
The initial dataset was small, which led to lower accuracy. Applying augmentation and increasing dataset diversity improved model performance.
Model Deployment and Demo Video
Performance:
The model classifies images in real time and achieves high accuracy on the test data.
Deployment:
The model was deployed using the YOLOvX application for real-time testing.
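The report does not show how the deployed application calls the model. For reference, a single-image prediction with the Ultralytics Python API might look like the sketch below. The weights file name `best.pt`, the `"uncertain"` fallback, and the 0.6 confidence threshold are assumptions added for illustration.

```python
def top_label(names, probs, threshold=0.6):
    """Return (label, confidence) for the most probable class.

    names maps class index -> class name; probs is a list of class probabilities.
    Predictions below `threshold` are reported as "uncertain" (hypothetical rule).
    """
    idx = max(range(len(probs)), key=lambda i: probs[i])
    conf = float(probs[idx])
    label = names[idx] if conf >= threshold else "uncertain"
    return label, conf


def classify(image_path, weights="best.pt"):
    """Run the trained classifier on one image (weights path is an assumption)."""
    # Imported here so top_label can be used without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    result = model(image_path)[0]  # one Results object per input image
    return top_label(result.names, result.probs.data.tolist())
```

With two classes the top probability is never below 0.5, so a threshold above that value is what makes the fallback meaningful for mixed or ambiguous crops.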
Demo Video:
Conclusion
The YOLOv8 classification model successfully distinguishes between printed and handwritten text with strong accuracy. This project highlights the importance of dataset quality, augmentation, and lightweight models for real-time deployment. It can be further extended for document digitization and intelligent OCR systems.