
Manjit Singh

Data Science Portfolio


Education

Executive Post Graduate Program in Data Science & AI
IIIT Bangalore
Oct ’23 – Jan ’25

Bachelor of Commerce (B.Com)
Gauhati University
Jul ’20 – Jun ’23


Professional Experience

Associate - Data Analyst
Samhita | Remote | Nov 2024 – Present

At Samhita, I support data-driven decision-making for financial inclusion projects by managing datasets, uncovering insights, and automating processes across multiple programs. I work closely with program, tech, and MEL teams to ensure data flows efficiently from collection to actionable insight.

Key Responsibilities:

• 🧹 Data Management & Reporting
• 📊 Dashboarding & Visualization
• 🔍 Data Analysis & Insight Generation
• ⚙️ Process Automation & GenAI Adoption
• 🤝 Cross-Team Collaboration


Internships

Business Analyst Intern
Quest Global Technologies Ltd | Remote | Nov 2023 – May 2024


Key Projects

🛍️ Sentiment-Based Product Recommendation System

Tech Stack: Python, Flask, Scikit-learn, Pandas, NLTK, HTML/CSS

Built a recommendation engine that combines collaborative filtering with sentiment analysis of user reviews to generate more accurate and personalized product suggestions.

Impact: Improved recommendation precision by aligning suggestions with user sentiment, resulting in a more engaging and tailored user experience.
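One way to combine collaborative filtering with sentiment can be sketched as a two-stage ranking: take the top candidates by predicted rating, then reorder them by the share of positive reviews. This is a minimal illustration rather than the project's actual code; the function `recommend`, its parameters, and all product names and numbers are hypothetical.

```python
def recommend(cf_scores, positive_share, pool_size=3, top_n=2):
    """Pick top_n products from the best pool_size collaborative-filtering candidates.

    cf_scores:      product -> predicted rating from a collaborative filter (assumed input)
    positive_share: product -> fraction of reviews classified as positive (assumed input)
    """
    # Stage 1: keep the strongest candidates by predicted rating.
    pool = sorted(cf_scores, key=cf_scores.get, reverse=True)[:pool_size]
    # Stage 2: reorder that pool by review sentiment; unseen products count as 0.
    ranked = sorted(pool, key=lambda p: positive_share.get(p, 0.0), reverse=True)
    return ranked[:top_n]


# Made-up example data.
cf_scores = {"kettle": 4.6, "toaster": 4.4, "blender": 4.1, "mixer": 3.9}
positive_share = {"kettle": 0.62, "toaster": 0.91, "blender": 0.88, "mixer": 0.70}

top_products = recommend(cf_scores, positive_share)
print(top_products)
```

Here the kettle has the highest predicted rating but weaker review sentiment, so it drops out of the final two suggestions.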

View on GitHub

📄 Helmate AI: Retrieval-Augmented Generation (RAG) System

Tech Stack: Python, LangChain, GROQ AI, SBERT, ChromaDB

Developed a cost-efficient, open-source RAG pipeline to accurately respond to user queries based on insurance policy documents, enhancing the accessibility and understanding of complex policy information.

Impact: Delivered a scalable and interpretable RAG system with improved query response accuracy, built entirely on open-source tools for maximum accessibility and zero API cost.
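The retrieval step of a RAG pipeline can be sketched in plain Python: embed the query and the document chunks, pick the most similar chunk, and place it in the prompt. This is a toy illustration, not the project's code. Word counts stand in for SBERT embeddings, a Python list stands in for ChromaDB, no LLM is called, and the policy sentences and helper names (`embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    return sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)[:k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(context_chunks)
    return f"Answer using only this policy text:\n{context}\n\nQuestion: {query}"


# Made-up policy snippets.
chunks = [
    "the grace period for premium payment is thirty days",
    "claims must be filed within ninety days of the event",
    "the policy covers accidental death and disability",
]
query = "what is the grace period for premium payment"

best = retrieve(query, chunks, k=1)
prompt = build_prompt(query, best)
print(prompt)
```

Grounding the prompt in retrieved text is what makes the answers traceable to a specific passage of the policy.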

View on GitHub

🧾 Automatic Ticket Classification System

Tech Stack: Python, Scikit-learn, NLTK, Pandas

Built an automated ticket classification system that streamlines customer support by routing each ticket to the relevant department.

Impact: Automated classification significantly reduced manual workload, improved response speed, and laid a foundation for intelligent support systems.

View on GitHub

🧩 Customer Segmentation using Clustering

Tech Stack: Python, Scikit-learn, Pandas, Seaborn, Matplotlib

Implemented a robust unsupervised learning pipeline to segment customers based on purchasing behavior for targeted marketing and business strategy enhancement.

Impact: Enabled data-driven customer targeting and improved marketing ROI through insight-backed segmentation.
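Segmenting customers by purchasing behavior can be illustrated with a small k-means loop. The choice of k-means is an assumption (the project may use a different clustering method), and the two features (orders per year, average order value), the data points, and the starting centroids are all made up. A real pipeline would typically scale the features first and use a library implementation.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the per-feature mean of the cluster members.
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(old)
        centroids = updated
    return centroids, assign(points, centroids)


# (orders per year, average order value) for six hypothetical customers.
customers = [(2, 20), (3, 25), (1, 15), (20, 200), (22, 180), (18, 220)]
centroids, labels = kmeans(customers, centroids=[customers[0], customers[3]])
print(centroids, labels)
```

The resulting centroids describe a typical customer in each segment, which is what makes the segments usable for targeting.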

View on GitHub

🎬 RSVP Movies Analysis – SQL-Based Data Exploration

Tech Stack: MySQL, SQL Joins, Aggregations, Subqueries

Performed a comprehensive SQL analysis on the RSVP Movies dataset to derive actionable insights into movie performance, viewer behavior, and content trends.

Impact: Enabled data-driven decisions for movie curation, marketing strategies, and platform-specific content optimization through insightful SQL reporting.

View on GitHub

📉 Telecom Churn Prediction

Tech Stack: Python, Scikit-learn, Pandas, Seaborn, Matplotlib

Developed a predictive machine learning model to identify telecom customers likely to churn, enabling proactive retention strategies.

Impact: Equipped stakeholders with a reliable churn prediction system, enabling targeted interventions and improved customer retention.

View on GitHub

🎯 Lead Scoring Model

Tech Stack: Python, Scikit-learn, Pandas, Seaborn, Matplotlib

Built a machine learning model to identify and prioritize marketing leads with the highest potential to convert, helping businesses optimize their sales funnel.

Impact: Enabled targeted sales efforts by automating lead prioritization, thereby reducing time-to-conversion and improving ROI.

View on GitHub

🚗 Vehicle Dataset – Advanced EDA & Insights

Tech Stack: Python, Pandas, Seaborn, Matplotlib, Plotly

Conducted an in-depth Exploratory Data Analysis (EDA) on a vehicle dataset to uncover trends, patterns, and actionable insights in the automotive space.

Impact: Revealed strong correlations between engine size, brand, and price; highlighted the impact of fuel type on efficiency; and created a foundation for future predictive modeling.

View on GitHub

🚴‍♂️ Bike Sharing Demand Prediction – Linear Regression Model

Tech Stack: Python, Pandas, Seaborn, Matplotlib, Scikit-learn

Built a linear regression model to predict bike rental demand based on historical usage patterns and environmental conditions.

Impact: Provided an interpretable model that helps stakeholders anticipate rental demand and make informed operational decisions such as fleet planning and staff allocation.
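The interpretability of linear regression comes from its coefficients, which a one-feature least-squares fit makes easy to see. The sketch below is illustrative only: temperature is assumed as the single predictor, the numbers are invented and perfectly linear, and the project itself likely uses more features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted rentals for a given temperature."""
    return slope * x + intercept


# Made-up daily observations: temperature in Celsius and bikes rented.
temperatures = [10, 15, 20, 25, 30]
rentals = [120, 170, 220, 270, 320]

slope, intercept = fit_line(temperatures, rentals)
forecast = predict(22, slope, intercept)
print(slope, intercept, forecast)
```

In this toy data the slope reads directly as "ten more rentals per extra degree", which is the kind of statement stakeholders can act on.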

View on GitHub

💳 Credit Data Analysis – Deep Dive into Borrower Behavior

Tech Stack: Python, Pandas, Seaborn, Matplotlib, Plotly

Performed a comprehensive Exploratory Data Analysis (EDA) on a credit dataset to understand patterns related to credit risk, borrower profiles, and loan characteristics.

Impact: Enabled a clearer understanding of high-risk segments, borrower behavior, and key attributes associated with loan defaults, laying the groundwork for future predictive modeling.

View on GitHub