Hi, I'm Bilal Momin

Data Science Consultant

I'm a Data Scientist and AI Consultant with 4+ years of experience building scalable machine learning systems, intelligent data pipelines, and LLM-powered applications. I specialize in end-to-end GenAI solutions, from real-time web crawling and feature engineering to deploying LLM-based tools such as resume builders, Q&A bots, and document intelligence systems.

I’ve worked across industries including finance, education, and media tech, combining Python, cloud platforms (GCP, AWS), LangChain, Kafka, and ML frameworks to deliver production-ready tools. Passionate about solving real-world problems with AI, I focus on building solutions that are scalable, smart, and user-friendly.

Bilal Ayyas Momin

Technical Skills

Data Engineering

Python, SQL, Apache Spark, PySpark, Airflow, Kafka, BigQuery, Google Cloud Platform, AWS, ETL Pipelines, Data Warehousing, Docker, Kubernetes, MongoDB, PostgreSQL

AI & ML

Machine Learning, Deep Learning, TensorFlow, PyTorch, Scikit-learn, NLP, LLMs, LangGraph, RAG (Retrieval-Augmented Generation), LangChain, Prompt Engineering, OpenAI, Transformers, Fine-tuning

Cloud & DevOps

Google Cloud Platform (GCP), AWS, Cloud Functions, Cloud Run, Docker, Kubernetes, CI/CD, GitHub Actions, Cloud Storage, Compute Engine, Fargate, Cloud Monitoring

Professional Journey

Freelance Data Science Consultant

Self-employed

June 2023 - Present

Designed and delivered AI-powered tools and applications using LLMs, LangChain, and cloud technologies. Focused on creating scalable, real-time NLP systems for document intelligence, resume automation, and financial analysis.

Key Achievements:
  • Built an LLM-based Document QA system using LangChain and ChromaDB for natural language interaction across PDF, CSV, and XLSX files
  • Developed a portfolio Q&A assistant allowing users to query financial data and stock performance using RAG pipelines and OpenAI
  • Created an LLM-powered resume generator that converts plain text input into polished, personalized resumes
  • Developed a chatbot for querying structured financial datasets with high retrieval accuracy using custom embeddings
  • Handled all aspects from ingestion and preprocessing to embedding storage, vector search, and LLM orchestration

Data Engineer

Admazes Limited, Hong Kong

December 2021 - Present

Led the development of scalable data pipelines, automated data ingestion systems, and ML classification tools across high-volume data sources. Focused on building reliable infrastructure and delivering actionable business insights through robust engineering and modeling practices.

Key Achievements:
  • Developed and maintained data crawlers for Quora, Reddit, Twitter, LinkedIn, and Tor, processing millions of records with real-time error handling and deduplication
  • Engineered ETL pipelines that handled 100M+ monthly Google SERP records for trend forecasting and demographic analysis using clustering and regression models
  • Built a topic classification engine using ML models (e.g., fine-tuned BERT) for organizing search queries into categories with high accuracy
  • Performed complex data wrangling, cleaning, and transformation across diverse data formats (JSON, text, tabular, unstructured)
  • Managed API deployments and database hosting on Google Cloud using Docker, Kubernetes, and Compute Engine

Python Developer

Codemarket, California

December 2020 - February 2021

Contributed to backend development and cloud-based deployments of modern web applications, supporting real-time API workflows and data integrations.

Key Achievements:
  • Built RESTful and GraphQL APIs using Python and MongoDB for high-performance frontend consumption
  • Deployed scalable microservices using AWS Lambda, Fargate, and ECR within a containerized architecture
  • Collaborated across teams and managed multiple GitHub repositories simultaneously
  • Supported frontend-backend integration using React and ensured consistent data flow between services

Featured Projects

University Q&A System with LLM & RAG

Built an LLM-powered Retrieval-Augmented Generation (RAG) system that allows users to ask natural language questions about 100+ universities. Integrated data from Quora and Reddit using scalable crawlers. Designed chunking, metadata tagging, and vector search using LangChain and OpenAI to return accurate, context-aware responses.

LangChain, RAG, OpenAI, ChromaDB, Python, MongoDB, Vector Search

Search Query Trend Prediction Platform

Engineered a data pipeline to process 100M+ monthly Google SERP records and forecast keyword trends using clustering and regression models. Handled large-scale data ingestion, transformation, and analytics for business insights.

ETL, BigQuery, Dataflow, ML, Python, Scikit-learn, GCP

ML-Based Topic Categorization Engine

Created an end-to-end machine learning pipeline to classify search queries into relevant marketing categories. Implemented text preprocessing, model training using fine-tuned BERT, and deployed real-time prediction APIs on Google Cloud. Used for search analytics and content tagging.

ML, Text Classification, BERT, Python, Google Cloud, APIs