Hi, I'm Hyma Roshini

Data Engineer | Data Scientist | Data Analyst

Passionate about building end-to-end ML systems, scalable data pipelines, and AI-driven solutions that transform data into impactful business decisions.

About Me

Hello! I'm Hyma Roshini Gompa, a Data Scientist with 4+ years of experience delivering end-to-end predictive analytics and machine learning solutions for high-volume consumer platforms.

I specialize in building scalable data pipelines, developing and deploying ML models (Random Forest, XGBoost, LightGBM), and translating complex data into actionable business insights through customer segmentation, retention strategies, and KPI-driven dashboards.

My expertise spans cloud data engineering, MLOps, and generative AI, with a proven track record of measurable impact on churn reduction and revenue optimization. I enjoy applying data and AI to complex business problems and delivering value at scale.

Skills

Programming & Languages

Python, SQL, Apache Spark, PySpark

Data Engineering & Cloud

Scalable Data Pipelines, ETL/ELT, Snowflake, Azure Data Factory, Databricks, Data Modeling, Feature Engineering

Machine Learning & AI

Predictive Modeling, Logistic Regression, Random Forest, XGBoost, LightGBM, Customer Segmentation, Churn Analytics

Generative AI & LLMs

Large Language Models, RAG (Retrieval-Augmented Generation), LangChain, Legal-BERT, Transformer Models

MLOps & Deployment

Docker, Kubernetes, Model Monitoring, Production ML Pipelines, Kafka

BI & Visualization

Power BI, Tableau, Executive Dashboards, KPI Frameworks, Streamlit

Experience

Data Scientist

Enigma Technologies | USA

May 2025 - Present

  • Own a flagship end-to-end predictive analytics initiative for proactive customer retention, analyzing large-scale behavioral and transactional data using SQL, Apache Spark, and Snowflake
  • Design and maintain scalable, production-grade data pipelines using Python, SQL, and Spark with data validation, feature engineering, and quality checks
  • Implement ML pipelines using Logistic Regression, Random Forest, and XGBoost, with hyperparameter tuning to improve precision and ROC-AUC
  • Deploy automated model scoring and monitoring workflows leveraging Docker and Kubernetes for near real-time churn risk visibility
  • Apply Large Language Models (LLMs) through RAG pipelines built with LangChain to improve insight retrieval from internal documents
  • Deliver executive-ready insights through customer segmentation and KPI-driven dashboards using Power BI and Tableau

Data & Cloud Analytics

Trulogik | USA

Aug 2024 - May 2025

  • Led end-to-end analytics delivery as primary liaison between business stakeholders and engineering teams, cutting reporting cycle time by 35%
  • Designed and owned executive-ready Power BI and Tableau dashboards tracking critical KPIs with deep trend and root-cause analysis
  • Built and automated backend analytics workflows using SQL and Python, reducing reporting errors by 20%
  • Implemented proactive data quality monitoring and anomaly detection to ensure high accuracy and on-time delivery
  • Standardized reporting logic, data models, and documentation for repeatable analytics processes

Data Engineer Technical Program Analyst

Josh Technology Group | India

Apr 2021 - Jul 2023

  • Designed and automated scalable data pipelines integrating multi-source datasets using SQL and Python, reducing manual effort by 30%
  • Developed complex SQL transformations and analytical queries for recurring and ad hoc reporting with trend and root-cause analysis
  • Built production-grade Power BI and Tableau dashboards for KPI visualization supporting executive decision-making
  • Implemented data quality frameworks including duplicate detection and validation workflows, cutting discrepancies by 25-30%
  • Partnered with cross-functional stakeholders to gather requirements and define standardized metrics
  • Documented end-to-end data flows, transformation logic, and validation rules for governance and compliance

Education

Master of Science in Data Science

University of Maryland, Baltimore County

Aug 2023 - May 2025

Baltimore, MD

Bachelor of Science in Computer Science

Lovely Professional University

Aug 2018 - May 2022

Punjab, India

Certifications

  • Microsoft Power BI
  • Microsoft Azure Fundamentals
  • SQL (Intermediate-Advanced)
  • Python for Data Analytics

Projects

Data Engineering

Real-Time Voting System

Designed and implemented a real-time voting platform processing 500K+ events per minute using Kafka, Spark Streaming, and PostgreSQL with optimized throughput and live analytics visualization.

Kafka, Spark Streaming, PostgreSQL, Streamlit, Python

Big Data Analytics Pipeline

Built a scalable big data processing pipeline for large datasets using distributed computing frameworks, optimizing processing workflows for performance and efficiency.

Apache Spark, Hadoop, PySpark, Distributed Computing, ETL

Machine Learning & AI

AI-Powered Legal Document Summarization

Built a transformer-based legal document summarization platform using Legal-BERT, LED-16384, and RAG, automating multi-format document processing and reducing review time by 70%.

Legal-BERT, LED-16384, RAG, Streamlit, NLP, ROUGE/BLEU

End-to-End Data Science & ML Pipeline

Developed an end-to-end machine learning pipeline covering data preprocessing, feature engineering, model training, hyperparameter tuning, and deployment, following MLOps best practices.

Python, Scikit-learn, MLOps, Feature Engineering, Model Deployment

Anomaly Detection Using Robust Graphical Lasso

Implemented an anomaly detection system that uses Robust Graphical Lasso to learn sparse precision matrices and identify outliers in high-dimensional data with improved accuracy.

Python, Graphical Lasso, Anomaly Detection, Statistical Learning, NumPy

Data Analytics

T20 Cricket Player Performance Analysis

Conducted a statistical analysis of T20 cricket player performance, using data analytics and visualization to derive insights on player efficiency and match outcomes.

Python, Pandas, Data Visualization, Statistical Analysis, Sports Analytics

Get In Touch

Let's Connect

I'm always open to new opportunities and interesting conversations. Feel free to reach out!