Machine Learning · Case Study

Data Science Salary Prediction Platform

End-to-end MLOps pipeline combining Apache Kafka streaming, Airflow orchestration, PySpark processing, a Snowflake warehouse, Redis caching, and PostgreSQL, with a React frontend backed by a Flask API. Deployed on AWS EC2.
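The description lists Redis caching in front of the Flask prediction API but does not show how the two are wired together. A common approach is cache-aside: look up the request's features in the cache and run the model only on a miss. The sketch below assumes that pattern. `TTLCache` is a hypothetical in-process stand-in for Redis so the example runs without a server. `cached_predict`, the `salary:` key prefix, and the feature names are illustrative and not taken from the project. With redis-py, the same logic would presumably call `get(key)` and `set(key, value, ex=ttl)`.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for Redis: key -> (expiry time, value)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, float]] = {}

    def get(self, key: str) -> Optional[float]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Expired: evict lazily when the key is read
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: float) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def cached_predict(
    cache: TTLCache,
    features: Dict[str, str],
    model: Callable[[Dict[str, str]], float],
) -> float:
    """Cache-aside lookup: return a cached prediction or compute and store one."""
    # Sort keys so the same features always produce the same cache key
    key = "salary:" + "|".join(f"{k}={features[k]}" for k in sorted(features))
    hit = cache.get(key)
    if hit is not None:
        return hit  # cache hit: skip model inference
    prediction = model(features)  # cache miss: run the model
    cache.set(key, prediction)
    return prediction
```

Sorting the feature keys makes `{"role": ..., "city": ...}` and `{"city": ..., "role": ...}` resolve to the same entry. The TTL bounds how long a prediction from an older model can keep being served after a retrain.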

GitHub ↗

Python · Snowflake · PySpark · Kafka · Airflow · Redis · Flask

Production · ★ Featured

50K+ records/day
89% accuracy
120 ms P99 latency
48 h retrain cycle

🔴 The Problem

▸ Job seekers lacked real-time salary benchmarks
▸ Existing tools used stale data and simple regression models

✅ The Solution

▸ Apache Kafka streams live salary data at 50K+ records/day
▸ Spark processes features at scale; MLflow tracks experiments
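The bullets above say that salary records arrive through Kafka and that features are computed in Spark, but they do not show the transformation itself. Below is a minimal pure-Python sketch of one plausible step. It consumes messages in fixed-size micro-batches, which is the default execution model Spark Structured Streaming uses for Kafka sources. It then drops malformed records and aggregates a median salary per (role, location) group. The message schema, field names, and functions are assumptions made for illustration. In the real pipeline this step would presumably be a PySpark `groupBy(...).agg(...)` rather than Python dictionaries.

```python
import json
import statistics
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical message schema: {"role": str, "location": str, "salary": number}
REQUIRED_FIELDS = ("role", "location", "salary")


def micro_batches(messages: Iterable[bytes], batch_size: int) -> Iterator[List[bytes]]:
    """Yield consecutive lists of up to batch_size raw messages."""
    it = iter(messages)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def parse_records(messages: Iterable[bytes]) -> List[dict]:
    """Decode JSON payloads, dropping malformed, incomplete, or non-positive-salary records."""
    records = []
    for raw in messages:
        try:
            rec = json.loads(raw)
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            continue
        if not isinstance(rec, dict) or any(f not in rec for f in REQUIRED_FIELDS):
            continue
        salary = rec["salary"]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(salary, bool) or not isinstance(salary, (int, float)) or salary <= 0:
            continue
        records.append(rec)
    return records


def median_salary_by_group(records: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Median salary per (role, location) pair, computed exactly."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for rec in records:
        groups[(rec["role"], rec["location"])].append(float(rec["salary"]))
    return {key: statistics.median(vals) for key, vals in groups.items()}
```

Each batch from `micro_batches` would pass through `parse_records` and then `median_salary_by_group` before the resulting features are written downstream.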

📈 Impact & Results

▸ 89% of holdout-set predictions fall within $5K of the actual salary
▸ Retraining cycle cut from 2 weeks to 48 hours
▸ P99 prediction latency under 120 ms
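The evaluation code behind these numbers is not shown. Here is a minimal sketch of how the two headline metrics could be computed. It assumes "89% accuracy within $5K" means the share of holdout predictions whose absolute error is at most $5,000, and that P99 latency uses the nearest-rank percentile over per-request latencies. Both function names are hypothetical.

```python
import math
from typing import Sequence


def within_tolerance_accuracy(
    y_true: Sequence[float], y_pred: Sequence[float], tolerance: float = 5_000.0
) -> float:
    """Share of predictions whose absolute error is at most `tolerance` dollars."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("samples must be non-empty and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

Nearest-rank is only one percentile definition. Monitoring tools often interpolate between samples instead, so the same latency log can report a slightly different P99 depending on the tool.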

Full Tech Stack

Python · Snowflake · PySpark · Kafka · Airflow · Redis · Flask · React · Docker · PostgreSQL · AWS EC2 · Firebase
