Pipelines built to hold.
I design data pipeline architecture across Google Cloud and Azure — built to move data reliably, recover from failure, and run unattended. Python, SQL, and Docker underneath every layer.
The Work
Ten production-grade pipelines: streaming and batch, cloud and on-prem. Every one is live on GitHub.
10 repos · source available
End-to-end auditing pipeline that lands infrastructure and security telemetry, transforms it in Data Factory, and serves it from Synapse serverless SQL.
Logs → ADF → Synapse
Serverless streaming pipeline ingesting JSON events through Apache Beam, secured with IAM and deployed via the gcloud CLI.
Pub/Sub → Dataflow → BigQuery
Batch pipeline orchestrated by Cloud Composer, running ephemeral Spark jobs on Dataproc Serverless — zero idle infrastructure between runs.
Airflow → Dataproc (Spark) → GCS
Event-driven Change Data Capture keeping a target store synced row-for-row with its source as changes happen.
Source DB → Stream → Target
CSV ingestion into BigQuery with a JavaScript UDF layer enforcing schema on the way in.
GCS → JS UDF → BigQuery
Slowly Changing Dimension Type 2 modeling that preserves complete history through pure SQL transformations.
Source → SQL → SCD2 target
Distributed incremental loads scheduled with Bash and Cron, tuned with Hive partitioning for query performance.
Bash/Cron → Hive → HDFS
A custom Python exporter feeding Prometheus, with Grafana dashboards for end-to-end pipeline metrics.
Exporter → Prometheus → Grafana
ResNet-50 deep learning model for oral cancer detection, deployed on AWS SageMaker endpoints.
ResNet-50 → SageMaker → Endpoint
Interactive dashboard parsing high-resolution Formula 1 telemetry into driver-level metrics.
FastF1 API → Python → Dashboard
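For readers unfamiliar with the Slowly Changing Dimension Type 2 pattern named above, here is a minimal sketch of the idea, not the code from the repository. It assumes a hypothetical `dim_customer` dimension, a hypothetical `stg_customer` staging table, and a single tracked attribute (`city`), and it uses SQLite in memory only so the example runs anywhere. The history logic lives entirely in two SQL statements: one expires the current row when the attribute changes, the other inserts the new version.

```python
import sqlite3


def apply_scd2(conn, load_date):
    """Apply one staging batch to the dimension (table names are hypothetical)."""
    # Step 1: expire the current row for every key whose tracked attribute changed.
    conn.execute(
        """
        UPDATE dim_customer
        SET valid_to = ?, is_current = 0
        WHERE is_current = 1
          AND EXISTS (
              SELECT 1 FROM stg_customer s
              WHERE s.customer_id = dim_customer.customer_id
                AND s.city <> dim_customer.city
          )
        """,
        (load_date,),
    )
    # Step 2: insert a new current version for every key that has no current row,
    # which covers brand-new keys and keys expired in step 1.
    conn.execute(
        """
        INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
        SELECT s.customer_id, s.city, ?, NULL, 1
        FROM stg_customer s
        WHERE NOT EXISTS (
            SELECT 1 FROM dim_customer d
            WHERE d.customer_id = s.customer_id AND d.is_current = 1
        )
        """,
        (load_date,),
    )
    conn.commit()


def demo():
    """Load two batches and return the full history, oldest version first."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_customer (
            customer_id INTEGER, city TEXT,
            valid_from TEXT, valid_to TEXT, is_current INTEGER
        );
        CREATE TABLE stg_customer (customer_id INTEGER, city TEXT);
        """
    )
    # First batch: both customers are new.
    conn.execute("INSERT INTO stg_customer VALUES (1, 'Pune'), (2, 'Delhi')")
    apply_scd2(conn, "2024-01-01")
    # Second batch: customer 1 moved, customer 2 is unchanged.
    conn.execute("DELETE FROM stg_customer")
    conn.execute("INSERT INTO stg_customer VALUES (1, 'Mumbai'), (2, 'Delhi')")
    apply_scd2(conn, "2024-02-01")
    return conn.execute(
        "SELECT customer_id, city, valid_from, valid_to, is_current "
        "FROM dim_customer ORDER BY customer_id, valid_from"
    ).fetchall()
```

Calling `demo()` leaves customer 1 with two rows (the expired Pune version and the current Mumbai version) and customer 2 with its single original row. The sketch ignores NULL attribute values and deleted source rows, which a production model would need to handle.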
The Stack
The tools I reach for to ingest, transform, orchestrate and serve data — picked for the job, not the logo.
Languages
Google Cloud
Microsoft Azure
Processing & Storage
Orchestration & Ops
Analytics & BI
Run History
Where the work has run so far.
● completed
Data Analyst Intern
Corporate Gurukul — in collaboration with the National University of Singapore & AWS
- Worked in an Agile team running exploratory analysis on large datasets to surface high-impact ML opportunities aligned with product goals.
- Built and deployed ML models on AWS SageMaker that validated above 90% in a production-oriented environment.
- Translated technical findings for stakeholders, keeping engineering work tied to business objectives and shortening the feedback loop.