Data Engineer — Chennai, India

Pipelines built to hold.

source → transform → sink

I design data pipeline architecture across Google Cloud and Azure — built to move data reliably, recover from failure, and run unattended. Python, SQL, and Docker underneath every layer.

Athithyaraagul Sureshkumar
10 pipelines shipped
2 clouds — GCP + Azure
DP-900 certified

The Work

Ten production-grade pipelines: streaming and batch, cloud and on-prem. Every one is live on GitHub.

10 repos · source available

01
Azure Audit Pipeline · Azure

End-to-end auditing pipeline that lands infrastructure and security telemetry, transforms it in Data Factory, and serves it from Synapse serverless SQL.

Logs → ADF → Synapse

02
Streaming ETL Pipeline · GCP · Real-time

Serverless streaming pipeline ingesting JSON events through Apache Beam, secured with IAM and deployed via the gcloud CLI.

Pub/Sub → Dataflow → BigQuery

03
Airflow + Dataproc · GCP · Serverless

Batch pipeline orchestrated by Cloud Composer, running ephemeral Spark jobs on Dataproc Serverless, so no infrastructure sits idle between runs.

Airflow → Dataproc (Spark) → GCS

04
CDC Streaming Pipeline · Real-time

Event-driven Change Data Capture keeping a target store synced row-for-row with its source as changes happen.

Source DB → Stream → Target
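
As a rough illustration of the row-for-row sync described above, the sketch below applies an ordered stream of change events to an in-memory target. The event shape (`op`, `key`, `row`) and the helpers `apply_change` and `replay` are assumptions made for this example, not the format or code used in the repository.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_change(target: Dict[int, Row], event: Dict[str, Any]) -> None:
    """Apply one change event to the target store in place."""
    op = event["op"]
    key = event["key"]
    if op in ("insert", "update"):
        # Upsert: replaying the same event leaves the target unchanged.
        target[key] = dict(event["row"])
    elif op == "delete":
        # pop with a default so a replayed delete is not an error.
        target.pop(key, None)
    else:
        raise ValueError(f"unknown op: {op!r}")


def replay(events: Iterable[Dict[str, Any]]) -> Dict[int, Row]:
    """Build a target store by applying events in order."""
    target: Dict[int, Row] = {}
    for event in events:
        apply_change(target, event)
    return target


# Hypothetical change stream: two inserts, one update, one delete.
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "Ada"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "Lin"}},
    {"op": "update", "key": 1, "row": {"id": 1, "name": "Ada L."}},
    {"op": "delete", "key": 2},
]
synced = replay(events)
```

Treating inserts and updates as the same upsert keeps the apply step idempotent, which matters when a stream delivers an event more than once.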

05
GCP Batch Ingestion · GCP · Batch

CSV ingestion into BigQuery with a JavaScript UDF layer enforcing schema on the way in.

GCS → JS UDF → BigQuery
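
The project enforces schema in a JavaScript UDF. The sketch below shows the same general idea in Python purely for illustration: each CSV row is cast against a declared schema, and rows that fail are set aside instead of loaded. The column names, the `SCHEMA` mapping, and the `enforce_schema` helper are hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

# Hypothetical schema: column name -> converter that raises on bad input.
SCHEMA: Dict[str, Callable[[str], object]] = {
    "order_id": int,
    "amount": float,
    "country": str,
}


def enforce_schema(csv_text: str) -> Tuple[List[dict], List[dict]]:
    """Split CSV rows into (valid, rejected) according to SCHEMA."""
    valid: List[dict] = []
    rejected: List[dict] = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        try:
            # Cast every declared column; a missing column raises KeyError,
            # a missing value (None) raises TypeError, a bad value ValueError.
            valid.append({col: cast(raw[col]) for col, cast in SCHEMA.items()})
        except (KeyError, TypeError, ValueError) as exc:
            rejected.append({"row": raw, "error": str(exc)})
    return valid, rejected


sample = "order_id,amount,country\n1,9.50,IN\nx,3.00,SG\n3,,IN\n"
valid_rows, rejected_rows = enforce_schema(sample)
```

Keeping rejected rows alongside the error message makes it possible to route them to a dead-letter location for inspection instead of silently dropping them.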

06
SQL SCD2 Pipeline · Warehouse

Slowly Changing Dimension Type 2 modeling that preserves complete history through pure SQL transformations.

Source → SQL → SCD2 target
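
A minimal sketch of a Type 2 load in plain SQL, run here through Python's built-in `sqlite3` so it is self-contained. The table names (`dim_customer`, `stg_customer`), the columns, and the `apply_scd2` helper are assumptions for illustration; the repository's own schema and SQL dialect may differ.

```python
import sqlite3

# Step 1: close the current version of any row whose tracked attribute changed.
# Note: `<>` does not detect changes to or from NULL; a real load would handle that.
CLOSE_CHANGED = """
UPDATE dim_customer
SET valid_to = :load_date, is_current = 0
WHERE is_current = 1
  AND EXISTS (
      SELECT 1 FROM stg_customer s
      WHERE s.customer_id = dim_customer.customer_id
        AND s.city <> dim_customer.city
  );
"""

# Step 2: insert a new current version for every staged key that now has
# no current row (brand-new keys and keys closed in step 1).
INSERT_NEW = """
INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
SELECT s.customer_id, s.city, :load_date, NULL, 1
FROM stg_customer s
LEFT JOIN dim_customer d
  ON d.customer_id = s.customer_id AND d.is_current = 1
WHERE d.customer_id IS NULL;
"""


def apply_scd2(conn: sqlite3.Connection, load_date: str) -> None:
    """Run both steps in one transaction so history is never half-updated."""
    with conn:
        conn.execute(CLOSE_CHANGED, {"load_date": load_date})
        conn.execute(INSERT_NEW, {"load_date": load_date})


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER, city TEXT,
    valid_from TEXT, valid_to TEXT, is_current INTEGER
);
CREATE TABLE stg_customer (customer_id INTEGER, city TEXT);
INSERT INTO dim_customer VALUES (1, 'Chennai', '2024-01-01', NULL, 1);
INSERT INTO dim_customer VALUES (2, 'Pune', '2024-01-01', NULL, 1);
INSERT INTO stg_customer VALUES (1, 'Bengaluru'), (2, 'Pune'), (3, 'Delhi');
""")
apply_scd2(conn, "2024-02-01")
history = conn.execute(
    "SELECT customer_id, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_id, valid_from"
).fetchall()
```

In this example customer 1 ends up with two rows (the closed Chennai version and the current Bengaluru one), customer 2 is untouched, and customer 3 is inserted as new. Rerunning the load with the same staging data changes nothing.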

07
Hadoop Incremental · On-prem

Distributed incremental loads scheduled with Bash and Cron, tuned with Hive partitioning for query performance.

Bash/Cron → Hive → HDFS
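
The project itself drives this with Bash, Cron, and Hive. The Python sketch below only illustrates the general idea behind an incremental load: keep a watermark, pick up records newer than it, and group them by a date partition key. The `updated_at` field, the `dt=` partition naming, and the `incremental_batch` helper are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def incremental_batch(
    records: Iterable[Record], watermark: str
) -> Tuple[Dict[str, List[Record]], str]:
    """Return records newer than the watermark, grouped by date partition,
    together with the advanced watermark."""
    partitions: Dict[str, List[Record]] = defaultdict(list)
    new_watermark = watermark
    for rec in records:
        # Assumes ISO-8601 timestamps in one format and timezone,
        # which compare correctly as plain strings.
        ts = rec["updated_at"]
        if ts <= watermark:
            continue  # already loaded in an earlier run
        partitions[f"dt={ts[:10]}"].append(rec)
        new_watermark = max(new_watermark, ts)
    return dict(partitions), new_watermark


source_rows = [
    {"id": "a", "updated_at": "2024-03-01T10:00:00"},
    {"id": "b", "updated_at": "2024-03-02T08:00:00"},
    {"id": "c", "updated_at": "2024-03-02T09:30:00"},
]
batch, next_watermark = incremental_batch(source_rows, "2024-03-01T23:59:59")
```

Because each run only touches the partitions that received new records, queries filtered on the partition key can skip everything else.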

08
Observability Stack · DevOps

A custom Python exporter feeding Prometheus, with Grafana dashboards for end-to-end pipeline metrics.

Exporter → Prometheus → Grafana
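
To give a sense of what an exporter hands to Prometheus, the sketch below renders two pipeline metrics in the Prometheus text exposition format using only the standard library. A real exporter would more likely use the official Python client library and serve this over HTTP; the metric names and the `render_metrics` helper here are hypothetical, not taken from the repository.

```python
from typing import Dict, List


def render_metrics(
    rows_processed: Dict[str, int], last_success_ts: Dict[str, float]
) -> str:
    """Render per-pipeline metrics as Prometheus text exposition format.

    Pipeline names are assumed to be plain identifiers; label values that
    contain quotes, backslashes, or newlines would need escaping.
    """
    lines: List[str] = [
        "# HELP pipeline_rows_processed_total Rows processed per pipeline.",
        "# TYPE pipeline_rows_processed_total counter",
    ]
    for name, count in sorted(rows_processed.items()):
        lines.append(f'pipeline_rows_processed_total{{pipeline="{name}"}} {count}')

    lines += [
        "# HELP pipeline_last_success_timestamp_seconds Unix time of last successful run.",
        "# TYPE pipeline_last_success_timestamp_seconds gauge",
    ]
    for name, ts in sorted(last_success_ts.items()):
        lines.append(
            f'pipeline_last_success_timestamp_seconds{{pipeline="{name}"}} {ts}'
        )
    # The format is line-oriented, so the output ends with a newline.
    return "\n".join(lines) + "\n"


metrics_text = render_metrics(
    {"orders": 120, "audit": 45},
    {"orders": 1700000000.0},
)
```

Exposing the timestamp of the last successful run as a gauge lets a Grafana panel or alert rule compute how stale a pipeline is without the exporter tracking that itself.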

09
OSCC Detector · AI / ML

ResNet-50 deep learning model for oral cancer detection, deployed on AWS SageMaker endpoints.

ResNet-50 → SageMaker → Endpoint

10
F1 Telemetry Dashboard · Analytics

Interactive dashboard parsing high-resolution Formula 1 telemetry into driver-level metrics.

FastF1 API → Python → Dashboard

The Stack

The tools I reach for to ingest, transform, orchestrate, and serve data: picked for the job, not the logo.

Languages

Python · SQL · Bash

Google Cloud

Dataflow · BigQuery · Pub/Sub · Dataproc · Cloud Composer · GCS

Microsoft Azure

Data Factory · Synapse · Databricks · ADLS Gen2 · Azure OpenAI

Processing & Storage

PySpark · Apache Beam · Delta Lake · Medallion · Hadoop / Hive

Orchestration & Ops

Airflow · Docker · GitHub Actions · Prometheus · Grafana

Analytics & BI

Power BI · FastF1 · SageMaker

DP-900 Certified: Microsoft Azure Data Fundamentals

Run History

Where the work has run so far.

Jun — Jul 2023
● completed

Data Analyst Intern

Corporate Gurukul — in collaboration with the National University of Singapore & AWS

  • Worked in an Agile team running exploratory analysis on large datasets to surface high-impact ML opportunities aligned with product goals.
  • Built and deployed ML models on AWS SageMaker, validating above 90% in a production-oriented environment.
  • Translated technical findings for stakeholders, keeping engineering work tied to business objectives and shortening the feedback loop.