SYSTEM STATUS: ONLINE

Engineering Reliable Data Flow

I am a Data Engineer and final-year Computer Science undergraduate with a production-grade mindset. I specialize in building scalable ETL pipelines and cloud-native architectures.

I design fault-tolerant data workflows on Google Cloud Platform (Dataflow, BigQuery) and AWS. I use Python, SQL, and Bash to automate operational workflows and optimize data ingestion for high-performance analytics.

Pipelines

GCP Cloud

100% Automation

user_profile.json

{
  "candidate": "Athithyaraagul Sureshkumar",
  "status": "Final Year Undergrad",
  "focus": "Scalable Data Systems",
  "stack": {
    "languages": ["Python", "SQL", "Bash"],
    "cloud": ["GCP", "AWS"],
    "big_data": ["BigQuery", "PySpark", "Dataflow", "Dataproc", "Pub/Sub"]
  },
  "education": {
    "major": "Computer Science & Business Systems",
    "graduating": 2026
  },
  "location": "Chennai"
}

Runtime History

Data Analyst Intern

Corporate Gurukul (National University of Singapore & AWS)

June 2023 - July 2023

Executed Successfully

▹ Collaborated with an Agile team to analyze large datasets and conduct exploratory analysis, identifying high-impact ML opportunities that aligned with product goals.
▹ Developed and deployed scalable ML models on AWS SageMaker, reaching above 90% on validation metrics in a production-oriented environment.
▹ Communicated technical insights to stakeholders and engineering partners, ensuring alignment with business objectives and accelerating development cycles.

Pipeline Architectures

Visualizing the data flow of my key production-grade projects.

// displaying active_jobs: 6

Streaming ETL Pipeline

REAL-TIME

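Architected a serverless streaming pipeline that ingests JSON events from Pub/Sub, transforms them in Dataflow (Apache Beam), and loads them into BigQuery. Deployment is automated with the gcloud CLI, with access controlled through IAM.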

Source: Pub/Sub → Transform: Dataflow (Beam) → Storage: BigQuery
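A minimal sketch of what the transform step might look like, written as a plain Python function that could be wrapped in a Beam step. The field names (`event_id`, `event_type`, `timestamp`) and the dead-letter tagging are assumptions for illustration; the real event schema is not given here.

```python
import json

# Hypothetical required fields; the actual event schema is not stated in the source.
REQUIRED_FIELDS = ("event_id", "event_type", "timestamp")


def parse_event(message: bytes):
    """Turn one message payload into ("ok", row) or ("dead_letter", info)."""
    raw = message.decode("utf-8", errors="replace")
    try:
        event = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Malformed JSON is routed aside instead of failing the whole stream.
        return "dead_letter", {"raw": raw, "error": str(exc)}

    if not isinstance(event, dict):
        return "dead_letter", {"raw": raw, "error": "payload is not a JSON object"}

    missing = [field for field in REQUIRED_FIELDS if field not in event]
    if missing:
        return "dead_letter", {"raw": raw, "error": f"missing fields: {missing}"}

    # Flatten to the column names a destination table might use.
    row = {
        "event_id": str(event["event_id"]),
        "event_type": str(event["event_type"]),
        "event_ts": event["timestamp"],
    }
    return "ok", row
```

Tagging each result instead of raising keeps bad messages from blocking the stream, and the `dead_letter` records can be written to a separate table for inspection.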

Airflow Dataproc Serverless

SERVERLESS AUTOMATED

End-to-end automated batch pipeline orchestrated with Cloud Composer (Airflow). Executes ephemeral Spark workloads on Dataproc Serverless.

Orchestrate: Airflow → Execute: Dataproc (Spark) → Storage: GCS Data Lake
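As a rough illustration of the "ephemeral Spark workload" idea, the sketch below builds the per-run batch definition an orchestrator could submit. The bucket, script path, and ID prefix are hypothetical. The `pyspark_batch` / `main_python_file_uri` layout follows the Dataproc Batch resource, and in an Airflow DAG the result would presumably be handed to the Google provider's `DataprocCreateBatchOperator`, though how this project wires it is an assumption.

```python
from datetime import date

# Hypothetical script location; not taken from the project.
MAIN_SCRIPT_URI = "gs://example-bucket/jobs/transform.py"


def build_batch(run_date: date, output_uri: str):
    """Return (batch_id, batch) describing one ephemeral PySpark run."""
    # A date-stamped ID makes each run a distinct, traceable batch.
    batch_id = f"daily-transform-{run_date:%Y%m%d}"
    batch = {
        "pyspark_batch": {
            "main_python_file_uri": MAIN_SCRIPT_URI,
            "args": [
                f"--run-date={run_date.isoformat()}",
                f"--output={output_uri}",
            ],
        }
    }
    return batch_id, batch
```

Because each run gets its own batch, no cluster has to be kept alive between schedules; resources exist only for the duration of the job.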

GCP Batch Ingestion

BATCH

Ingested CSV files from Cloud Storage into BigQuery, with a JavaScript transformation layer enforcing the schema.

GCS → JavaScript → BigQuery
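The project's transformation layer is written in JavaScript; the sketch below shows the same schema-enforcement idea in Python for illustration only. The columns and types in `SCHEMA` are hypothetical.

```python
import csv
import io

# Hypothetical target schema: column name -> type converter.
SCHEMA = {"order_id": int, "customer": str, "amount": float}


def enforce_schema(csv_text: str):
    """Coerce each CSV row to SCHEMA types; return (valid_rows, rejected)."""
    valid, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for raw in reader:
        try:
            # Missing columns, short rows, and bad values all fail here.
            row = {col: cast(raw[col].strip()) for col, cast in SCHEMA.items()}
        except (KeyError, ValueError, AttributeError) as exc:
            rejected.append((reader.line_num, str(exc)))
            continue
        valid.append(row)
    return valid, rejected
```

Rows that fail coercion are collected with their line number rather than dropped silently, so a load can report exactly which inputs violated the schema.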

Hadoop Orchestration

ON-PREM

Distributed pipeline using Bash and cron for zero-touch scheduling, with Hive partitioning for query optimization.

Bash → Hive → HDFS
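The project itself is driven by Bash; as a sketch of the partitioning idea in Python, the helper below derives a date-partitioned HDFS directory and the HiveQL statement that registers it. The table name, base directory, and the `dt` partition column are assumptions.

```python
from datetime import date


def partition_path(base_dir: str, day: date) -> str:
    """Directory for one daily partition, using the key=value layout Hive expects."""
    return f"{base_dir.rstrip('/')}/dt={day.isoformat()}"


def add_partition_sql(table: str, base_dir: str, day: date) -> str:
    """HiveQL that registers the day's directory as a partition (safe to re-run)."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{day.isoformat()}') "
        f"LOCATION '{partition_path(base_dir, day)}'"
    )
```

A scheduled job could generate this statement once per day; queries that filter on `dt` then only read the matching directories instead of scanning the whole table.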

Oral Cancer Detector

AI/ML

ResNet-50 deep learning model trained for oral cancer detection. Deployed on AWS SageMaker with scalable cloud endpoints.

ResNet-50 → SageMaker → MedTech

Observability Stack

DEVOPS

End-to-end monitoring solution using a custom Python exporter. Configured Prometheus scraping and Grafana dashboards.

Exporter → Prometheus → Grafana
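To show what a custom exporter boils down to, here is a stdlib-only sketch that renders gauges in the Prometheus text exposition format and serves them at `/metrics`. The metric name, the `collect` function, and port 8000 are hypothetical; the actual exporter may well use a client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics: dict) -> str:
    """Render {name: (help_text, value)} as Prometheus text-format gauges."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect() -> dict:
    """Hypothetical collector; a real one would read from the monitored system."""
    return {"pipeline_queue_depth": ("Items waiting in the queue", 0)}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Prometheus would be configured to scrape this host and port.
    HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```

Prometheus pulls from this endpoint on its scrape interval, and Grafana then queries Prometheus, which matches the Exporter → Prometheus → Grafana flow above.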
