Athithyaraagul Profile
SYSTEM STATUS: ONLINE

Engineering Reliable Data Flow

I am a Data Engineer and final-year Computer Science undergraduate with a production-grade mindset. I specialize in building scalable ETL pipelines and cloud-native architectures.

My experience includes designing fault-tolerant data workflows on Google Cloud Platform (Dataflow, BigQuery) and AWS. I use Python, SQL, and Bash to automate operational workflows and optimize data ingestion for high-performance analytics.

6 Pipelines
GCP Cloud
100% Automation
user_profile.json
{
  "candidate": "Athithyaraagul Sureshkumar",
  "status": "Final Year Undergrad",
  "focus": "Scalable Data Systems",
  "stack": {
    "languages": ["Python", "SQL", "Bash"],
    "cloud": ["GCP", "AWS"],
    "big_data": ["BigQuery", "PySpark", "Dataflow", "Dataproc", "Pub/Sub"]
  },
  "education": {
    "major": "Computer Science & Business System",
    "graduating": 2026
  },
  "location": "Chennai"
}

Runtime History

Data Analyst Intern

Corporate Gurukul (National University of Singapore & AWS)
June 2023 - July 2023 · Executed Successfully
  • Collaborated with an Agile team to run exploratory analysis on large datasets, identifying high-impact ML opportunities aligned with product goals.
  • Developed and deployed scalable ML models using AWS SageMaker, achieving >90% validation performance in a production-oriented environment.
  • Communicated technical insights to stakeholders and engineering partners, ensuring alignment with business objectives and accelerating development cycles.

Pipeline Architectures

Visualizing the data flow of my key production-grade projects.

// displaying active_jobs: 6

Streaming ETL Pipeline

REAL-TIME

Architected a serverless streaming pipeline that ingests JSON events in real time. Provisioning is automated via the gcloud CLI with IAM-based access control (Beam sketch below).

Source: Pub/Sub -> Transform: Dataflow (Beam) -> Storage: BigQuery
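
A minimal Apache Beam sketch of this flow: read from Pub/Sub, parse JSON, write to BigQuery. The topic, table, and schema names are placeholders rather than the project's actual resources, and runner flags (e.g. for Dataflow) would be supplied at launch.

streaming_etl_sketch.py
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Streaming mode; the runner (e.g. DataflowRunner) is chosen via launch flags.
    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
            | "ParseJSON" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                table="my-project:analytics.events",
                schema="event_id:STRING,event_type:STRING,event_ts:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()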

Airflow Dataproc Serverless

SERVERLESS · AUTOMATED

End-to-end batch pipeline orchestrated with Cloud Composer (Airflow), executing ephemeral Spark workloads on Dataproc Serverless (DAG sketch below).

Orchestrate: Airflow -> Execute: Dataproc (Spark) -> Storage: GCS Data Lake
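
A minimal Airflow DAG sketch of the orchestration step, assuming Airflow 2.x with the Google provider installed. The project ID, region, batch ID pattern, and PySpark script path are placeholders.

dataproc_serverless_dag.py
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocCreateBatchOperator

with DAG(
    dag_id="dataproc_serverless_batch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Submit an ephemeral PySpark batch to Dataproc Serverless; no cluster to manage.
    run_spark_batch = DataprocCreateBatchOperator(
        task_id="run_spark_batch",
        project_id="my-project",
        region="us-central1",
        batch_id="etl-batch-{{ ds_nodash }}",  # unique per daily run
        batch={
            "pyspark_batch": {"main_python_file_uri": "gs://my-bucket/jobs/transform.py"},
            "runtime_config": {"version": "2.1"},
        },
    )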

GCP Batch Ingestion

BATCH
view_source_code ->

Ingested CSVs from Cloud Storage into BigQuery with a JavaScript transformation layer for schema enforcement (simplified load sketch below).

GCS · JavaScript · BigQuery
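
A simplified stand-in for this ingestion path using the BigQuery Python client with an explicit schema; the actual pipeline routes rows through a Dataflow template with a JavaScript UDF. Bucket, table, and column names are placeholders.

gcs_to_bq_load_sketch.py
from google.cloud import bigquery

client = bigquery.Client()

# Enforce a fixed schema on load so malformed CSV rows fail fast.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    schema=[
        bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("amount", "NUMERIC"),
        bigquery.SchemaField("created_at", "TIMESTAMP"),
    ],
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/raw/orders_*.csv",
    "my-project.analytics.orders",
    job_config=job_config,
)
load_job.result()  # block until the load job finishes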

Hadoop Orchestration

ON-PREM
view_source_code ->

Distributed pipeline using Bash and cron for zero-touch scheduling, with Hive partitioning to speed up downstream queries (illustrative load sketch below).

Bash · Hive · HDFS
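
An illustrative Python rendering of the scheduled daily load into a date-partitioned Hive table; the real pipeline drives this step from Bash scripts under cron. Database, table, and column names are placeholders.

hive_partition_load_sketch.py
import subprocess
from datetime import date


def load_daily_partition(run_date: date) -> None:
    """Insert one day's staged events into a date-partitioned Hive table."""
    day = run_date.strftime("%Y-%m-%d")
    hql = f"""
        INSERT OVERWRITE TABLE analytics.events PARTITION (event_date='{day}')
        SELECT event_id, event_type, event_ts
        FROM staging.raw_events
        WHERE to_date(event_ts) = '{day}';
    """
    # hive -e runs the statement block; cron (or Bash) would call this script daily.
    subprocess.run(["hive", "-e", hql], check=True)


if __name__ == "__main__":
    load_daily_partition(date.today())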

Oral Cancer Detector

AI/ML
view_source_code ->

ResNet-50 deep learning model trained to flag oral cancer in medical images. Deployed on AWS SageMaker behind scalable cloud endpoints (model sketch below).

ResNet-50 · SageMaker · MedTech
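
A hypothetical sketch of the model definition: a pretrained ResNet-50 backbone with its classification head swapped for a two-class output. The training loop, data pipeline, and SageMaker packaging step are omitted.

oral_cancer_model_sketch.py
import torch
import torch.nn as nn
from torchvision import models


def build_detector(num_classes: int = 2) -> nn.Module:
    # Start from ImageNet-pretrained weights, then replace the final layer with a
    # two-class head (lesion vs. healthy) for fine-tuning on the medical dataset.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model


if __name__ == "__main__":
    detector = build_detector()
    dummy_batch = torch.randn(1, 3, 224, 224)  # one RGB image at ResNet's input size
    print(detector(dummy_batch).shape)  # torch.Size([1, 2])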

Observability Stack

DEVOPS
view_source_code ->

End-to-end monitoring solution built around a custom Python exporter, with Prometheus scraping and Grafana dashboards (exporter sketch below).

Exporter · Prometheus · Grafana
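
A minimal sketch of a custom exporter built on prometheus_client: expose a /metrics endpoint and refresh a couple of gauges on a loop. Metric names and the scrape port are placeholders; the real exporter tracks pipeline-specific signals.

pipeline_exporter_sketch.py
import random
import time

from prometheus_client import Gauge, start_http_server

PIPELINE_LAG_SECONDS = Gauge("pipeline_lag_seconds", "Ingestion lag of the data pipeline")
ROWS_PROCESSED = Gauge("rows_processed_last_run", "Rows processed in the last batch run")


def collect_metrics() -> None:
    # Stand-in values; a real exporter would query the pipeline or its metadata store.
    PIPELINE_LAG_SECONDS.set(random.uniform(0, 30))
    ROWS_PROCESSED.set(random.randint(10_000, 50_000))


if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrapes http://host:9100/metrics
    while True:
        collect_metrics()
        time.sleep(15)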