About Me

Hi, I’m Saai Sudarsanan — DevOps Engineer | Cloud Native Enthusiast | Building reliable, observable, and scalable systems — one cluster at a time.

2025: Promoted to Software Engineer I. Led Swym’s AI Transformation efforts — introduced AI-driven CI/CD assistants, Slack deployment bots, and spearheaded complete infrastructure modernization across clusters.

2024: Designed a robust observability and tracing ecosystem using Jaeger, Kafka, and OpenTelemetry. Drove infra cost optimization, redefined cluster reliability, and mentored junior engineers on Kubernetes operations.

2023: Transitioned from intern to full-time Associate Software Engineer. Migrated major services to Kubernetes, built CI/CD pipelines with ArgoCD, Jenkins, and Helm, and initiated secure secret management with Azure Key Vault.

Major Achievements

AI Transformation & Automation Leadership

Led Swym’s AI initiative across engineering — built Slack-based deployment assistants, explored LangGraph and LangChain for CI/CD improvements, and drove operational automation across clusters.

Kubernetes Modernization & Migration

Planned and executed migration of core production workloads to next-gen AKS clusters with workload identities, VNet isolation, App Gateways, and Helm v2 charts — improving scalability and reducing deployment risk.

Observability and Tracing Stack

Architected a unified observability platform with Jaeger, OpenSearch, FluentBit, and Kafka — delivering distributed tracing and real-time alerting, reducing debugging time by 60%.

Build Pipeline Optimization

Reduced Jenkins build times by 75% through multi-stage Docker builds, caching strategies, and workload parallelization — improving developer productivity and CI/CD reliability.

Security, Compliance, and Reliability

Implemented SOC2-aligned secret management with Azure Key Vault, led vulnerability remediation from penetration testing reports, and introduced secure ingress policies with Azure AD-based RBAC.

Documentation & Mentorship

Conducted 10+ knowledge-transfer (KT) sessions on Kubernetes, Helm, Jenkins, and observability. Authored internal developer handbooks and playbooks for production-grade deployments and incident response.

Cost Optimization & Reliability

Reduced infra costs by moving workloads to spot instances, introduced backup automation, and enhanced monitoring precision.

Handling High-Scale Production Traffic

Handled peak production traffic of 1M+ RPM during BFCM 2025 with all services running on Kubernetes, maintaining stability through careful capacity planning, observability, and operational discipline.

Skills

  • Kubernetes
  • Docker
  • Azure Cloud (AKS, VMSS, VNet)
  • Helm
  • ArgoCD
  • Terraform
  • Linux
  • Jenkins
  • GitHub Actions
  • CI/CD Automation
  • Infrastructure as Code
  • Prometheus
  • Grafana
  • Jaeger
  • OpenTelemetry
  • Fluentd / FluentBit
  • OpenSearch
  • Incident Response
  • Clojure
  • Python
  • Node.js
  • Bash / Shell Scripting
  • Appsmith
  • Networking & Load Balancing
  • Security & Compliance
  • Cost Optimization

Certifications

  • Certified Kubernetes Administrator (CKA)
  • Certified Ethical Hacker (CEH)
  • ISC2: Certified in Cybersecurity
  • Google Cybersecurity Specialization
  • Data Science Professional Certificate
  • Python for Everybody Specialization
  • Competitive Strategy
  • Advanced Competitive Strategy
  • Strategic Organization Design

Notable Projects

ilogu3000

A benchmarking suite for Clojure logging frameworks — identified CPU inefficiencies in pretty-printing and optimized log formatting.

turinglib

turinglib is a lightweight Python library for building, simulating, and experimenting with Turing Machines — the foundational model of computation. It provides an object-oriented interface for defining tape symbols, actions, states, and state machines, allowing you to express theoretical computation models in a clear and testable way.
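To give a feel for the kind of model described above, here is a minimal, self-contained sketch of a Turing Machine in plain Python. It does not use turinglib and is not its API: the names `Action`, `TuringMachine`, `BLANK`, and `flip_bits` are hypothetical, chosen only to illustrate how symbols, actions, states, and a transition table might fit together.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

BLANK = "_"  # assumed blank symbol for this sketch


@dataclass(frozen=True)
class Action:
    """What the machine does after reading a symbol in a given state."""
    write: str        # symbol to write at the head position
    move: int         # -1 = left, +1 = right, 0 = stay
    next_state: str   # state to enter afterwards


class TuringMachine:
    """Illustrative single-tape machine; not turinglib's actual interface."""

    def __init__(self, transitions: Dict[Tuple[str, str], Action],
                 start: str, accept: Set[str]):
        self.transitions = transitions
        self.start = start
        self.accept = accept

    def run(self, tape_input: str, max_steps: int = 1000) -> Tuple[str, str]:
        # Sparse tape: position -> symbol; unwritten cells read as BLANK.
        tape = dict(enumerate(tape_input))
        state, head = self.start, 0
        for _ in range(max_steps):
            if state in self.accept:
                break
            symbol = tape.get(head, BLANK)
            action = self.transitions.get((state, symbol))
            if action is None:
                break  # no matching rule: halt in the current state
            tape[head] = action.write
            head += action.move
            state = action.next_state
        if not tape:
            return state, ""
        cells = "".join(tape.get(i, BLANK)
                        for i in range(min(tape), max(tape) + 1))
        return state, cells.strip(BLANK)


# Example machine: flip every bit, then stop at the first blank cell.
flip_bits = TuringMachine(
    transitions={
        ("scan", "0"): Action("1", +1, "scan"),
        ("scan", "1"): Action("0", +1, "scan"),
        ("scan", BLANK): Action(BLANK, 0, "done"),
    },
    start="scan",
    accept={"done"},
)

print(flip_bits.run("1011"))  # ('done', '0100')
```

Keeping the transition table as a dictionary keyed by `(state, symbol)` makes each rule easy to test in isolation, which is the sort of clarity an object-oriented interface for Turing Machines aims for.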

Featured Blogs