About Me
Hi, I’m Saai Sudarsanan — DevOps Engineer | Cloud Native Enthusiast | Building reliable, observable, and scalable systems — one cluster at a time.
Timeline
2025: Promoted to Software Engineer I. Led Swym’s AI transformation efforts: introduced AI-driven CI/CD assistants and Slack deployment bots, and spearheaded a complete infrastructure modernization across clusters.
2024: Designed a robust observability and tracing ecosystem using Jaeger, Kafka, and OpenTelemetry. Drove infrastructure cost optimization, strengthened cluster reliability, and mentored junior engineers on Kubernetes operations.
2023: Transitioned from intern to full-time Associate Software Engineer. Migrated major services to Kubernetes, built CI/CD pipelines with ArgoCD, Jenkins, and Helm, and initiated secure secret management with Azure Key Vault.
Major Achievements
AI Transformation & Automation Leadership
Led Swym’s AI initiative across engineering — built Slack-based deployment assistants, explored LangGraph and LangChain for CI/CD improvements, and drove operational automation across clusters.
Kubernetes Modernization & Migration
Planned and executed the migration of core production workloads to next-gen AKS clusters with workload identity, VNet isolation, Application Gateway, and Helm v2 charts, improving scalability and reducing deployment risk.
Observability and Tracing Stack
Architected a unified observability platform with Jaeger, OpenSearch, Fluent Bit, and Kafka, delivering distributed tracing and real-time alerting and reducing debugging time by 60%.
Build Pipeline Optimization
Reduced Jenkins build times by 75% through multi-stage Docker builds, caching strategies, and workload parallelization — improving developer productivity and CI/CD reliability.
Security, Compliance, and Reliability
Implemented SOC 2-aligned secret management with Azure Key Vault, led remediation of vulnerabilities identified in penetration-testing reports, and introduced secure ingress policies with Azure AD-based RBAC.
Documentation & Mentorship
Conducted 10+ knowledge-transfer (KT) sessions on Kubernetes, Helm, Jenkins, and observability. Authored internal developer handbooks and playbooks for production-grade deployments and incident response.
Cost Optimization & Reliability
Reduced infrastructure costs by moving workloads to spot instances, automated backups, and improved monitoring precision.
Handling High-Scale Production Traffic
Handled peak production traffic of more than 1M requests per minute (RPM) during BFCM (Black Friday/Cyber Monday) 2025 with all services running on Kubernetes, maintaining stability through careful capacity planning, observability, and operational discipline.
Certifications
- Certified Kubernetes Administrator (CKA)
- Certified Ethical Hacker (CEH)
- ISC2: Certified in Cybersecurity
- Google Cybersecurity Specialization
- Data Science Professional Certificate
- Python for Everybody Specialization
- Competitive Strategy
- Advanced Competitive Strategy
- Strategic Organization Design
Notable Projects
ilogu3000
A benchmarking suite for Clojure logging frameworks — identified CPU inefficiencies in pretty-printing and optimized log formatting.
turinglib
turinglib is a lightweight Python library for building, simulating, and experimenting with Turing machines, the foundational model of computation. It provides an object-oriented interface for defining tape symbols, actions, states, and state machines, so you can express theoretical models of computation in a clear and testable way.
Featured Blogs
Set Up Jaeger Operator with OpenSearch for Kubernetes
A hands-on guide to setting up distributed tracing and observability with Jaeger and OpenSearch in a Kubernetes cluster.
Medium
What are Convolutional Neural Networks?
Breaking down CNNs — the architecture powering modern computer vision.
Medium
HTTPS: Built on Trust Issues
Demystifying HTTPS and SSL — understanding why the internet’s security model is both genius and flawed.
Midnight Mayhem: Lessons from a 2 AM Outage
A real-world account of debugging an outage — lessons on reliability, calm under pressure, and team coordination.
Substack
BFCM 2025: A Magical Time for Shoppers, Merchants, and Engineers
What it took to run production systems on Kubernetes through BFCM 2025.
Substack