Angela Yi

Senior Site Reliability Engineer


Location

Texas

Intro

I'm a Senior Site Reliability Engineer at Cisco (Splunk) who sits at the intersection of DevOps and platform engineering. I build the tools and infrastructure that make other engineers' lives easier -- from CI/CD pipelines to internal developer platforms. When something breaks at 2am, I'm probably already looking at it.

Expertise

01

Platform Engineering

Building and maintaining internal developer platforms that streamline deployments, reduce toil, and empower engineering teams to ship faster with self-service tooling.

02

Infrastructure Automation

Designing and implementing infrastructure as code with Terraform, Puppet, and CI/CD pipelines to minimize manual intervention and increase deployment reliability.

03

Production Reliability

Monitoring systems and infrastructure to ensure production operability, serving on-call for critical environments, and driving rapid incident resolution to minimize downtime.

04

Containerization & Orchestration

Running production workloads on Kubernetes and Docker, working with Kubernetes operators for Splunk Cloud, maintaining Git repositories and Dockerfiles, and promoting container-first workflows across engineering teams.

05

Cloud & Security

Working within AWS and FedRAMP/GovCloud environments, managing secrets and access policies with HashiCorp Vault, and collaborating with security teams to address vulnerabilities in regulated infrastructure.

06

Tooling & Automation

Developing custom tools in Python, Go, and Bash to automate operational tasks -- from Slack bots for on-call reminders to bulk Jira ticket creation. Focused on reducing human error and building documented, repeatable processes for team adoption.

Skills & Software

90%

Kubernetes / Docker

90%

Bash / Linux

80%

Python

85%

Terraform / HCL

85%

CI/CD (Jenkins / GitLab)

80%

AWS

80%

Go

90%

Splunk

85%

Puppet

80%

HashiCorp Vault

Experience

2023 – Present

Splunk, a Cisco company

Senior Site Reliability Engineer

FedRAMP SRE | Remote

Build and maintain internal developer platforms and infrastructure tooling for FedRAMP environments. Manage secrets infrastructure with HashiCorp Vault, including policy authoring and role generation. Author reusable Terraform modules for core and network infrastructure, and maintain Puppet hieradata across GovCloud and FedRAMP stacks. Develop custom Go services and Kubernetes operator workflows. Build internal automation tools including Slack bots and bulk Jira integrations to streamline team operations.

2021 – 2023

Splunk

Site Reliability Engineer

TechOps (GovCloud/FedRAMP) | Remote

Provided engineering support for production and staging environments to maintain 100% operability. Developed automation tools in Python, Go, and JavaScript with GitLab and Docker integration. Created training curricula and onboarded new hires through daily one-on-one shadowing. Built Splunk dashboards to track SLA metrics and delegated workloads across the team. Collaborated with security teams to address vulnerabilities within the GovCloud space.

2019 – 2021

Splunk

Technical Support Engineer

Support | Remote

Provided technical support and troubleshooting for enterprise customers, using Splunk SPL to investigate issues on customer stacks. Served on-call for high-priority cases with quick turnaround on resolutions. Filed thoroughly documented internal bug tickets for dev teams and collaborated with Account and Sales teams to ensure customer success.

2018 – 2019

IBM

Site Reliability Engineer

Dallas, TX

Monitored provisioning and reload transactions for virtual and bare-metal servers. Served as an escalation point for Systems Administrators and Engineers. Deployed and maintained international server environments for 24/7 critical uptime in a mixed Windows/Linux environment. Leveraged automation tools to decrease deployment times and increase reliability. Managed on-call support for critical business applications and maintained a complete system inventory.