Lead Software Engineer • Oct, 2024 — Present
LendingClub acquired Tally in 2024, making me the de facto expert on Tally's network, application, and CI/CD infrastructure. I have worked to integrate Tally applications into LendingClub, while also hosting training sessions to spread the lessons learned from Tally's smaller, more agile environment.
Key Achievements:
- Modernize Terraform code, patterns, and management with Terragrunt and Atlantis.
- Onboard Tally applications and adapt them to LendingClub.
- Standardize EKS scaling mechanisms across all clusters with Karpenter and EKS Auto Mode.
- Work with developers and architects to identify common infrastructure patterns that can be reinforced and simplified to securely empower developers without requiring them to become infrastructure experts.
Staff Site Reliability Engineer • Sep, 2021 — Aug, 2024
I was a driving force toward standardization and automation, reducing our tech footprint while working with security and developers to ensure that our IaC was secure-by-default, easy to use, and adaptable to new requirements.
Key Achievements:
- Migrate infrastructure management from CloudFormation, Ansible, Puppet, and Pulumi to Terraform.
- Create secure-by-default Terraform modules that allow backend engineers to easily create common resources without requiring knowledge of our security and compliance requirements.
- Implement Karpenter for dynamic scaling of EKS clusters.
- Create a new CDE with locked-down egress using Kyverno, an ECR pull-through proxy, and AWS Network Firewall.
- Implement Teleport to grant developers temporarily elevated privileges.
- Standardize image builds to simplify dependency management.
- Work with the security team to discover, audit, and secure cloud resources.
- Implement Renovate for automated dependency management across the company.
- Standardize and secure VPC peering and routing.
Lead Site Reliability Engineer • Apr, 2019 — Sep, 2021
I was primarily focused on taming the organic growth of our infrastructure; developing, documenting, and enforcing best practices to reduce barriers to growth, empower developers, and speed up SRE onboarding. I also helped mentor new and junior team members as the team grew from 4 to 15.
Key Achievements:
- Migrate QA environments from the datacenter to AWS.
- Standardize Terraform and implement Terragrunt.
- Build QA and production environments for the NiftyGateway acquisition.
- Build a production-like environment for load testing.
- Develop patterns and supporting tools for Ansible and Terraform that allowed us to easily scale to additional regions.
- Import manually created AWS resources into Terraform.
- Standardize routing, DNS, firewall, Consul, and QA environment configuration.
Sr. Service Reliability Engineer • May, 2018 — Apr, 2019
As a member of the tools team, I was responsible for internal tooling and services. My primary responsibilities were managing Kubernetes and its supporting infrastructure, automating manual tasks wherever I found them, and formalizing best practices to further support automation.
Key Achievements:
- Develop Kubernetes CI/CD pipeline and associated workflow.
- Develop best practices for containerizing Jenkins builds.
- Automate build pipeline for internal base containers.
- Design automated patching process for bare-metal production, non-production, and development servers.
Sr. Linux Administrator • Sep, 2013 — Jan, 2018
On a two-person team, I managed ~120 physical and virtual nodes running ~10 Rails apps with Chef, Etch, and Terraform. I was also primarily responsible for managing the network infrastructure and the logging and metrics monitoring systems. Due to the small size of the ops team, a big part of my job was empowering developers by improving existing troubleshooting tools, creating new tools, or finding better commercial or open source ones.
Key Achievements:
- Set up Terraform to manage GitHub, AWS, and PagerDuty.
- Automate deb and rpm package builds for multiple operating systems and architectures.
- Set up Amazon VPC and Direct Connect to begin gradual migration toward Amazon's EC2, RDS, ElastiCache, and Redshift.
- Redesign network infrastructure and replace firewalls to improve security and throughput.
- Implement service discovery in Consul to ease service management and deployment.
- Design infrastructure in a self-service manner, granting developers easy control of settings without requiring knowledge of the infrastructure itself.
Tcpdump advanced query builder
The Tcpdump advanced syntax is incredibly powerful, but also incredibly difficult to write by hand. Wireshark has an online tool for generating these queries, but it is severely limited. This tool allows you to use wildcard syntax to search for packets containing a string at a given offset, and even has a concept of applications so you don't need to know the offset for common protocols.
Timeout testing tool
This is a tool similar to slowloris that I wrote primarily to explore how different servers and clients handle delays in different phases of the connection cycle and how much their configuration options really affect those timeouts.
Personal Blog
I started this blog mostly as a way to play with Jekyll/Liquid, but also because I feel like too many resources online talk about money and computers in ways that make them hard to understand, and I wanted to take a stab at a few topics in an approachable way.
Bachelor of Science, Business Administration
Information Systems • Cum Laude