Alan Janis
Scalable Infrastructure • Private Cloud • Distributed Storage •
Metrics Collection • Log Aggregation • Data Ingest • Monitoring •
Graphing • Alerting • Site Reliability • Full Stack Automation •
Performance Tuning • R&D
Background
I think more people should build a private cloud with
distributed & tiered storage, learn the tools to deploy it
and collect the data to see into it. At home. For fun.
I do this just to support everyday home automation and
entertainment - and because I love building cool things!
My career path began with Obsidian Hosting, a company I started to
provide simple deployment of multiplayer game servers, which I
bundled with voice chat and a variety of web hosting services. As
I grew that infrastructure on a budget, I came to appreciate the
open-source community. I repurposed old hardware for my home
network and started looking for reasons to integrate any project
that grabbed my attention. Not long after, I took a position with
a local web hosting company for the opportunity to learn from an
experienced team in a large-scale, high-uptime environment. In
2010, a senior role with Yahoo brought me to Colorado, where the
thriving dev community has continued to shape my personal and
professional relationships. In addition to the snow, rivers, and
mountains, 'building cool things' remains at the top of
my passions.
History
Senior Systems Engineer
Marriott
|
Mar 2023
- Sep 2023
(6 months)
Supporting CDN Team initiative to migrate Akamai rules to a 'configuration-as-code' standard.
Akamai
-
As part of the CDN (Akamai) Team, I was tasked with creating
scripts and workflows to help our team support the needs of
our internal customers.
-
Worked alongside our principal architect during early planning
stages to codify CDN configuration and bring existing and
future environments into sync and under revision control.
-
Rebuilt a suite of utilities used by dev/ops teams across the
business to shape and validate web traffic across development
and production environments.
-
Built new CDN purge utilities as part of a mentoring project
with a student intern. Over a series of paired working calls,
we created and packaged a CLI tool and an associated AWS
Lambda function that can be easily used by individuals or as
part of automation processes. While this was a relatively
simple script, we discussed and implemented common and
recommended tools for revision control (Git), dependency
management (Pyenv, Poetry), package management
(PyPI/Homebrew), and process automation (Ansible). The student
presented the project to our team's management chain as a
successful goal for the internship.
Consulting Solutions Architect
World Wide Technology
|
Jan 2016
- Mar 2023
(7 years 2 months)
Providing custom-tailored OpenStack and automation solutions for high-value clients.
OpenStack
Ceph
-
Worked on-site with a very high-value client who wished to
validate OpenStack reliability on an upcoming generation of
hardware and rethink their internal cloud design. Together we
developed tools to automate containerized deployment (pre-
Docker-OpenStack) and I guided the team in testing, deploying,
managing, and monitoring their OpenStack cloud. My unique,
intuitive, and maintainable design prompted immediate praise
from the client, for which I earned the WWT Global Service
Provider - Engineer of the Year Award.
-
Developed an automated build pipeline using Ansible Tower and
Foreman that, given minimal specifications, allows our
technicians to deploy the resources necessary to provision and
test hundreds of systems each day. These systems are shipped to
our customers fully racked and cabled for immediate integration.
-
Created an environment for testing and metrics collection for
a prototype mesh storage solution. This utilized NVMe drives
and remote direct disk access (over Ethernet) to provide
ultra-low latency at ultra-high I/O operation rates.
-
Worked with the Principal Ceph Storage Engineer to better
understand the management, deployment, and performance tuning
of Ceph clusters.
-
Advised the first cross-divisional project team on building
an automated solution to verify and remediate the hardware
configuration, health status, firmware, and operating system
version. Provided guidance related to setting up a PXE boot
environment for hardware and system configuration, automated
firmware remediation, and reporting.
Director of Technology
MassRoots
|
Jan 2015
- Jan 2016
(1 year)
Guide the creation of new APIs, services, and ETL processes to migrate application data from an obsolete backend with minimal customer impact.
DevOps
-
Responsible for redesigning the infrastructure behind the
mobile application, moving the entire backend to a hybrid
platform on AWS, OpenStack, and
KVM+Vagrant. My team was responsible for developing API
services using Node.js, Express.js, Redis, Percona XtraDB
Cluster, and Nginx to facilitate migration from
Facebook's Parse Mobile App Platform.
-
Built out monitoring and alerting systems with Icinga, ELK
stack, and TIG (Telegraf, InfluxDB, Grafana) stack.
-
Deployed Salt for configuration, control, self-healing, and
rapid deployment/scale of systems and services in a hybrid
cloud environment.
-
Technologies used for infrastructure include Percona XtraDB
Cluster, Salt, Icinga, Redis, Nginx, HAProxy, Node.js,
Elasticsearch, Jenkins, KVM, Vagrant, Amazon Web Services
(primarily EC2 and S3), LDAP, pfSense, and OpenVPN.
Principal OpenStack Engineer
Photobucket
|
Jan 2012
- Jan 2015
(3 years)
Architect and develop automation for OpenStack, deployed on Cisco UCS and NetApp hardware, and oversee the migration of all bare-metal services into the new private cloud.
OpenStack
NetApp
DevOps
-
Worked with a small, high-output team responsible for
maintaining site reliability while also transitioning all
significant services to an internal OpenStack cloud.
-
Principal engineer of private cloud environment using
OpenStack (Grizzly, Icehouse) on Ubuntu (12.04, 14.04
respectively) built on Cisco UCS B-Series blade hardware, and
backed by NetApp clustered NFS storage.
-
Shared responsibility for 22 petabytes of storage, spread over
72 NetApp appliances (FAS6040, FAS3240, FAS3210, FAS2240,
FAS2020) running ONTAP 7.1.1 - 8.1.1 in 7-Mode and
Clustered-Mode.
-
Developed custom storage balancing tools required to move
roughly 6 petabytes of data out of NetApp TradVols and into
64-bit FlexVols. Managed rebalancing data across the existing
32-bit FlexVols to begin the upgrade process to 64-bit
FlexVols, which was essential to the release and ramp-up of
the site redesign given the data-ingest projections.
-
Headed a storage initiative to compare flash-based Pure Storage
against NetApp's latest flash-based storage as the
backend for our newly virtualized databases. The technologies
used in this project allowed me to learn about Fibre Channel
switching/zoning configuration on Cisco UCS 6200
Fabric Interconnects, Linux FCoE, and multipathing, as well as
configuring the KVM virtual machines to boot from SAN.
Senior Systems Engineer
Yahoo
|
Jan 2010
- Jan 2012
(2 years)
Provide design expertise to modernize the application stack
while improving performance and reliability.
SRE
DevOps
-
Managed the deployment of the Yahoo! Contributor Network
including network and hardware specifications, system
provisioning, QA process, and launch.
-
Supported our in-house development team in leveraging new
Yahoo! technology, including its cloud computing platform,
which now handles all Associated Content analytics and content
(content agility) and feeds that content across Yahoo!'s
entire collection of properties.
-
Responsible for 100% availability of Associated Content and
Yahoo! Contributor Network consisting of 340 RHEL5 servers.
-
Managed the Yahoo! Voices launch and effected changes to improve
scalability and automation.
-
Architected parallel code deployment system for dev, stage,
QA, and production environments using Git. This resulted in
full deploys in under 90 seconds and featured tiered-deploy
and full rollback-to-stable capabilities.
-
Managed the Denver office integration with Yahoo's
corporate network, completing the migration several months
ahead of Yahoo! projections. This included building out the
IDF room to spec concerning cooling and power, redundant
network stacks, cabling, and buildout/migration of all
development environments.
-
Co-managed the Yahoo! Contributor Network site release in the
US, UK, and Brazil.
-
Worked closely with our development team to consistently
improve site and application performance as our contributor
base grew to over 500,000 and our publishing platform moved
towards internationalization, planning for growth accordingly.
Lead Systems Administrator
A2 Hosting
|
Jan 2008
- Jan 2010
(2 years)
Leverage previous cPanel and WHM experience to improve product offerings. Train support teams to provide better support and faster ticket resolution, leading to measurably increased customer satisfaction.
SRE
cPanel
Support
-
Responsible for maintaining the hardware and software of more
than 250 CentOS servers in a high-availability environment.
-
Through my redesign of the existing kickstart provisioning
build system, we achieved nearly zero-touch provisioning of
systems.
-
Implemented a new, fully redundant, multi-gigabit network
using 25-pair / 50-pin telco modules to eliminate
rack-confined switches, greatly simplifying network
architecture.
-
Deployed a network of serial console servers with improved
network boot options, reducing the need for hands-on software
repairs by 60%.
-
Deployed in-house iSCSI storage solution on Supermicro chassis
to migrate onboard storage of customer data to more robust and
secure SANs.
-
Responsible for training our systems administration team,
with a focus on advanced troubleshooting techniques, system
maintenance tooling and procedures, and methods for providing
exceptional customer care.
Systems Administrator / Co-Owner
Obsidian Hosting Networks
|
Jan 2002
- Jan 2008
(6 years)
Offering high-performance multiplayer game servers bundled with voice chat and web hosting services tailored to a variety of customer requirements and budgets.
SRE
cPanel
Support
-
Built, racked, cabled, and configured 50 Fedora Core 4
webservers.
-
Maintained remote and on-site hardware, software,
installation, and deployments of cPanel and WHM software.
-
Executed deployments of ModernBill and then WHMCS billing and
client areas.
- Configured DNS clustering with BIND.
-
Managed our Apache webserver deployment as one of the first
companies offering developer-friendly PHP 5 web hosting in
addition to multiple PHP-CGI options now utilized by dozens of
shared hosting vendors.
-
Oversaw our upgrade to PHP 5, ensuring that the change went
smoothly for customers, and managed the manual migration of
over 500 websites (including email) from a non-cPanel
environment to our systems.
-
Deployed our internal network to handle monitoring, backups,
and remote administration.
-
Solely responsible for streamlining the customer sales and
support processes to ensure efficiency and an excellent user
experience.
Awards
Engineer of the Year
| Nov 2014
| GSP Award Committee
For providing unwavering dedication to meet customer goals and
exemplifying company standards for expertise, service, and
professionalism.
I worked on-site with a very high-value client who wished to
validate OpenStack reliability on an upcoming generation of
hardware and rethink their internal cloud design. Together we
developed tools to automate containerized deployment
(pre-Docker-OpenStack). I guided the team in deploying, managing,
monitoring, and benchmarking their redesigned private
OpenStack environment.
Cisco Honorary Guest Speaker at OpenStack Summit
| Apr 2013
| Cisco Cloud Team
Present experiences deploying OpenStack on Cisco UCS hardware
in concert with NetApp storage.
I was invited by Cisco's OpenStack team to speak to their
engineers about the experiences, improvements, and pitfalls I
found while deploying OpenStack on Cisco UCS hardware. These
findings were contributed to their official documentation.
Projects
home-lab
Highly available compute cluster with distributed, tiered
storage dedicated to home automation, image processing, media
services, and education/R&D projects.
-
Converged, highly available compute environment with 224 CPU
cores, 512GB RAM, (4x) Nvidia Quadro P1000 GPUs [SR-IOV], 10TB
(SAS3 SSD) Ceph Cluster [RBD: KVM/LXC Block Storage]
-
180TB (SAS3 HDD) Ceph Cluster [CephFS: Shared Media and AutoFS
Home Directories]
-
Dedicated Machine-Learning, Image Processing, and
Home-Automation Environment
- 10 Gbit and 25 Gbit network
-
Running and maintaining my Ceph cluster has served as a great
learning platform, providing reliable and cost-effective
storage.
-
I apply the tools and knowledge I've built to simplify and
automate my home network, learning at home for work and
vice versa.
-
I have automated metrics collection as a part of my build
pipeline to monitor systems and services.
-
Golden images are used for both KVM and LXC environments with
Ansible handling the final configuration.
Proxmox
Ceph
KVM
LXC
Docker
Ubiquiti
Ansible
Home Automation
IoT
Grafana/Loki
InfluxDB OSS2
Prometheus
Plex
Git/GitHub