Alan Janis

Scalable Infrastructure • Private Cloud • Distributed Storage • Metrics Collection • Log Aggregation • Data Ingest • Monitoring • Graphing • Alerting • Site Reliability • Full Stack Automation • Performance Tuning • R&D

Background

I think more people should build a private cloud with distributed & tiered storage, learn the tools to deploy it and collect the data to see into it. At home. For fun.

I do this just to support everyday home automation and entertainment - and because I love building cool things!

My career path began with Obsidian Hosting, a company I started to provide simple deployment of multiplayer game servers, which I bundled with voice chat and a variety of web hosting services. As I grew that infrastructure on a budget, I came to appreciate the open-source community. I repurposed old hardware for my home network and started looking for reasons to integrate any project that grabbed my attention. Not long after, I took a position with a local web hosting company for the opportunity to learn from an experienced team in a large-scale, high-uptime environment. In 2010, a senior role with Yahoo brought me to Colorado, where the thriving dev community has continued to shape my personal and professional relationships. In addition to the snow, rivers, and mountains, 'building cool things' remains at the top of my passions.

History

Senior Systems Engineer
Marriott | Mar 2023 - Sep 2023 (6 months)

Supporting CDN Team initiative to migrate Akamai rules to a 'configuration-as-code' standard.

Akamai
  • As part of the CDN (Akamai) Team, I was tasked with creating scripts and workflows to help our team support our internal customers' needs.
  • Worked alongside our principal architect during early planning stages to codify CDN configuration and bring existing and future environments into sync and under revision control.
  • Rebuilt a suite of utilities used by dev/ops teams across the business to shape and validate web traffic across development and production environments.
  • Built new CDN purge utilities as part of a mentoring project with a student intern. Over a series of paired working calls, we created and packaged a CLI tool and an associated AWS Lambda function that can be easily used by individuals or as part of automation processes. While this was a relatively simple script, we discussed and implemented common and recommended tools for revision control (Git), dependency management (Pyenv, Poetry), package management (PyPI/Homebrew), and process automation (Ansible). The student presented the project to our team's management chain as a successful goal for the internship.
Consulting Solutions Architect
World Wide Technology | Jan 2016 - Mar 2023 (7 years 2 months)

Providing custom-tailored OpenStack and automation solutions for high-value clients.

OpenStack Ceph
  • Worked on-site with a very high-value client who wished to validate OpenStack reliability on an upcoming generation of hardware and rethink their internal cloud design. Together we developed tools to automate containerized deployment (pre- Docker-OpenStack) and I guided the team in testing, deploying, managing, and monitoring their OpenStack cloud. My unique, intuitive, and maintainable design prompted immediate praise from the client, for which I earned the WWT Global Service Provider - Engineer of The Year Award.
  • Developed an automated build pipeline using Ansible Tower and Foreman, which, given minimal specifications, allows our technicians to deploy the resources necessary to provision and test hundreds of systems each day. These systems are shipped to our customers fully racked and cabled for immediate integration.
  • Created an environment for testing and metrics collection for a prototype mesh storage solution. This utilized NVMe drives and remote direct disk access (over Ethernet) to provide ultra-low latency with ultra-high I/O operations.
  • Worked with the Principal Ceph Storage Engineer to better understand the management, deployment, and performance tuning of Ceph clusters.
  • Consulted with the first cross-divisional project team on building an automated solution to verify and remediate hardware configuration, health status, firmware, and operating system version. Provided guidance on setting up a PXE boot environment for hardware and system configuration, automated firmware remediation, and reporting.
Director of Technology
MassRoots | Jan 2015 - Jan 2016 (1 year)

Guide the creation of new APIs, services, and ETL processes to migrate application data from an obsolete backend with minimal customer impact.

DevOps
  • Responsible for the redesign of the infrastructure to run the mobile application, revamping the entire backend infrastructure to a hybrid platform on AWS, OpenStack, and KVM+Vagrant. My team was responsible for developing API services using Node.js, Express.js, Redis, Percona XtraDB Cluster, and Nginx to facilitate migration from Facebook's Parse Mobile App Platform.
  • Built out monitoring and alerting systems with Icinga, ELK stack, and TIG (Telegraf, InfluxDB, Grafana) stack.
  • Deployed Salt for configuration, control, self-healing, and rapid deployment/scale of systems and services in a hybrid cloud environment.
  • Technologies used for infrastructure include Percona XtraDB Cluster, Salt, Icinga, Redis, Nginx, HAProxy, Node.js, Elasticsearch, Jenkins, KVM, Vagrant, Amazon Web Services (primarily EC2 and S3), LDAP, pfSense, OpenVPN.
Principal OpenStack Engineer
Photobucket | Jan 2012 - Jan 2015 (3 years)

Architect and develop automation for OpenStack, deployed on Cisco UCS and NetApp hardware, and oversee the migration of all bare-metal services into a new private cloud.

OpenStack NetApp DevOps
  • Worked with a small high-output team responsible for maintaining site reliability while also transitioning all significant services to an internal OpenStack cloud.
  • Principal engineer of private cloud environment using OpenStack (Grizzly, Icehouse) on Ubuntu (12.04, 14.04 respectively) built on Cisco UCS B-Series blade hardware, and backed by NetApp clustered NFS storage.
  • Shared responsibility for 22 Petabytes of storage, spread over 72 NetApp appliances (FAS6040, FAS3240, FAS3210, FAS2240, FAS2020) running ONTAP 7.1.1 - 8.1.1 in 7-Mode and Clustered-Mode.
  • Developed custom storage balancing tools required to move roughly 6 Petabytes of data out of NetApp TradVols and into 64-bit FlexVols. Managed the rebalancing of data across the existing 32-bit FlexVols to begin the upgrade to 64-bit FlexVols, which was essential to the release and ramp-up of the site redesign given the data-ingest projections.
  • Headed a storage initiative to compare flash-based Pure Storage against NetApp's latest flash-based storage as the backend for our newly virtualized databases. This project allowed me to learn about Fibre Channel switching/zoning configuration on Cisco UCS 6200 Fabric Interconnects, Linux FCoE, and multipathing, as well as configuring the KVM virtual machines to boot from SAN.
Senior Systems Engineer
Yahoo | Jan 2010 - Jan 2012 (2 years)

Provide design expertise to modernize the application stack while improving performance and reliability.

SRE DevOps
  • Managed the deployment of the Yahoo! Contributor Network including network and hardware specifications, system provisioning, QA process, and launch.
  • Supported our in-house development team in adopting new Yahoo! technology, including its cloud computing platform, which now handles all Associated Content analytics and content (content agility) and feeds that content across Yahoo!'s entire collection of properties.
  • Responsible for 100% availability of Associated Content and Yahoo! Contributor Network consisting of 340 RHEL5 servers.
  • Managed the Yahoo Voices launch and effected changes to improve scalability and automation.
  • Architected parallel code deployment system for dev, stage, QA, and production environments using Git. This resulted in full deploys in under 90 seconds and featured tiered-deploy and full rollback-to-stable capabilities.
  • Managed the Denver office integration with Yahoo's corporate network, completing the migration several months ahead of Yahoo! projections. This included building out the IDF room to spec concerning cooling and power, redundant network stacks, cabling, and buildout/migration of all development environments.
  • Co-managed the Yahoo Contributor Network site release in the US, UK, and Brazil.
  • Worked closely with our development team to consistently improve site and application performance as our contributor base grew to over 500,000 and our publishing platform moved towards internationalization, planning for growth accordingly.
Lead Systems Administrator
A2 Hosting | Jan 2008 - Jan 2010 (2 years)

Leverage previous cPanel and WHM experience to improve product offerings. Train support teams to provide better support and faster ticket resolution, leading to measurably increased customer satisfaction.

SRE cPanel Support
  • Responsible for maintaining the hardware and software of more than 250 CentOS servers in a high-availability environment.
  • Through my redesign of the existing kickstart provisioning build system, we achieved nearly zero-touch provisioning of systems.
  • Implemented a new, fully redundant, multi-gigabit network using 25-pair / 50-pin telco modules to eliminate rack-confined switches, greatly simplifying network architecture.
  • Deployed a network of serial console servers with improved network boot options, reducing the need for hands-on software repairs by 60%.
  • Deployed in-house iSCSI storage solution on Supermicro chassis to migrate onboard storage of customer data to more robust and secure SANs.
  • Responsible for fully training our systems administration team, with a focus on advanced troubleshooting techniques, system maintenance tooling and procedures, and methods for providing exceptional customer care.
Systems Administrator / Co-Owner
Obsidian Hosting Networks | Jan 2002 - Jan 2008 (6 years)

Offering high-performance multiplayer game servers bundled with voice chat and web hosting services tailored to a variety of customer requirements and budgets.

SRE cPanel Support
  • Built, racked, cabled, and configured 50 Fedora Core 4 webservers.
  • Maintained remote and on-site hardware, software, installation, and deployments of cPanel and WHM software.
  • Executed deployments of ModernBill and then WHMCS billing and client areas.
  • Configured DNS clustering with BIND.
  • Managed our Apache webserver deployment. We were one of the first companies to offer developer-friendly PHP 5 web hosting, in addition to multiple PHP-CGI options now utilized by dozens of shared hosting vendors.
  • Oversaw our upgrade to PHP 5, ensuring that the change went smoothly for customers, and managed the manual migration of over 500 websites (including email) from a non-cPanel environment to our systems.
  • Deployed our internal network to handle monitoring, backups, and remote administration.
  • Solely responsible for streamlining the customer sales and support processes to ensure efficiency and an excellent user experience.

Awards

Engineer of the Year | Nov 2014 | GSP Award Committee

For providing unwavering dedication to meet customer goals and exemplifying company standards for expertise, service, and professionalism.

I worked on-site with a very high-value client who wished to validate OpenStack reliability on an upcoming generation of hardware and rethink their internal cloud design. Together we developed tools to automate containerized deployment (pre- docker-openstack). I guided the team in deploying, managing, monitoring, and benchmarking their redesigned private OpenStack environment.

Cisco Honorary Guest Speaker at OpenStack Summit | Apr 2013 | Cisco Cloud Team

Present experiences deploying OpenStack on Cisco UCS hardware in concert with NetApp storage.

I was invited by Cisco's OpenStack team to speak to their engineers about the experiences, improvements, and pitfalls I encountered while deploying OpenStack on Cisco UCS hardware. These findings were contributed to their official documentation.

Projects

home-lab

Highly available compute cluster with distributed, tiered storage dedicated to home automation, image processing, media services, and education/R&D projects.

  • Converged, highly available compute environment with 224 CPU cores, 512GB RAM, (4x) Nvidia Quadro P1000 GPUs [SR-IOV], 10TB (SAS3 SSD) Ceph Cluster [RBD: KVM/LXC Block Storage]
  • 180TB (SAS3 HDD) Ceph Cluster [CephFS: Shared Media and AutoFS Home Directories]
  • Dedicated Machine-Learning, Image Processing, and Home-Automation Environment
  • 10 Gbit and 25 Gbit network
  • Running and maintaining my Ceph cluster has served as a great learning platform, providing reliable and cost-effective storage.
  • I apply the tools and knowledge I build while simplifying and automating my home network to my work, and vice versa.
  • I have automated metrics collection as a part of my build pipeline to monitor systems and services.
  • Golden images are used for both KVM and LXC environments with Ansible handling the final configuration.
Proxmox Ceph KVM LXC Docker Ubiquiti Ansible Home Automation IoT Grafana/Loki InfluxDB OSS2 Prometheus Plex Git/GitHub