Infrastructure

Background

I have been running my Linux-based services from home for more than 20 years. Indeed, it makes no sense to run your “home” or “private” services in the cloud, especially for your Wi-Fi, home automation, document storage, or surveillance camera recordings. I also run the email and collaboration service Zimbra.

Benefits:

  • Low cost, less expensive than using equivalent managed services.
  • Freedom of administration, software & tools
  • Privacy
  • You have full control over your data

Drawbacks:

  • Regular updates are time-consuming
  • System failures or update issues generally occur when you have no time to fix them
  • You must manage security, authentication and backups yourself

Solution

So I needed to address the drawbacks. There is now a complete set of tools that can do so, for free, and with the qualities of open source software. The goal is a highly available, scalable and hyper-converged infrastructure, with automation and redundancy that make upgrades easy, roll-backs simple and service disruption minimal, even when an upgrade fails. All of this for a TCO of €2500 over 5 years, including power, software and remote encrypted cloud backup.

  • Full VMs provide the best security and isolation. But VMs are also the heaviest to manage and the least efficient in storage. The goal is to use them as little as possible, only for services that need low-level access to the operating system or administrative privileges, such as firewalls or a Docker cluster.
  • Docker containers are very light and easy to back up and manage, but they are limited when data persistence is required, such as for running a database or a collaboration server. Typically, Docker containers are made to run a couple of processes and to be destroyed and recreated at will or at every upgrade. The success of Docker is mostly due to the amazing Docker Hub store, which provides millions of pre-built images or scripts for almost anything that can run in containers.
  • Then come LXC containers. They offer a valuable intermediate solution, with the same persistence and management as a VM but the lightness and efficiency of container isolation. They are perfect for running databases and collaboration services such as Zimbra. Typically, pre-built LXC container images only cover the most common Linux base operating systems.
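The rule of thumb in the three points above can be written down as a tiny decision function. This is only a sketch of my own reasoning: the function name choose_runtime and its two boolean parameters are hypothetical and do not come from any tool in the stack.

```python
def choose_runtime(needs_os_privileges: bool, needs_persistence: bool) -> str:
    """Pick the lightest option that still satisfies the service's needs.

    Hypothetical helper: it only restates the VM / LXC / Docker rule of thumb.
    """
    if needs_os_privileges:
        # Low-level OS access or administrative privileges: full VM.
        return "vm"
    if needs_persistence:
        # Long-lived data without special privileges: LXC container.
        return "lxc"
    # Disposable service that can be destroyed and recreated: Docker.
    return "docker"


print(choose_runtime(True, False))   # e.g. a firewall or a Docker host
print(choose_runtime(False, True))   # e.g. a database or Zimbra
print(choose_runtime(False, False))  # e.g. a stateless Docker Hub image
```

The order of the checks matters: a service that needs both privileges and persistence still ends up in a VM, since a VM offers both.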

Short description of the technical stack:

  • Network: 24GE+4SFP managed network switch, used for VLANs and layer 2 switching only. Routing is handled by the firewalls. (TP-Link, €95 on Amazon)
  • Storage & Compute Hardware: To run a highly available cluster with data persistence you need a minimum of 3 nodes, and preferably an odd number of nodes, so that you can lose up to ceil(n/2-1) nodes and still keep a majority. So instead of a single high-end server, it’s better to have several individually unreliable nodes (no RAID, single network interface). To keep the TCO low without sacrificing reliability, I chose 3 fanless servers, each with an Intel i5-7200U, a 256 GB SSD for the OS, a 1 TB secondary SSD for data, and 16 GB of RAM. If needed, it’s easy to add 2 more nodes. (Hystou Industrial Mini PC, €450 on AliExpress)
  • Base Operating System: This is where Proxmox VE shines. It provides a Debian-based OS, virtual machine and LXC container management, a Ceph and CephFS cluster for data persistence, and basic backup capabilities.
  • Storage: A Ceph cluster provides software-defined storage, exposing block devices for VMs and LXC containers.
  • Cluster Filesystem: CephFS, on top of Ceph, provides a highly available network filesystem for storing configurations, base images, or Docker bind mounts that give Docker containers some persistence.
  • Docker Swarm: Docker is an easy and secure way to run Docker Hub images, and Swarm is a very simple, easy-to-maintain orchestrator. Swarm mode provides workload allocation, container restart (HA), an encrypted key-value store, networking, central logging, and many other things.
  • Reverse Proxy: Traefik provides a complete solution to secure and manage connections to the cluster, both inside Docker Swarm and for external services.
  • Highly Available Firewall with IP Load Balancer: Shorewall provides one of the most advanced IP filtering configuration solutions, and Keepalived provides a unified solution to manage network availability using Linux kernel IPVS, VRRP and service checks.
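The node-count rule from the hardware point above is easy to check with a few lines of Python. This is a minimal sketch of the majority arithmetic only; the helper name tolerable_failures is my own and is not part of Proxmox or Ceph.

```python
def tolerable_failures(n: int) -> int:
    """Number of nodes that can be lost while a strict majority survives.

    Hypothetical helper for illustration. For positive integers,
    (n - 1) // 2 gives the same result as ceil(n/2 - 1).
    """
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    return (n - 1) // 2


for nodes in (3, 4, 5):
    print(nodes, "nodes ->", tolerable_failures(nodes), "failure(s) tolerated")
```

A fourth node tolerates no more failures than three do, which is why an odd number of nodes is preferable and why the next sensible step up from 3 nodes is 5.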

Infrastructure blueprint

Limitations

While this solution provides a lot of flexibility, there are some limitations inherent to the technologies involved:

  • Containers (LXC and Docker) cannot be live migrated.
  • Since Docker container persistence is based on bind mounts on CephFS, Docker containers are limited by CephFS capabilities:
    • There is no inotify support to monitor changes to the filesystem and report those changes to applications.
    • Ceph and CephFS perform very well, but performance remains below that of a local filesystem, partly because of network limitations (1 gigabit, shared, for now) and partly because of CPU limitations (a shared network filesystem on a clustered block device requires a significant amount of CPU and IOPS).
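For applications that rely on change notifications, one possible workaround for the missing inotify is plain polling. The sketch below is an illustration under my own assumptions, not something the stack ships: the function names snapshot, diff and watch are hypothetical, and it compares modification times only.

```python
import os
import time
from typing import Callable, Dict, List, Optional, Tuple

Snapshot = Dict[str, float]
Changes = Tuple[List[str], List[str], List[str]]


def snapshot(root: str) -> Snapshot:
    """Map every file under root to its last-modification time."""
    state: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except OSError:
                # The file vanished or is unreadable; skip it this round.
                continue
    return state


def diff(old: Snapshot, new: Snapshot) -> Changes:
    """Return sorted (added, removed, modified) paths between two snapshots."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, modified


def watch(root: str,
          on_change: Callable[[List[str], List[str], List[str]], None],
          interval: float = 5.0,
          rounds: Optional[int] = None) -> None:
    """Poll root every `interval` seconds; call on_change when something differs."""
    previous = snapshot(root)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        current = snapshot(root)
        changes = diff(previous, current)
        if any(changes):
            on_change(*changes)
        previous = current
        done += 1
```

A call such as watch("/mnt/cephfs/app-config", print, interval=10) would print the changed paths every ten seconds; that mount path is a made-up example. Each round walks the whole tree and stats every file, so the interval should stay generous on a shared network filesystem.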