How it works

MAAS has a tiered architecture with a central PostgreSQL database backing a region controller (regiond) that deals with operator requests. Distributed rack controllers (rackd) provide high-bandwidth services to multiple racks. The region controller itself is stateless and horizontally scalable, presenting only a REST API.
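
As a rough illustration of that REST API, here is a minimal Python sketch that lists the machines a region controller knows about. The URL, port and API key are placeholders, the sketch assumes MAAS's OAuth 1.0a authentication with the usual three-part API key and PLAINTEXT signing, and response field names may differ between MAAS versions.

```python
import requests
from requests_oauthlib import OAuth1

# Placeholders: point these at your own region controller and API key
# (a MAAS API key is normally three colon-separated parts).
MAAS_URL = "http://maas.example.com:5240/MAAS"
API_KEY = "consumer:token:secret"

consumer_key, token_key, token_secret = API_KEY.split(":")
auth = OAuth1(consumer_key, "", token_key, token_secret,
              signature_method="PLAINTEXT")

# Ask the region controller for every machine it manages.
response = requests.get(f"{MAAS_URL}/api/2.0/machines/", auth=auth)
response.raise_for_status()
for machine in response.json():
    print(machine["hostname"], machine["status_name"])
```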

Rack controllers (rackd) provide DHCP, IPMI, PXE, TFTP and other local services. They cache large items, such as operating system install images, at the rack level for performance, but maintain no exclusive state other than the credentials needed to talk to the region controller.
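
As a hedged sketch of how an operator might check on those rack-level services, the snippet below lists the rack controllers registered with the region and the services each one reports. The rackcontrollers endpoint and the service_set field are assumptions about the 2.0 API and may differ between MAAS versions; URL and API key are placeholders as before.

```python
import requests
from requests_oauthlib import OAuth1

MAAS_URL = "http://maas.example.com:5240/MAAS"            # placeholder
consumer_key, token_key, token_secret = "consumer:token:secret".split(":")
auth = OAuth1(consumer_key, "", token_key, token_secret,
              signature_method="PLAINTEXT")

# List the rack controllers registered with the region and the local
# services (DHCP, TFTP, HTTP, ...) each one reports.
response = requests.get(f"{MAAS_URL}/api/2.0/rackcontrollers/", auth=auth)
response.raise_for_status()
for rack in response.json():
    services = ", ".join(s["name"] for s in rack.get("service_set", []))
    print(rack["hostname"], "-", services)
```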

MAAS structure diagram

High availability in MAAS

MAAS is a mission-critical service that provides the infrastructure coordination upon which HPC and cloud infrastructures depend. High availability in the region controller is achieved at the database level. In the event of a rackd failure, the region controller automatically switches gateways to keep services available to the affected network segments.

Rackds are not in the primary data path; they are not routers, nor are they otherwise involved in the flow of data traffic. The diagram shows only the role that MAAS rackds play in providing local services to racks, and the way in which they can cover for one another in the event of a failure.

MAAS can scale from a small set of servers to many racks of hardware in a data centre. High-bandwidth activities (such as the initial operating system installation) are handled by the distributed gateways, enabling massively parallel deployments.

Protocols

MAAS uses standard server BMC and NIC services such as IPMI and PXE to control the machines in your data centre. For converged infrastructure, MAAS talks to the chassis controller of the rack or hyperscale chassis such as Intel RSD, Cisco UCS or HP Moonshot. Custom plugins extend MAAS for alternative BMC protocols.
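
As a sketch of how BMC details end up in MAAS, the snippet below registers a machine by hand with its IPMI parameters via the API. All values are placeholders, and the power_parameters_* field names are an assumption that varies with the chosen power type and MAAS version; PXE-based enlistment (described under “New” below) is usually the simpler route.

```python
import requests
from requests_oauthlib import OAuth1

MAAS_URL = "http://maas.example.com:5240/MAAS"            # placeholder
consumer_key, token_key, token_secret = "consumer:token:secret".split(":")
auth = OAuth1(consumer_key, "", token_key, token_secret,
              signature_method="PLAINTEXT")

# Register a machine by hand, giving MAAS the IPMI details it needs to
# drive the BMC. All values are placeholders, and the power_parameters_*
# field names are an assumption that varies with the power type.
response = requests.post(f"{MAAS_URL}/api/2.0/machines/", auth=auth, data={
    "hostname": "node-01",
    "architecture": "amd64/generic",
    "mac_addresses": "52:54:00:12:34:56",
    "power_type": "ipmi",
    "power_parameters_power_address": "10.0.0.42",
    "power_parameters_power_user": "admin",
    "power_parameters_power_pass": "secret",
})
response.raise_for_status()
print("Registered machine", response.json()["system_id"])
```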

Initial machine inventory and commissioning are done from an ephemeral Ubuntu image that works across all major servers from all major vendors. It is possible to add custom scripts for firmware updates and reporting.

Physical availability zones

In keeping with the notion of a ‘physical cloud’, MAAS lets you designate machines as belonging to a particular availability zone. It is typical to group sets of machines into an availability zone by rack, room or building, based on common points of failure. The natural boundaries of a zone depend largely on the scale of the deployment and the design of physical interconnects in the data centre.

Nevertheless, the effect is that you can scale a service out across multiple failure domains very easily, just as you would expect on a public cloud. Higher-level infrastructure offerings like OpenStack or Mesos can present that zone information to their own API clients as well, enabling very straightforward deployment of sophisticated solutions from metal to container.

The MAAS API allows for discovery of the zones in the region. Chef, Puppet, Ansible and Juju can transparently spread services across the available zones.
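
For example, a minimal Python sketch of that zone discovery (URL and API key are placeholders, as in the earlier sketches):

```python
import requests
from requests_oauthlib import OAuth1

MAAS_URL = "http://maas.example.com:5240/MAAS"            # placeholder
consumer_key, token_key, token_secret = "consumer:token:secret".split(":")
auth = OAuth1(consumer_key, "", token_key, token_secret,
              signature_method="PLAINTEXT")

# Discover the availability zones defined in this region.
response = requests.get(f"{MAAS_URL}/api/2.0/zones/", auth=auth)
response.raise_for_status()
for zone in response.json():
    print(zone["name"], "-", zone.get("description", ""))
```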

Users can also specifically request machines in particular AZs.
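
As a hedged sketch of such a request, the snippet below asks MAAS to allocate any Ready machine from a named zone; the zone name is a placeholder, and the exact allocation constraints accepted may vary by MAAS version.

```python
import requests
from requests_oauthlib import OAuth1

MAAS_URL = "http://maas.example.com:5240/MAAS"            # placeholder
consumer_key, token_key, token_secret = "consumer:token:secret".split(":")
auth = OAuth1(consumer_key, "", token_key, token_secret,
              signature_method="PLAINTEXT")

# Ask for any Ready machine in a specific zone ("rack-1" is a placeholder).
response = requests.post(f"{MAAS_URL}/api/2.0/machines/",
                         params={"op": "allocate"},
                         data={"zone": "rack-1"}, auth=auth)
response.raise_for_status()
machine = response.json()
print("Allocated", machine["hostname"], "in zone", machine["zone"]["name"])
```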

There is no forced correlation between a machine's location in a particular rack and the zone in which MAAS will present it, nor is there a forced correlation between network segment and rack. In larger deployments it is common for traffic to be routed between zones; in smaller deployments the switches are often trunked, allowing subnets to span zones.

By convention, users are entitled to assume that all zones in a region are connected with very high bandwidth that is not metered, enabling them to use all zones equally and spread deployments across as many zones as they choose for availability purposes.

The node lifecycle

Each machine (“node”) managed by MAAS goes through a lifecycle: from enlistment, its onboarding to MAAS, through commissioning, when the hardware is inventoried and firmware or other hardware-specific elements can be set up, to allocation to a user and deployment, and finally release back to the pool or retirement.
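
The sketch below simply restates that lifecycle as a Python enumeration for orientation; it is illustrative only and does not reproduce MAAS's internal state machine, which has additional intermediate and failure states.

```python
from enum import Enum

class NodeState(Enum):
    """Simplified view of the lifecycle described in this section; the real
    MAAS state machine has additional intermediate and failure states."""
    NEW = "New"                      # enlisted, BMC detected, not yet inventoried
    COMMISSIONING = "Commissioning"  # ephemeral image inventories the hardware
    READY = "Ready"                  # commissioned and available in the pool
    ALLOCATED = "Allocated"          # reserved for a user, not yet installed
    DEPLOYING = "Deploying"          # operating system being installed
    DEPLOYED = "Deployed"            # in use by its owner
    RELEASING = "Releasing"          # optionally disk-wiped, returning to the pool

# The typical path from enlistment back to the shared pool:
HAPPY_PATH = [
    NodeState.NEW, NodeState.COMMISSIONING, NodeState.READY,
    NodeState.ALLOCATED, NodeState.DEPLOYING, NodeState.DEPLOYED,
    NodeState.RELEASING, NodeState.READY,
]
```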


New

New machines that PXE-boot on a MAAS network are enlisted automatically if MAAS can detect their BMC parameters. The easiest way to enlist standard IPMI servers is simply to PXE-boot them on the MAAS network.

Commissioning

Commissioning takes a detailed inventory of RAM, CPU, disks, NICs and accelerators such as GPUs, itemised and usable as constraints for machine selection. It is possible to run your own scripts for site-specific tasks such as firmware updates.
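
A hedged sketch of triggering commissioning through the API; the system ID, URL and API key are placeholders, and the response handling assumes the operation returns the machine record.

```python
import requests
from requests_oauthlib import OAuth1

MAAS_URL = "http://maas.example.com:5240/MAAS"            # placeholder
SYSTEM_ID = "abc123"                                      # placeholder machine ID
consumer_key, token_key, token_secret = "consumer:token:secret".split(":")
auth = OAuth1(consumer_key, "", token_key, token_secret,
              signature_method="PLAINTEXT")

# (Re)commission the machine: MAAS boots it into the ephemeral image,
# inventories CPU, RAM, disks and NICs, and runs any commissioning scripts.
response = requests.post(f"{MAAS_URL}/api/2.0/machines/{SYSTEM_ID}/",
                         params={"op": "commission"}, auth=auth)
response.raise_for_status()
print(response.json()["status_name"])   # typically "Commissioning"
```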

Ready

A machine that is successfully commissioned is considered “Ready”. It will have BMC credentials configured (on IPMI-based BMCs) for ongoing power control, ensuring that MAAS can start or stop the machine and allocate or (re)deploy it with a fresh operating system.
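
As a hedged illustration of that power control, the snippet below powers a machine on through its BMC and then asks MAAS what state the BMC reports; all identifiers and credentials are placeholders.

```python
import requests
from requests_oauthlib import OAuth1

MAAS_URL = "http://maas.example.com:5240/MAAS"            # placeholder
SYSTEM_ID = "abc123"                                      # placeholder machine ID
consumer_key, token_key, token_secret = "consumer:token:secret".split(":")
auth = OAuth1(consumer_key, "", token_key, token_secret,
              signature_method="PLAINTEXT")

# Start the machine using the BMC credentials MAAS stored at commissioning,
# then ask what power state the BMC reports.
requests.post(f"{MAAS_URL}/api/2.0/machines/{SYSTEM_ID}/",
              params={"op": "power_on"}, auth=auth).raise_for_status()
state = requests.get(f"{MAAS_URL}/api/2.0/machines/{SYSTEM_ID}/",
                     params={"op": "query_power_state"}, auth=auth)
state.raise_for_status()
print(state.json())   # e.g. {"state": "on"}
```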

Allocated

Ready machines can be allocated to users, who can configure network interfaces (including bonding and addressing) and storage (such as LVM, RAID, bcache or custom partitioning).

Deploying

Users can then ask MAAS to turn the machine on and install a complete server operating system from scratch without any manual intervention, configuring network interfaces, disk partitions and more.
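
A hedged sketch of such a deployment request via the API; "jammy" is a placeholder distro_series and must correspond to an image already imported into MAAS, and the other identifiers are placeholders as before.

```python
import requests
from requests_oauthlib import OAuth1

MAAS_URL = "http://maas.example.com:5240/MAAS"            # placeholder
SYSTEM_ID = "abc123"                                      # placeholder machine ID
consumer_key, token_key, token_secret = "consumer:token:secret".split(":")
auth = OAuth1(consumer_key, "", token_key, token_secret,
              signature_method="PLAINTEXT")

# Install an operating system on the allocated machine; "jammy" is a
# placeholder and must match an image already imported into MAAS.
response = requests.post(f"{MAAS_URL}/api/2.0/machines/{SYSTEM_ID}/",
                         params={"op": "deploy"},
                         data={"distro_series": "jammy"}, auth=auth)
response.raise_for_status()
print(response.json()["status_name"])   # typically "Deploying"
```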

Releasing

When a user has finished with a machine, they can release it back to the shared pool of capacity, and can ask MAAS to ensure that the machine's disks are fully wiped in the process.
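
A hedged sketch of releasing a machine with a disk wipe via the API; the "erase" flag is an assumption about the release operation's parameters and may differ between MAAS versions, and the identifiers are placeholders.

```python
import requests
from requests_oauthlib import OAuth1

MAAS_URL = "http://maas.example.com:5240/MAAS"            # placeholder
SYSTEM_ID = "abc123"                                      # placeholder machine ID
consumer_key, token_key, token_secret = "consumer:token:secret".split(":")
auth = OAuth1(consumer_key, "", token_key, token_secret,
              signature_method="PLAINTEXT")

# Return the machine to the pool, asking MAAS to wipe its disks first
# (the "erase" flag is an assumption about the release operation).
response = requests.post(f"{MAAS_URL}/api/2.0/machines/{SYSTEM_ID}/",
                         params={"op": "release"},
                         data={"erase": "true"}, auth=auth)
response.raise_for_status()
print(response.json()["status_name"])   # e.g. "Releasing" or "Disk erasing"
```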

MAAS is a bare-metal server provisioning tool. It is open source and free.