A highly available Tezos baker on Kubernetes

Limitation with current setup

Version 1 of Tezos-on-GKE had two public (sentry) nodes and one baking node. This baker was configured as a Kubernetes Deployment, which means that Kubernetes would ensure that there is always one instance (and only one instance) of the baking node running at any point in time.

Old design

Enter the active-standby baking node

In the new model, we spin up two baking nodes as a StatefulSet and deploy a leader election system to ensure that only one of them bakes at any given time. This has two benefits:

  • If one node has an issue, switching over baking operations to the other one is much faster than in the cold standby case.
  • Automation is useful, but sometimes you need to open a terminal into the baking node and perform manual operations (such as manual garbage collection). A switchover is an easy way to do maintenance without disrupting operations.

New design with active-standby baker
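
To make the "only one bakes at a time" guarantee concrete, here is a minimal sketch of lease-based leader election in Python. It is not the code used by Tezos-on-GKE: the Lease and LeaderElector names, the node identities, and the 15-second TTL are all hypothetical. In a real cluster the lease would live in a shared store and be updated atomically; this sketch keeps it in memory to show only the logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Shared lock record that every candidate can read and write.

    Assumption: in a real deployment this would be stored outside the
    pods and updated with an atomic compare-and-swap.
    """
    holder: Optional[str] = None
    expires_at: float = 0.0


class LeaderElector:
    """Hypothetical elector: a candidate leads only while it holds an unexpired lease."""

    def __init__(self, lease: Lease, identity: str, ttl: float) -> None:
        self.lease = lease
        self.identity = identity
        self.ttl = ttl

    def try_acquire_or_renew(self, now: float) -> bool:
        """Return True if this candidate is the leader at time `now`."""
        lease_is_free = self.lease.holder is None or now >= self.lease.expires_at
        if lease_is_free or self.lease.holder == self.identity:
            # Take over a free lease, or extend the one we already hold.
            self.lease.holder = self.identity
            self.lease.expires_at = now + self.ttl
            return True
        # Someone else holds a valid lease: stay on standby.
        return False


# Two candidates compete for the same lease.
lease = Lease()
node0 = LeaderElector(lease, "baking-node-0", ttl=15.0)
node1 = LeaderElector(lease, "baking-node-1", ttl=15.0)

node0.try_acquire_or_renew(now=0.0)   # node0 becomes leader
node1.try_acquire_or_renew(now=5.0)   # node1 stays on standby
node1.try_acquire_or_renew(now=20.0)  # node0 stopped renewing, node1 takes over
```

The standby node takes over only after the previous lease has expired, so a shorter TTL gives a faster switchover at the cost of more frequent renewals.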

Switchover demo

The new active-standby mode can be activated with a Terraform variable named experimental_active_standby_mode. Set it to true in your terraform.tfvars file, then deploy a baker following our documentation.

tze-tezos-private-baking-node-self-0 is the leader

2021-01-22 23:01:25,454 INFO supervisord started with pid 6
We are now the leader, starting endorser
2021-01-22 23:01:55,234 INFO spawned: 'tezos-endorser' with pid 40
2021-01-22 23:01:56,266 INFO success: tezos-endorser entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
tezos-endorser: started

$ kubectl delete pod tze-tezos-private-baking-node-self-1 -n tze
pod "tze-tezos-private-baking-node-self-1" deleted

We are now the leader, starting endorser
2021-01-26 03:59:43,273 INFO spawned: 'tezos-endorser' with pid 333796
2021-01-26 03:59:44,309 INFO success: tezos-endorser entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
tezos-endorser: started
Node is bootstrapped.
Endorser started.
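
The logs above show a pod starting its endorser once it becomes the leader. The reaction to a leadership change can be sketched as a small reconcile function in Python. This is an illustration only: the reconcile and replay names are hypothetical, and stopping the endorser when leadership is lost is an assumption that the logs do not show.

```python
from typing import Iterable, List


def reconcile(is_leader: bool, endorser_running: bool) -> str:
    """Decide what to do with the endorser given the current leadership state."""
    if is_leader and not endorser_running:
        return "start"  # we just became the leader
    if not is_leader and endorser_running:
        return "stop"   # assumption: a demoted node must stop signing
    return "noop"       # state already matches leadership


def replay(observations: Iterable[bool]) -> List[str]:
    """Apply reconcile() to a sequence of leadership observations."""
    running = False
    actions = []
    for is_leader in observations:
        action = reconcile(is_leader, running)
        if action == "start":
            running = True
        elif action == "stop":
            running = False
        actions.append(action)
    return actions


# Standby, standby, elected, still leader, demoted.
print(replay([False, False, True, True, False]))
```

Because the decision depends only on the current state, running it repeatedly is safe: a node that is already in the right state does nothing.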

Double baking protection

The single biggest risk in proof-of-stake infrastructure is equivocation, which means producing or endorsing two different blocks, essentially contradicting yourself. This is indistinguishable from malicious behavior and is therefore punishable by slashing of funds. A misconfigured active-standby setup is a frequent cause of equivocation.

Load balancer acts as a single point of contact for both signers
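
One common safeguard against equivocation, independent of the exact setup described here, is a high-watermark check on the signing side: the signer remembers the highest level it has signed for each kind of operation and refuses anything at or below it. The Python sketch below illustrates the idea under simplifying assumptions. The HighWatermarkSigner and DoubleSignError names are hypothetical, only the block level is tracked, and the returned "signature" is a SHA-256 placeholder rather than a real cryptographic signature.

```python
import hashlib
from typing import Dict


class DoubleSignError(Exception):
    """Raised when a request would sign at a level that was already signed."""


class HighWatermarkSigner:
    """Hypothetical guard that refuses to sign twice at the same level."""

    def __init__(self) -> None:
        # Operation kind (e.g. "block", "endorsement") -> highest signed level.
        self._highest_level: Dict[str, int] = {}

    def sign(self, kind: str, level: int, payload: bytes) -> bytes:
        last = self._highest_level.get(kind)
        if last is not None and level <= last:
            raise DoubleSignError(
                f"refusing to sign {kind} at level {level}: "
                f"already signed level {last}"
            )
        # Record the watermark before returning the signature.
        self._highest_level[kind] = level
        # Placeholder only: a real signer would use the baker's private key.
        return hashlib.sha256(payload).digest()


signer = HighWatermarkSigner()
signer.sign("block", 100, b"block A")
try:
    signer.sign("block", 100, b"block B")  # second block at the same level
except DoubleSignError as err:
    print(err)
```

The watermark only protects against double signing if every baker's requests reach a signer that shares the same watermark state, which is worth verifying in any setup with more than one signer.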

Happy baking

Tezos-on-GKE v2.0 is available here. We strive to make institutional-grade staking infrastructure available to you for free.
