Production hardening¶
While the tezos-on-gke project lets you quickly deploy a cluster with a running baker setup, running a production baker requires careful planning and execution.
Terraform service account¶
Using Google Application Default Credentials is not recommended in a production environment.
Instead:
- ensure that you have set up an Organization. This can be done by registering a domain name and adding it to Google Cloud
- create a Terraform Admin Project, Terraform Service Account and Service Account Credentials following this Google guide
- do not pass project as a variable when deploying the resources. Instead, pass organization_id and billing_account as variables
- pass the service account credentials json file for serviceAccount:terraform@${TF_ADMIN}.iam.gserviceaccount.com as the terraform_service_account_credentials terraform variable
That will create the cluster in a new project, created by the terraform service account.
You may then grant people in your organization access to the project. It is recommended to write more terraform manifests to do so.
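As a sketch of what such a manifest could look like, the snippet below grants one operator read-only access to the project and to its GKE clusters. The google_project_iam_member resource comes from the Terraform Google provider. The email address and the choice of roles are placeholders, and the project output of the terraform-gke-blockchain module (declared later on this page) is assumed to hold the project ID.

```hcl
# Hypothetical operator; replace with a real member of your organization.
locals {
  operator = "user:alice@example.com"
}

# Read-only access to the project created by the terraform service account.
resource "google_project_iam_member" "operator_viewer" {
  project = module.terraform-gke-blockchain.project
  role    = "roles/viewer"
  member  = local.operator
}

# Read-only access to the Kubernetes API of the GKE clusters in the project.
resource "google_project_iam_member" "operator_gke_viewer" {
  project = module.terraform-gke-blockchain.project
  role    = "roles/container.viewer"
  member  = local.operator
}
```

Keeping these grants in the same private manifest means that access changes go through the same review and history as the rest of the setup.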
Separate cluster definition from baker definition¶
While you can create a baker in one shot, that approach is best suited for demos and testnets. A production baker should be defined declaratively.
With the one-shot approach, you would normally write all the parameters defining your baker in a terraform.tfvars file on your laptop.
Instead, it is recommended to create and maintain a private terraform manifest that declares a cluster and every deployment that lives within it. This way, the parameters defining your cluster can also be committed to git (except secrets, which should be handled separately; more on this below).
This keeps the setup maintainable by letting you define several deployments within the same cluster. For example, you may deploy the Tezos baker setup, and the Tezos monitoring setup, within the same cluster.
Define the cluster¶
The terraform-gke-blockchain repository contains boilerplate terraform code to deploy a kubernetes cluster.
Start by declaring one empty cluster and one terraform provider:
module "terraform-gke-blockchain" {
  source = "github.com/midl-dev/terraform-gke-blockchain?ref=v2.0"
  org_id = "<my org id, defined above>"
  billing_account = "<my billing account, defined above>"
  project_prefix = "mybakingop"
  monitoring_slack_url = var.monitoring_slack_url
  terraform_service_account_credentials = "~/.config/gcloud/terraform-service-account-credentials.json"
  node_pools = {
    "baking_pool" : { "node_count": 1, "instance_type": "e2-standard-2" },
    "monitoring_pool" : { "node_count": 1, "instance_type": "e2-standard-1" }
  }
}
# Provides the access token referenced by the kubernetes provider below
data "google_client_config" "current" {
}
# This file contains all the interactions with Kubernetes
provider "kubernetes" {
  host = module.terraform-gke-blockchain.kubernetes_endpoint
  cluster_ca_certificate = module.terraform-gke-blockchain.cluster_ca_certificate
  token = data.google_client_config.current.access_token
}
Notice that we created two node pools. These are distinct virtual machines that run your kubernetes cluster. You can map your pods to either. We will be using these to separate the baker setup from the payout/monitoring setup.
Define the tezos baker¶
Within the tezos-on-gke repository, the terraform-no-cluster-create folder will deploy the baker on a pre-existing cluster.
The output parameters of the terraform-gke-blockchain module become the input parameters of the Tezos baker module.
All variables will appear in the terraform manifest itself, except secrets. Secrets should be kept as variables, and handled appropriately.
It looks like:
module "tezos-baker" {
  source = "github.com/midl-dev/tezos-on-gke?ref=v2.0//terraform-no-cluster-create"
  region = module.terraform-gke-blockchain.location
  node_locations = module.terraform-gke-blockchain.node_locations
  kubernetes_endpoint = module.terraform-gke-blockchain.kubernetes_endpoint
  cluster_ca_certificate = module.terraform-gke-blockchain.cluster_ca_certificate
  cluster_name = module.terraform-gke-blockchain.name
  kubernetes_access_token = data.google_client_config.current.access_token
  kubernetes_pool_name = "baking_pool"
  project = module.terraform-gke-blockchain.project
  full_snapshot_url = "https://mainnet.xtz-shots.io/full"
  rolling_snapshot_url = "https://mainnet.xtz-shots.io/rolling"
  kubernetes_namespace = "tezos"
  kubernetes_name_prefix = "xtz"
  tezos_version = "v9.2"
  tezos_network = "mainnet"
  baking_nodes = {
    "mybaker" : {
      "mynode" : {
        "public_baking_key_hash": "tz1YmsrYxQFJo5nGj4MEaXMPdLrcRf2a5mAU",
        "public_baking_key": "edpk...",
        "insecure_private_baking_key": "edsk3cftTNcJnxb7ehCxYeCaKPT7mjycdMxgFisLixrQ9bZuTG2yZK"
      }
    }
  }
}
With remote signer¶
It is recommended to use a remote signer for secure operations.
Below is an example of a baker with a remote signer configured:
module "tezos-baker" {
  source = "github.com/midl-dev/tezos-on-gke?ref=v2.0//terraform-no-cluster-create"
  region = module.terraform-gke-blockchain.location
  node_locations = module.terraform-gke-blockchain.node_locations
  kubernetes_endpoint = module.terraform-gke-blockchain.kubernetes_endpoint
  cluster_ca_certificate = module.terraform-gke-blockchain.cluster_ca_certificate
  cluster_name = module.terraform-gke-blockchain.name
  kubernetes_access_token = data.google_client_config.current.access_token
  kubernetes_pool_name = "baking_pool"
  project = module.terraform-gke-blockchain.project
  full_snapshot_url = "https://mainnet.xtz-shots.io/full"
  rolling_snapshot_url = "https://mainnet.xtz-shots.io/rolling"
  kubernetes_namespace = "tezos"
  kubernetes_name_prefix = "xtz"
  tezos_version = "v9.2"
  tezos_network = "mainnet"
  signer_target_host_key = var.signer_target_host_key
  baking_nodes = {
    "mybaker" : {
      "mynode" : {
        "public_baking_key_hash": "tz1YmsrYxQFJo5nGj4MEaXMPdLrcRf2a5mAU",
        "public_baking_key": "edpk...",
        "ledger_authorized_path": "ledger://my-four-key-words/ed25519/0h/1h",
        authorized_signers : [
          { "ssh_pubkey" : "ssh-rsa AAAAB<snip>==",
            "signer_port" : 8443,
            "tunnel_endpoint_port" : 51756 }
        ]
      }
    }
  }
}
With payout config¶
The baking_nodes section also accepts a config for TRD payouts. See the TRD payouts section for details.
Terraform remote state¶
By default, Terraform stores its state in a local file. Accidental loss of this file makes your setup unmaintainable. Therefore, it is good practice to store the state in a remote Storage Bucket.
The best location for this storage bucket is the Terraform Admin project created above.
In the private terraform file, add the following:
terraform {
  backend "gcs" {
    bucket = "terraform-state-midl-prod"
    prefix = "terraform/state"
  }
}
The state will now be stored remotely.
More info in the Terraform documentation.
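The bucket itself has to exist before terraform init can use it as a backend. One way to create it, sketched here under assumptions, is a small separate manifest applied once against the Terraform Admin project. google_storage_bucket is a resource of the Terraform Google provider. The project ID and location below are placeholders, and enabling versioning is a suggestion (it keeps earlier copies of the state file), not something this setup requires.

```hcl
resource "google_storage_bucket" "terraform_state" {
  # Must match the bucket named in the backend "gcs" block.
  name = "terraform-state-midl-prod"

  # Placeholder: the ID of your Terraform Admin project.
  project = "my-terraform-admin-project"

  # Placeholder: pick the location that suits you.
  location = "US"

  # Keep previous versions of the state file so a bad write can be rolled back.
  versioning {
    enabled = true
  }
}
```

Bucket names are global across Google Cloud, so the name will need to be changed to one that is still available.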
Putting it all together¶
This terraform manifest deploys a full Tezos baker.
It creates a cluster with two node pools: one for the baker and one for the remaining containers.
It deploys a baker and an auxiliary cluster handling the payouts, external monitoring and website.
terraform {
  backend "gcs" {
    bucket = "terraform-state-midl-prod"
    prefix = "terraform/state"
  }
}
data "google_client_config" "current" {
}
module "terraform-gke-blockchain" {
  source = "github.com/midl-dev/terraform-gke-blockchain?ref=v1.0"
  org_id = "<my org id, defined above>"
  billing_account = "<my billing account, defined above>"
  project_prefix = "mybakingop"
  monitoring_slack_url = var.monitoring_slack_url
  terraform_service_account_credentials = "~/.config/gcloud/terraform-service-account-credentials.json"
  node_pools = {
    "baking_pool" : { "node_count": 1, "instance_type": "e2-standard-2" },
    "monitoring_pool" : { "node_count": 1, "instance_type": "e2-standard-1" }
  }
}
module "tezos-baker" {
  source = "github.com/midl-dev/tezos-on-gke?ref=v2.0//terraform-no-cluster-create"
  region = module.terraform-gke-blockchain.location
  node_locations = module.terraform-gke-blockchain.node_locations
  kubernetes_endpoint = module.terraform-gke-blockchain.kubernetes_endpoint
  cluster_ca_certificate = module.terraform-gke-blockchain.cluster_ca_certificate
  cluster_name = module.terraform-gke-blockchain.name
  kubernetes_access_token = data.google_client_config.current.access_token
  project = module.terraform-gke-blockchain.project
  kubernetes_pool_name = "baking_pool"
  kubernetes_namespace = "tezos"
  kubernetes_name_prefix = "xtz"
  full_snapshot_url = "https://mainnet.xtz-shots.io/full"
  rolling_snapshot_url = "https://mainnet.xtz-shots.io/rolling"
  tezos_version = "v9.2"
  tezos_network = "mainnet"
  signer_target_host_key = var.signer_target_host_key
  baking_nodes = {
    "mynode" : {
      "mybaker" : {
        "public_baking_key_hash": "tz1YmsrYxQFJo5nGj4MEaXMPdLrcRf2a5mAU",
        "public_baking_key": "edpk...",
        "ledger_authorized_path": "ledger://my-four-key-words/ed25519/0h/1h",
        authorized_signers : [
          { "ssh_pubkey" : "ssh-rsa AAAAB<snip>==",
            "signer_port" : 8443,
            "tunnel_endpoint_port" : 51756 }
        ]
        "payout_config" = {
          "schedule"="06 */3 * * *",
          "initial_cycle": 370,
          "release_override": -5,
          "network": "MAINNET",
          "reward_data_provider": "tzkt",
          "dry_run": "false",
          "payment_address": "tz1",
          "rewards_type": "actual",
          "service_fee": 5,
          "rules_map": {}
        }
      }
    }
  }
}
module "tezos-mainnet-monitoring" {
  source = "github.com/midl-dev/tezos-auxiliary-cluster?ref=v2.0//terraform-no-cluster-create"
  region = module.terraform-gke-blockchain.location
  kubernetes_endpoint = module.terraform-gke-blockchain.kubernetes_endpoint
  cluster_ca_certificate = module.terraform-gke-blockchain.cluster_ca_certificate
  cluster_name = module.terraform-gke-blockchain.name
  kubernetes_access_token = data.google_client_config.current.access_token
  project = module.terraform-gke-blockchain.project
  kubernetes_pool_name = "monitoring_pool"
  kubernetes_namespace = "tezos"
  kubernetes_name_prefix = "xtz"
  tezos_network = "mainnet"
  tezos_version = "v9.2"
  protocol = "009-PsFLoren"
  protocol_short = "PsFLoren"
  rolling_snapshot_url = "https://mainnet.xtz-shots.io/rolling"
  bakers = {
    "baker001": {
      public_baking_key="tz1xxxx",
      slack_url="https://hooks.slack.com/services/xxxxx",
      slack_channel="#general",
    }
  }
}
Note that none of the values above are secrets, so it is fine to commit them to a private repository. The secrets are passed as variables and must be handled separately.
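For reference, the two variables used in the manifest above could be declared as follows. This is a minimal sketch: the descriptions are guesses at the purpose of each variable, and the sensitive flag assumes Terraform 0.14 or later (drop it on older versions).

```hcl
# Referenced as var.monitoring_slack_url in the cluster module.
variable "monitoring_slack_url" {
  type        = string
  description = "Slack webhook URL passed to the cluster module (assumed purpose)."
  sensitive   = true # hides the value in plan and apply output
}

# Referenced as var.signer_target_host_key in the baker module.
variable "signer_target_host_key" {
  type        = string
  description = "Host key passed to the baker module for the remote signer (assumed purpose)."
  sensitive   = true
}
```

The values can then be supplied outside of git, for example through the TF_VAR_monitoring_slack_url and TF_VAR_signer_target_host_key environment variables, which Terraform reads automatically.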
Going further¶
A production baker should be operated with an on-call rotation, meaning that several operators have access to the setup.
Specifically:
- secrets should be moved from a file in the operator workspace to a production secret store such as HashiCorp Vault
- terraform deploys should be done by a CI system
- any manual change in the kubernetes environment should be recorded in an audit log and committed in the code:
  - the terraform private file above can be applied with continuous integration
  - the intermediate kubernetes code generated with kustomize could be stored in a CI pipeline and deployed in an auditable way as well (see GitOps).
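As one illustration of the first point, a secret can be read from Vault when Terraform runs instead of from a local file. The sketch below uses the vault_generic_secret data source of the Terraform Vault provider. The Vault address, the secret path and the key name are hypothetical, a KV version 1 mount is assumed, and authentication is assumed to come from the VAULT_TOKEN environment variable.

```hcl
provider "vault" {
  # Hypothetical address of your Vault server.
  address = "https://vault.example.com:8200"
}

# Hypothetical path holding the baker secrets as key/value pairs.
data "vault_generic_secret" "baker" {
  path = "secret/tezos/baker"
}

locals {
  # Hypothetical key name inside the secret.
  signer_target_host_key = data.vault_generic_secret.baker.data["signer_target_host_key"]
}
```

The local value could then replace var.signer_target_host_key in the baker module. Values read this way still end up in the Terraform state, so access to the state bucket should be restricted accordingly.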