Update
When I went through this initially, I didn't realise I was setting up a high-availability cluster meant just to run Rancher... which no one needs. So instead I checked out the guide on running a single-node setup, and provisioned my Kubernetes cluster through the Rancher CLI. The way it was intended.
On the off-chance someone's interested in how I set up a high-availability cluster running Rancher, here goes.
I don’t know enough about Kubernetes or Rancher to create a guide, but here’s how I set them up so I could have a cluster of machines that can run Docker containers, for a growing project.
If you follow this and you come up against problems, check the guides listed in the Credits section, as I’m probably not knowledgeable enough to help.
Background
In March I wrote about Project Anvil, an attempt to create a Dockerised version of my web product. I've gone back and forth and round the houses with this project, and have come to the conclusion that the best thing to do is to keep the app as close to the current production version as possible. I've been making big changes to other parts of the system, and even though I'm still splitting a few things out into separate repos – which I think is the best course – I don't want to go the full SPA route.
So the first aim is to get the app to be deployed on a stack running Kubernetes (a system I don't really understand) via Rancher (an open source manager I've never used).
As I said up top, this is definitely not a guide, but a rundown of the steps I've taken so far that have led to a seemingly stable Rancher installation.
Provisioning the nodes and load balancer
I started by creating 4 DigitalOcean droplets, using the following naming scheme:
universe-node0
universe-node1
universe-node2
universe-node3
Then I SSHed into universe-node0 and ran the following to install Nginx:
$ apt-get update && apt-get -y install nginx
I then SSHed into the remaining nodes and ran the following to install Docker:
$ apt-get remove -y docker docker-engine docker.io
$ apt-get update && apt-get install apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
$ add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ apt-get update && apt-get install -y docker-ce=17.03.2~ce-0~ubuntu-xenial
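That last command pins Docker to 17.03.2, presumably because that's what Rancher supported at the time, but a later apt-get upgrade can still move past it. I didn't do this myself, but an apt preferences pin along these lines would hold the 17.03 series (the file path is just the conventional location, not something from my notes):

```text
# /etc/apt/preferences.d/docker-ce — keep apt on the pinned 17.03 series
Package: docker-ce
Pin: version 17.03.*
Pin-Priority: 1001
```

Alternatively, apt-mark hold docker-ce freezes whatever version is currently installed.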
Configuring Nginx
On universe-node0, I replaced /etc/nginx/nginx.conf with the following:
worker_processes 4;
worker_rlimit_nofile 40000;

events {
    worker_connections 8192;
}

http {
    server {
        listen 80;
        return 301 https://$host$request_uri;
    }
}

stream {
    upstream rancher_servers {
        least_conn;
        server <node1 ip>:443 max_fails=3 fail_timeout=5s;
        server <node2 ip>:443 max_fails=3 fail_timeout=5s;
        server <node3 ip>:443 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 443;
        proxy_pass rancher_servers;
    }
}
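A side note on that http block: it redirects everything on port 80 straight to HTTPS, which is why I later had to stop Nginx to get a certificate issued. A variation, not from my setup, that carves out an exception for Let's Encrypt HTTP-01 challenges (the /var/www/html webroot is an assumption) would look something like this:

```nginx
http {
    server {
        listen 80;
        # let Let's Encrypt HTTP-01 challenges through instead of redirecting
        location /.well-known/acme-challenge/ {
            root /var/www/html;
        }
        location / {
            return 301 https://$host$request_uri;
        }
    }
}
```

With something like that in place, certbot's webroot plugin could be used without taking Nginx down.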
I restarted Nginx by running service nginx restart.
Setting up the FQDN
I added a DNS record pointing rancher.example.com at the IP address of the load balancer node (obviously I've redacted the real domain name).
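For the record, in zone-file notation that's a plain A record, something like this (the TTL is arbitrary and the IP is a placeholder):

```text
rancher.example.com.  300  IN  A  <load balancer ip>
```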
Installing RKE
I downloaded the Rancher Kubernetes Engine (RKE) installer, and in a terminal, entered the directory the binary was downloaded to and ran chmod +x rke_darwin-amd64 to make the binary executable. I verified RKE was working by running ./rke_darwin-amd64 --version. I then made it runnable from any location by running mv rke_darwin-amd64 /usr/local/bin/rke.
Configuring Rancher
I created a file called rancher-cluster.yml, using the following template:
nodes:
  - address: <node1 ip>
    user: root
    role: [controlplane,etcd,worker]
    ssh_key_path: <node1 ssh key path>
  - address: <node2 ip>
    user: root
    role: [controlplane,etcd,worker]
    ssh_key_path: <node2 ssh key path>
  - address: <node3 ip>
    user: root
    role: [controlplane,etcd,worker]
    ssh_key_path: <node3 ssh key path>

addons: |-
  ---
  kind: Namespace
  apiVersion: v1
  metadata:
    name: cattle-system
  ---
  kind: ServiceAccount
  apiVersion: v1
  metadata:
    name: cattle-admin
    namespace: cattle-system
  ---
  kind: ClusterRoleBinding
  apiVersion: rbac.authorization.k8s.io/v1
  metadata:
    name: cattle-crb
    namespace: cattle-system
  subjects:
  - kind: ServiceAccount
    name: cattle-admin
    namespace: cattle-system
  roleRef:
    kind: ClusterRole
    name: cluster-admin
    apiGroup: rbac.authorization.k8s.io
  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: cattle-keys-ingress
    namespace: cattle-system
  type: Opaque
  data:
    tls.crt: <tls crt>
    tls.key: <tls key>
  ---
  apiVersion: v1
  kind: Service
  metadata:
    namespace: cattle-system
    name: cattle-service
    labels:
      app: cattle
  spec:
    ports:
    - port: 80
      targetPort: 80
      protocol: TCP
      name: http
    - port: 443
      targetPort: 443
      protocol: TCP
      name: https
    selector:
      app: cattle
  ---
  apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    namespace: cattle-system
    name: cattle-ingress-http
    annotations:
      nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
      nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
  spec:
    rules:
    - host: rancher.example.com
      http:
        paths:
        - backend:
            serviceName: cattle-service
            servicePort: 80
    tls:
    - secretName: cattle-keys-ingress
      hosts:
      - rancher.example.com
  ---
  kind: Deployment
  apiVersion: extensions/v1beta1
  metadata:
    namespace: cattle-system
    name: cattle
  spec:
    replicas: 1
    template:
      metadata:
        labels:
          app: cattle
      spec:
        serviceAccountName: cattle-admin
        containers:
        - image: rancher/rancher:latest
          args:
          - --no-cacerts
          imagePullPolicy: Always
          name: cattle-server
          ports:
          - containerPort: 80
            protocol: TCP
          - containerPort: 443
            protocol: TCP
Getting an SSL certificate for the Rancher instance
I SSHed into the load balancer and ran the following to install the Let's Encrypt certbot, so I could obtain an SSL certificate for my Rancher installation:
$ apt-get update && apt-get install -y software-properties-common && add-apt-repository ppa:certbot/certbot
$ apt-get update && apt-get install -y python-certbot-nginx
I stopped Nginx from running so I could spin up a temporary webserver:
$ service nginx stop
I then ran certbot certonly and, when prompted, selected option 2 to spin up that temporary webserver.
I was then able to start the Nginx server back up:
$ service nginx start
Once the certificate was registered, I had two files:
/etc/letsencrypt/live/rancher.example.com/fullchain.pem
/etc/letsencrypt/live/rancher.example.com/privkey.pem
I base64-encoded the certificate:
$ echo $(base64 /etc/letsencrypt/live/rancher.example.com/fullchain.pem)
I then pasted the value into the tls.crt key of the cattle-keys-ingress Secret in the rancher-cluster.yml file, remembering to remove any whitespace and line breaks so it was pasted as a single, unbroken string (not a YAML multi-line string).
I base64-encoded the certificate key:
$ echo $(base64 /etc/letsencrypt/live/rancher.example.com/privkey.pem)
I then pasted that value into the tls.key key of the cattle-keys-ingress Secret in the rancher-cluster.yml file, in the same manner as before.
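One loose end worth flagging: Let's Encrypt certificates expire after 90 days, and since the certificate ends up baked into the cattle-keys-ingress Secret as a base64 string, a renewed certificate means re-encoding it, re-pasting it into rancher-cluster.yml, and running rke up again. For the renewal half, a sketch of a crontab entry (the schedule is arbitrary; the hooks stop and start Nginx around certbot's temporary webserver, as above):

```text
# renew weekly at 03:00 on Mondays; certbot only replaces certs nearing expiry
0 3 * * 1 certbot renew --pre-hook "service nginx stop" --post-hook "service nginx start"
```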
I ran the following to provision the nodes:
$ rke up --config <path to rancher-cluster.yml>
After a while I ended up with an output that ended like this:
INFO[0021] [addons] Executing deploy job..
INFO[0031] [addons] User addons deployed successfully
INFO[0031] Finished building Kubernetes cluster successfully
It didn't come out cleanly the first few times for me (it succeeded, but with warnings), because I'd stuffed up the certificate strings.
Once everything was set up, I was able to pop my domain name into my browser, see my Rancher instance, and set my password.