A probe is a periodic check that the kubelet runs against a container in a Pod to determine its "health". This section covers two kinds of probes (Kubernetes also offers a third one, the startup probe):
Liveness Probe checks whether the app running inside the container is healthy. If it is not, the kubelet restarts the container (not the whole Pod), subject to the Pod's restartPolicy. How is health determined? Often by checking that a port accepts connections (tcpSocket), although an HTTP request (httpGet) or a command run inside the container (exec) can be used as well. The probe works on a schedule: initialDelaySeconds sets the number of seconds before the first check, and periodSeconds sets the interval between checks. When the probe fails failureThreshold times in a row (3 by default), the container is restarted.
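A minimal sketch of a Pod with a port-based liveness probe. The Pod name `liveness-demo`, the `nginx` image (assumed to listen on port 80) and the timing values are illustrative assumptions, not the original example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo            # hypothetical name
spec:
  containers:
    - name: web
      image: nginx               # assumed image that listens on port 80
      ports:
        - containerPort: 80
      livenessProbe:
        tcpSocket:
          port: 80               # healthy if the port accepts a TCP connection
        initialDelaySeconds: 15  # wait 15 s after container start before the first check
        periodSeconds: 10        # then check every 10 s
        failureThreshold: 3      # restart the container after 3 consecutive failures (also the default)
```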
Readiness Probe behaves differently from Liveness Probe: a failing readiness probe never causes a restart; the Pod is marked Not Ready instead. It is configured the same way as Liveness Probe, with the same check types and timing parameters.
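A sketch of the same kind of Pod with an HTTP readiness probe. The name `readiness-demo`, the `app: web` label and the assumption that `/` on port 80 answers with a success status once the app is ready are all illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo       # hypothetical name
  labels:
    app: web                 # hypothetical label a Service could select on
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - containerPort: 80
      readinessProbe:
        httpGet:
          path: /            # assumed endpoint; any status from 200 to 399 counts as success
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 5     # on failure the Pod is marked Not Ready, never restarted
```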
While a Pod is in Not Ready status, a Service that selects it does not send traffic to it: the Pod's address is not listed among the Service's ready endpoints.
How does the Scheduler (which runs on the Master) decide which Node a Pod should be created on? The criteria are:
- Hardware resources
- Memory/disk available
- Pod requirements (does the Pod ask for a particular Node?)
- Labels and selectors
- Ports and volumes
- Taints & Tolerations
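Kubernetes administrators and developers cannot influence hardware resources, available memory/disk, or ports and volumes, but they can influence Pod requirements, labels and selectors, and taints & tolerations.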
The nodeName parameter can be set directly in the Pod spec. If no Node with the given name exists, the Pod will not run and stays in Pending status. When nodeName is used, the Scheduler is bypassed entirely, because the Pod is already bound to a Node. Using nodeName is not the same as using static pods. Static pods are managed by the kubelet at the Node level; to delete one, its YAML manifest must be removed from the staticPodPath directory. Pods created with nodeName pointing to one of the Nodes, by contrast, are ordinary API objects that can be managed from the Master through kubectl.
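A minimal sketch of pinning a Pod to a Node by name; `pinned-pod` and `node01` are hypothetical names.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod        # hypothetical name
spec:
  nodeName: node01        # bind directly to this Node; the Scheduler is skipped
  containers:
    - name: app
      image: nginx
```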
Pinning pods to a Node by name is not considered good practice, because it reduces flexibility: if that Node is unavailable, the pods stay in Pending state. Selecting Nodes by their characteristics instead of by name is a better approach, and applying labels to Nodes is the proper way to express those characteristics.
kubectl label node <node name> <labelName=labelValue>: applies a label to the Node
kubectl get nodes --show-labels: shows the labels of each Node in a separate column
The nodeSelector parameter restricts a Pod to Nodes that carry a given label (for example, Nodes labelled color=green). If no matching Node is found, the pod(s) stay in Pending status. Unlike with nodeName, the Scheduler is still involved here: it chooses among the matching Nodes.
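A sketch of a Pod that only runs on Nodes labelled color=green; the Pod name `green-pod` is a hypothetical choice.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: green-pod         # hypothetical name
spec:
  nodeSelector:
    color: green          # only Nodes labelled color=green are eligible
  containers:
    - name: app
      image: nginx
```

For a Node to qualify it must first be labelled, e.g. `kubectl label node node01 color=green` (with `node01` as a hypothetical Node name).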
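Each Node also has annotations. Labels are used by Kubernetes itself to select Nodes (nodeSelector, Node affinity), whereas annotations are non-identifying metadata that Kubernetes does not select on. They are typically written and read by other components and tools, for example to record details of the Node's underlying container runtime (CRI), usually Docker.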
Imagine that a Pod is created on its matching Node, but later the label is removed from the Node. This is not an issue for pods that are already running, but it is an issue for pods created later. Node affinity gives additional flexibility over this default behavior. It is a more expressive alternative to nodeSelector (much as ReplicaSet is more flexible than ReplicationController), supporting operators such as In, NotIn, Exists, DoesNotExist, Gt and Lt as well as soft preferences. The parameter names encode how strict a rule is (required or preferred) in each phase of the Pod lifecycle (scheduling, execution): requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution are available, while requiredDuringSchedulingRequiredDuringExecution, which would also evict running pods whose Node no longer matches, has been described as planned but is not implemented. "IgnoredDuringExecution" means that label changes after scheduling do not affect pods that are already running, which is exactly the scenario above.
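A sketch combining a hard and a soft Node affinity rule. The labels `color` (green or blue) and `size=large`, the Pod name and the weight are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod                 # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard rule, checked only at scheduling time: the Node must have color=green or color=blue.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: color
                operator: In
                values:
                  - green
                  - blue
      # Soft rule: among eligible Nodes, prefer those labelled size=large (weight is 1-100).
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: size
                operator: In
                values:
                  - large
  containers:
    - name: app
      image: nginx
```

If no Node satisfies the required rule, the Pod stays Pending; the preferred rule only ranks Nodes and never blocks scheduling.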
All the Node selection mechanisms above (nodeName, nodeSelector, Node affinity) are driven by pods, i.e. the Pod "looks for" matching Nodes. But sometimes Nodes have restrictions of their own. This is where Taints and Tolerations come into the picture. A taint is a restriction that repels pods from a Node; a toleration is an exception that lets a pod onto a tainted Node if it matches the taint's condition (it permits, but does not force, scheduling there). Taints are applied at the Node level, tolerations at the Pod level.
kubectl taint node <node name> <key>=<value>:<effect>: defines a taint condition (key=value) and the action (effect) taken against pods that do not tolerate it, and applies them to the given Node. For example, kubectl taint node node01 zone=red:NoSchedule means that only pods with a toleration for zone=red:NoSchedule may be scheduled on node01; other pods will not be scheduled there. A label zone=red on the Pod does not help.
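A sketch of a Pod that tolerates the zone=red:NoSchedule taint from the example above; the Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: red-zone-pod       # hypothetical name
spec:
  tolerations:
    - key: zone
      operator: Equal      # key and value must both match the taint
      value: red
      effect: NoSchedule   # matches the taint zone=red:NoSchedule
  containers:
    - name: app
      image: nginx
```

The toleration only makes node01 eligible; to make the Pod actually land there, it would also need a nodeSelector or Node affinity rule.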
To remove the taint, run the same command with a minus sign appended (for example, kubectl taint node node01 zone=red:NoSchedule-). A taint cannot be set in a Pod's YAML, because the Pod manifest describes the Pod, while taints belong to the Node (they are stored in the Node's spec.taints). Declaring tolerations in the Pod spec is the only way for a pod to get past a taint; labeling the Pod won't do.
One of the effects applied to pods that do not tolerate the taint is NoSchedule, which means new pods are not placed on the tainted Node. Another possible effect is NoExecute. Imagine there are pods already running on a Node that is then tainted. With NoExecute, running pods that do not tolerate the taint are evicted from the Node; with NoSchedule, they keep running and only new pods are kept away. A softer third effect, PreferNoSchedule, makes the Scheduler try to avoid the Node without guaranteeing it.
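For NoExecute taints, a toleration can also limit how long an already-running Pod stays on the Node. A sketch, assuming the zone=red taint is applied with the NoExecute effect; the Pod name and the one-hour value are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: noexecute-pod          # hypothetical name
spec:
  tolerations:
    - key: zone
      operator: Equal
      value: red
      effect: NoExecute
      tolerationSeconds: 3600  # stay up to 1 h after a matching NoExecute taint is added, then be evicted
  containers:
    - name: app
      image: nginx
```

Without tolerationSeconds, a Pod with a matching NoExecute toleration stays on the Node indefinitely.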
The Master node carries a taint that prevents ordinary pods from being scheduled on it (it can be seen via kubectl describe node <master node name> | grep Taints). The Master can be untainted by running kubectl taint with that taint followed by a minus sign. This is not recommended, though, since it breaks the "implied consent" of working with Kubernetes: the Master is meant for control-plane components, not application workloads.
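A less invasive alternative to untainting is to give a specific Pod a toleration for the Master's taint. This sketch assumes a kubeadm-based cluster, where recent releases use the key `node-role.kubernetes.io/control-plane` and older ones `node-role.kubernetes.io/master`; check the actual key with the describe command above. The Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: control-plane-pod      # hypothetical name
spec:
  tolerations:
    # Assumed kubeadm taint key; older clusters use node-role.kubernetes.io/master.
    - key: node-role.kubernetes.io/control-plane
      operator: Exists         # tolerate the taint whatever its value
      effect: NoSchedule
  containers:
    - name: app
      image: nginx
```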
kubectl drain <node name>: marks the Node as unschedulable (cordons it) and evicts the pods running on it, which is more severe than the NoSchedule effect. By default it refuses to proceed if the Node has pods that are not managed by a controller (ReplicationController, ReplicaSet, Job, DaemonSet, StatefulSet), because such pods would not be recreated elsewhere; the --force parameter overrides this rule and deletes them anyway. DaemonSet-managed pods are a separate case and require --ignore-daemonsets.
kubectl uncordon <node name>: reverts the effect of drain by marking the Node as schedulable again (pods that were evicted are not moved back automatically).
It is possible to run your own scheduler in Kubernetes alongside the default one. A Pod chooses it through the schedulerName parameter in its spec section; if the parameter is omitted, default-scheduler is used.
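A minimal sketch of a Pod that asks for a custom scheduler; `my-scheduler` is a hypothetical scheduler name that must match a scheduler actually deployed in the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod   # hypothetical name
spec:
  schedulerName: my-scheduler  # hypothetical; must be deployed separately
  containers:
    - name: app
      image: nginx
```

If no scheduler with that name is running, nothing binds the Pod to a Node and it stays in Pending status.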