Kubernetes Incident Response

What does Kubernetes incident response look like? Your ability to react quickly to a security incident can make the difference between a major incident and a minor one.

A fast reaction helps minimize the damage a breach causes. In many cases, the incident response process plays a significant role alongside, or as part of, your procedures, security controls, and human actions.

Cloud and hybrid environments have changed the security world, especially PaaS, FaaS, and IaaS. The IRP (Incident Response Process) for cloud environments is well known, but it still differs considerably from classic scenarios such as account compromise, malware, and phishing.

Kubernetes brings many valuable advantages to the cloud, but like any other computing component, it can be exposed to security gaps, vulnerabilities, and misconfigurations.

TIP: A reliable alerting system that can warn you of suspicious behavior is the first step in a good incident response plan. 

Kubernetes Security Risks

Like any other component, Kubernetes has many security risks, some minor and others critical. Below are some of the Kubernetes security risks:

Compromised Images – To ensure the security of images, implement strong governance policies that ensure images are securely built and stored in trusted registries. For example, ensure that container images are created from pre-approved, secure base images and are regularly scanned for issues and vulnerabilities.

Compromised Containers (Malicious Traffic) – Containers and pods need to communicate with each other as part of normal operations. However, this communication can be exploited by threat actors. A breached container can affect other containers and pods.

Visibility – Visibility is critical to ensure security is maintained. However, achieving visibility in complex, distributed, containerized environments can be challenging. There can be many containers scheduled, deployed, and terminated, and all of them need to be monitored and proactively hunted. A containerized workload’s distributed and dynamic nature makes it challenging to collect, track, and understand relevant metrics. Remember that Kubernetes can be deployed in multi-cloud or hybrid cloud environments.

Unsecured Configurations – Kubernetes is built to speed up the deployment of applications and simplify operations and management. While Kubernetes provides a wide range of controls that can help organizations effectively secure clusters and applications, it does not offer secure configurations out of the box.

Compliance Challenges – Achieving compliance in cloud-native environments is a highly challenging endeavor. To achieve compliance, organizations are usually required to implement specific security measures. This often requires enforcing best practices, benchmarks, industry standards, and internal organizational policies.

Malicious Docker images – Public image repositories have become an easy tool for distributing malicious images disguised as custom configurations. An example that illustrates the reach of this attack vector is an incident with Docker Hub, where a malicious image was pulled 5 million times before being removed.

Misconfigured Docker – If you’re an attacker looking for misconfigured Docker instances to exploit, it’s as easy as probing open ports 2375, 2376, 2377, 4243, and 4244 on the internet. Vulnerable instances can also be located using search engines like Shodan, Censys, and others.

Cluster misconfiguration – A considerable amount of trust is placed in cloud providers to configure, patch, and manage Kubernetes security on our behalf. Sometimes, however, things can fall through the cracks. Even the most highly trusted cloud services can be susceptible to configurations that unwittingly enable privilege escalation and facilitate the takeover of the cluster.

Vulnerabilities – Another critical step in securing containers and Kubernetes is to prevent images or containers with known vulnerabilities from being deployed, and to identify, triage, and stop running containers with vulnerabilities. Image scanning provides one of the earliest opportunities to lower your security risk, especially when integrated early into a CI/CD pipeline.

Protecting against runtime threats – When you deploy containerized images into production Kubernetes clusters, they face new security issues and challenges along with external adversaries. The goal at runtime is to detect and respond to abnormal application behavior that might indicate a malicious actor is attempting to breach your environment via stolen credentials, kubeconfig files, compromised images in the registry, application vulnerabilities, and other means.

The image below shows the primary threat vectors for Kubernetes and the automated deployment system.

[Figure: Kubernetes Security Risks]

The 4 C’s of Cloud-Native Security

Each Cloud Native security model layer builds upon the next outermost layer. The Code layer benefits from strong base (Cloud, Cluster, Container) security layers. You cannot safeguard against poor security standards in the base layers by addressing security at the Code level.

Cloud-native security is divided into four essential layers, also known as the 4 C’s:

Code – Security measures that protect the code. For example, using vulnerability scanners and secure coding practices.

Containers – Security measures at the container level. For example, restricting access to network ports and encrypting data in transit.

Clusters – Security measures at the cluster level. For example, defining network security policies and hardening all master nodes.

Cloud (infra) – Security measures that protect the infrastructure. These are usually deployed by the cloud provider or even by on-premises IT staff.

[Figure: The 4 C's of Cloud-Native Security]

NIST Incident Response

Incident response is a structured process organizations use to identify and deal with cybersecurity incidents. The response includes several stages: preparation for incidents, detection and analysis of a security incident, containment, eradication, full recovery, and post-incident research and learning.

NIST defines a process for incident response, illustrated in the diagram below. The NIST process emphasizes that incident response is not a linear activity that starts when an incident is detected and ends with eradication and recovery. Instead, incident response is a cyclical activity with continuous learning and improvement to defend the organization better.

After every incident, there is a substantial effort to document and investigate what happened, give feedback to earlier stages, and enable better preparation, detection, and analysis for future incidents.

There is also a feedback loop from the containment and eradication step to detection and analysis—many parts of an attack are not fully understood at the detection stage. They are only revealed when incident responders “enter the scene.” These lessons can help the team detect and analyze attacks more thoroughly the next time around.

The Kubernetes IR Story

When an incident does arise, you have to decide quickly whether to destroy and replace the affected container or to isolate and inspect it. There are many scenarios for responding to Kubernetes security incidents, from the cluster level through the pods to etcd.

As in any other incident response scenario, the principles with Kubernetes are the same and rest on people, processes, and technologies. But as mentioned earlier, cloud incident response differs from classic incident response.

The attitude and perspective cannot be the same. If you have handled incident response in the past, you must adapt your approach for the cloud. Attacks against cloud components such as Kubernetes are widespread.

Identify the malicious Pod and worker node – Your first course of action should be to isolate the damage. Start by identifying where the breach occurred and separate that Pod and its node from the rest of the infrastructure. Identify the malicious Pods and worker nodes by workload name. If you know the name and namespace of the malicious Pod, you can identify the worker node running the Pod as follows:

kubectl get pods <name> --namespace <namespace> -o=jsonpath='{.spec.nodeName}{"\n"}'

If a Workload Resource such as a Deployment has been compromised, all the pods that are part of the workload resource are likely compromised. Use the following commands to list all the pods of the Workload Resource and the nodes they are running on:

selector=$(kubectl get deployments <name> \
  --namespace default -o json | jq -r \
  '.spec.selector.matchLabels | to_entries | map("\(.key)=\(.value)") | join(",")')
kubectl get pods --namespace default --selector="$selector" \
  -o json | jq -r '.items[] | "\(.metadata.name) \(.spec.nodeName)"'
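Once the node names are known, one way to separate a node from the rest of the infrastructure is to cordon it so that no new Pods are scheduled there. The sketch below is an illustration, not a prescribed procedure: the helper names run and quarantine_node and the label key incident/quarantined are hypothetical, and DRY_RUN=1 only prints the commands so the step can be rehearsed without a cluster. Cordoning does not evict running Pods, which keeps evidence in place.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# quarantine_node NODE
#   cordon: marks the node unschedulable so no new Pods land on it.
#   label:  tags the node so responders can find it later
#           (the label key is a hypothetical convention).
quarantine_node() {
  local node="$1"
  run kubectl cordon "$node"
  run kubectl label node "$node" incident/quarantined=true --overwrite
}
```

Example: DRY_RUN=1 quarantine_node worker-1 prints the two kubectl commands; without DRY_RUN they are executed against the current kubeconfig context.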

Revoke credentials assigned to the Pod or worker node – If the worker node has been given an IAM role that allows Pods to gain access to other Azure resources, remove those roles from the instance to prevent further damage from the attack. Similarly, if the Pod has been assigned an IAM role, evaluate whether you can safely remove the IAM policies from the role without impacting other workloads.

Isolate the Pod – A deny-all traffic rule may help stop an already underway attack by severing all connections to the Pod. The following Network Policy will apply to a pod with the label app=web.

TIP: Network Policy can deny all ingress and egress traffic to the Pod.
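A minimal sketch of such a deny-all policy for Pods labelled app=web is shown here as a shell function that prints the manifest. The function name deny_all_policy and the policy name deny-all-web are hypothetical. Listing both Ingress and Egress under policyTypes without any allow rules denies all traffic in both directions for the selected Pods. This assumes the cluster's network plugin enforces NetworkPolicy; without such a plugin the policy has no effect.

```shell
# deny_all_policy [NAMESPACE]
# Prints a NetworkPolicy that selects Pods labelled app=web and allows
# no ingress or egress traffic. NAMESPACE defaults to "default".
deny_all_policy() {
  local namespace="${1:-default}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-web
  namespace: ${namespace}
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Ingress
    - Egress
EOF
}
```

To apply it, pipe the output to kubectl: deny_all_policy default | kubectl apply -f -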

Identify the malicious service account name – In some cases, you may identify that a service account is compromised. Pods using the specified service account are likely compromised. You can locate all the pods using the service account and nodes they are running on with the following command:

kubectl get pods -o json --namespace default | \
  jq -r '.items[] | select(.spec.serviceAccountName == "<service-account-name>") | "\(.metadata.name) \(.spec.nodeName)"'
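To scope what a compromised service account could have reached, it can also help to list the permissions it holds. The sketch below uses kubectl auth can-i --list with impersonation; the helper names run and audit_service_account are hypothetical, and the caller is assumed to have permission to impersonate service accounts. DRY_RUN=1 only prints the command.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# audit_service_account NAMESPACE NAME
# Lists the actions the service account is allowed to perform in its
# own namespace by impersonating it.
audit_service_account() {
  local ns="$1" sa="$2"
  run kubectl auth can-i --list \
    --as="system:serviceaccount:${ns}:${sa}" --namespace "$ns"
}
```

Example: audit_service_account default <service-account-name> prints a table of resources and verbs the account may use.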

Redeploy compromised Pod or Workload – Once you have gathered data for forensic analysis, you can redeploy the compromised Pod or workload resource. First, roll out the fix for the exploited vulnerability and start new replacement pods. Then delete the vulnerable pods.

If the vulnerable pods are managed by a higher-level Kubernetes workload resource (for example, a Deployment or DaemonSet), deleting them will only schedule new ones, so the same vulnerable pods will be relaunched. In that case, you should deploy a new replacement workload resource after fixing the vulnerability. Then you should delete the vulnerable workload.
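The replace-then-delete order described above can be sketched as follows, assuming the workload is a Deployment and the fixed version lives in a manifest file. The helper names run and replace_workload, the 120-second timeout, and all argument values are hypothetical. Each step runs only if the previous one succeeded, so the vulnerable Deployment is deleted only after the replacement has rolled out.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# replace_workload NAMESPACE OLD_DEPLOYMENT NEW_DEPLOYMENT MANIFEST
#   1. apply the manifest that defines the fixed Deployment
#   2. wait until the new Deployment has finished rolling out
#   3. delete the vulnerable Deployment (and with it, its pods)
replace_workload() {
  local ns="$1" old="$2" new="$3" manifest="$4"
  run kubectl apply --namespace "$ns" -f "$manifest" &&
    run kubectl rollout status "deployment/${new}" --namespace "$ns" --timeout=120s &&
    run kubectl delete deployment "$old" --namespace "$ns"
}
```

Example: DRY_RUN=1 replace_workload default web web-fixed fixed.yaml prints the three commands in order.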

Kubernetes IR Flow

As with any other incident response, you need a dedicated incident response flow for Kubernetes. Incident response is divided into the typical five phases: Detect, Analyze, Contain, Recover, and Post-Incident. The flow diagram below describes the Kubernetes incident response flow with all relevant steps and actions.

Note: this is a general incident response flow, and you can adapt it to a specific Kubernetes scenario.

First comes detection, which tells us when we have an incident, what type of attack it is, and how it affects the organization, and then leads to triage and the next phase.

[Figure: Kubernetes IR Story]

Analysis is the second phase of the process. It verifies the incident status, the IOCs, and the scope; if artifacts are affected, it leads to the next stage.

[Figure: Kubernetes IR Story]

In the third phase, we contain the incident and take the relevant actions. If containment does not succeed, we go back to the previous step; otherwise we continue to the recovery phase and then to Post-Incident.

The mitigations depend on the incident severity and are taken after you detect an incident on a running workload. A few examples:

  • Snapshot the host VM’s disk.
  • Inspect the VM while the workload continues to run.
  • Pause the container for forensic capture.
  • Capture the operating system memory.
  • Dump the running processes and the open ports.
  • Run docker commands before evidence is altered.
  • Redeploy a container.
  • Delete a workload.
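Some of the capture steps listed above can be sketched with standard kubectl and docker commands. This is an illustration under assumptions: the helper names are hypothetical, the container commands must be run on the worker node itself, and they assume the node's container runtime is Docker (on containerd-based nodes the docker CLI is typically not available). DRY_RUN=1 only prints the commands.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# capture_pod_evidence NAMESPACE POD
# Collects the Pod spec and status, recent events, and container logs.
capture_pod_evidence() {
  local ns="$1" pod="$2"
  run kubectl get pod "$pod" --namespace "$ns" -o yaml
  run kubectl describe pod "$pod" --namespace "$ns"
  run kubectl logs "$pod" --namespace "$ns" --all-containers --timestamps
}

# capture_container_evidence CONTAINER_ID   (run on the worker node)
capture_container_evidence() {
  local id="$1"
  run docker top "$id"       # processes running in the container
  run docker pause "$id"     # freeze the container so state stops changing
  run docker inspect "$id"   # full container configuration and state
  run docker diff "$id"      # files added, changed, or deleted since start
  run docker export -o "${id}.tar" "$id"  # filesystem archive for analysis
}
```

Example: capture_pod_evidence default <pod-name> > pod-evidence.txt stores the output in a file before the workload is redeployed or deleted.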

[Figure: Kubernetes IR Story]

IR Best Practices for Kubernetes

Human first – In the principles of People, Process, and Technology, People come first. Make sure to give your people the relevant skills, be transparent about all processes, and assign clear responsibility to each team member.

Keep it Simple – First, have a well-defined process with an incident response plan. Even if it is very well thought out, it must be simple and crystal clear to be effective. Keep a human in the loop, and keep process details, procedures, and explanations to a minimum so that staff can easily follow the plan amid the urgency and confusion of an actual security incident.

Communication – Clarify who needs to be informed of a security breach, which communication channels should be used, and what level of detail should be provided. There should be clear guidelines on reporting to operations, senior management, affected parties inside and outside the organization, law enforcement, and the press. This is a commonly overlooked part of the incident response process.

IR flow – Don’t reinvent the wheel. Always start your incident response plan from a template or a flow created by others and adapt it to your specific needs. For example, you can start from the flow above, including incident scope, logical sequence of events for incident response, notification, and escalation procedures.

Simulate IR – Conduct realistic drills and exercises to see how the incident response plan is carried out in practice and adapt the method according to lessons learned. Test your security tools to ensure they can detect an attack as early as possible in the kill chain, and that the team can identify a threat and contain it before sensitive data leaves your network.

Centralized Approach – Do not stream logs into multiple tools and correlate information between them during the urgency of an attack. Processes and tooling should support a centralized incident response process where an analyst can view all the information about an incident in one place.

IR Technology – Incident response tools/technology provide you with the means to eradicate discovered malicious presence and activity from your environment and optimize response workflows by automating repetitive tasks.


