As businesses
increasingly adopt microservices architecture, managing containerized workloads
becomes critical. Amazon Web Services (AWS) Elastic Kubernetes Service (EKS)
is a popular choice for orchestrating these workloads. However, to ensure
optimal performance and cost-efficiency, scaling based on real-time metrics
is essential. In this blog post, we'll dive into what it means to scale
containerized workloads on AWS EKS based on real-time metrics and provide a
comprehensive guide on achieving this.
Defining Key Terminologies
What is Meant by Containerized Workloads?
Containerized
workloads refer to applications and services that are encapsulated in
containers. Containers are lightweight, standalone, and executable software
packages that include everything needed to run a piece of software, including
the code, runtime, libraries, and system tools. Docker is a widely-used
platform for containerization.
What is AWS EKS?
Amazon Elastic
Kubernetes Service (EKS) is a managed service that simplifies running
Kubernetes on AWS without needing to install and operate Kubernetes control
plane or nodes. Kubernetes, often abbreviated as K8s, is an open-source system
for automating deployment, scaling, and management of containerized
applications.
What are Real-Time Metrics?
Real-time
metrics are continuous streams of data that provide instantaneous insight
into the performance and health of systems. In the context of EKS, these
metrics could include CPU utilization, memory usage, request rates, and response
times, among others.
What is Meant by Scaling in Cloud Computing?
Scaling
refers to adjusting the number of running instances of an application based on
the workload. In Kubernetes, this can be achieved both vertically (adjusting
resource limits for a pod) and horizontally (adding or removing pod instances).
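As a concrete illustration, both forms can be exercised directly with kubectl; the deployment name my-app below is purely hypothetical:
# Horizontal scaling: change the number of pod replicas
kubectl scale deployment my-app --replicas=5
# Vertical scaling: change the CPU/memory assigned to each pod
kubectl set resources deployment my-app --requests=cpu=250m,memory=256Mi --limits=cpu=500m,memory=512Mi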
Why Scaling Containerized Workloads Matters
Scaling
containerized workloads is crucial for several reasons, primarily revolving
around performance, cost-efficiency, and reliability. Let’s explore these
aspects in detail:
Performance Optimization
- Responsive Applications: By scaling
containerized workloads, applications can handle varying loads without
performance degradation. This ensures that user experience remains
consistent, even during peak traffic periods.
- Resource Utilization: Scaling allows for the
optimal use of resources, ensuring that applications have enough CPU and
memory to perform efficiently. This is particularly important for
resource-intensive applications that need to scale up during high demand
and scale down during low demand to free up resources.
Cost Efficiency
- Pay-as-You-Go: Cloud platforms like AWS EKS
offer a pay-as-you-go model. By scaling workloads dynamically,
organizations only pay for the resources they actually use, avoiding the
costs associated with over-provisioning.
- Auto Scaling: Features like the Horizontal
Pod Autoscaler (HPA) and Cluster Autoscaler on AWS EKS automate the
scaling process based on real-time metrics, further optimizing costs by
adjusting resource allocation precisely when needed.
Reliability and Availability
- Fault Tolerance: Scaling helps in
maintaining high availability and fault tolerance. If one pod fails,
others can be scaled up to take over the load, ensuring continuous
service availability.
- Load Balancing: Properly scaled workloads
distribute the load evenly across multiple containers and nodes, reducing
the risk of any single point of failure.
Adaptability and Flexibility
- Dynamic Environments: Modern applications
often experience unpredictable workloads. Scaling allows these
applications to adapt dynamically to changing demands, ensuring they
remain robust and responsive.
- DevOps and CI/CD: In continuous integration
and continuous deployment (CI/CD) environments, scaling supports rapid
development and deployment cycles by ensuring that testing and staging
environments can scale up or down based on the needs of the development
pipeline.
Security and Compliance
- Isolated Environments: Scaling containerized
workloads in isolated environments can help in meeting compliance
requirements by ensuring that workloads are segregated based on security
needs.
- Resource Quotas: By scaling, organizations
can enforce resource quotas and limits, preventing any single workload
from monopolizing system resources and potentially leading to security
vulnerabilities.
Scaling
containerized workloads is essential for maintaining optimal performance,
ensuring cost efficiency, enhancing reliability, and providing the adaptability
required in modern dynamic environments. Effective scaling strategies on
platforms like AWS EKS enable businesses to leverage the full potential of
containerized applications, leading to better user experiences and operational
efficiencies.
Scaling Strategies in AWS EKS
Amazon Elastic
Kubernetes Service (EKS) provides several robust scaling strategies to
efficiently manage and optimize your containerized workloads. Understanding
these strategies is essential for maintaining application performance, cost
efficiency, and reliability.
1. Horizontal Pod Autoscaler (HPA)
The Horizontal
Pod Autoscaler (HPA) automatically scales the number of pods in a
Kubernetes cluster based on observed CPU utilization or other select metrics.
- Metrics-Based Scaling: HPA monitors resource
metrics, such as CPU and memory usage, to determine the need for scaling.
- Custom Metrics: It can also be configured to
use custom metrics from services like AWS CloudWatch.
Example Configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
2. Cluster Autoscaler
The Cluster
Autoscaler automatically adjusts the size of the Kubernetes cluster by
adding or removing EC2 instances (nodes) based on the pending pods that cannot
be scheduled due to resource constraints.
- Node Group Scaling: It scales the node groups
up or down to meet the demands of the pods.
- Efficient Resource Use: Ensures optimal
resource utilization by only adding nodes when necessary and removing them
when they are no longer needed.
3. AWS Fargate
AWS Fargate is a
serverless compute engine for containers that works with EKS, allowing you to
run containers without managing the underlying EC2 instances.
- Automatic Scaling: Fargate scales the number
of tasks running your containers based on the workload automatically.
- Simplified Management: Eliminates the need for
managing infrastructure, making it a good choice for dynamic and
unpredictable workloads.
4. AWS Auto Scaling Groups
Auto Scaling
Groups (ASGs) in AWS allow you to automatically adjust the number of EC2
instances in your cluster.
- Scaling Policies: These can be based on
various metrics like CPU usage, network traffic, or even custom metrics.
- Scheduled Scaling: Allows predefined scaling
actions to meet anticipated demands (e.g., scale out at peak business
hours).
Example Policy:
{
"AutoScalingGroupName": "my-asg",
"PolicyName": "scale-out-policy",
"AdjustmentType": "ChangeInCapacity",
"ScalingAdjustment": 1,
"Cooldown": 300
}
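Assuming the JSON above is saved as scale-out-policy.json, it can be attached to the Auto Scaling group with the AWS CLI (a sketch; adjust the group and policy names to your environment):
# Attach the scaling policy to the Auto Scaling group
aws autoscaling put-scaling-policy --cli-input-json file://scale-out-policy.json
# Verify the policy was created
aws autoscaling describe-policies --auto-scaling-group-name my-asg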
5. Right-Sizing Workloads
Right-sizing
involves optimizing the resource requests and limits for your pods to ensure
they have enough resources to run efficiently without over-provisioning.
- Resource Requests and Limits: Define how much
CPU and memory each pod should request and limit to.
- Monitoring and Adjusting: Continuously monitor
and adjust these settings based on performance metrics.
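As a sketch of this monitor-and-adjust loop (assuming the nginx-deployment used later in this post and that Metrics Server is installed):
# Observe actual CPU/memory consumption per pod (requires Metrics Server)
kubectl top pods -l app=nginx
# Adjust requests and limits in place based on what you observe
kubectl set resources deployment nginx-deployment --requests=cpu=100m,memory=128Mi --limits=cpu=250m,memory=256Mi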
6. Custom Metrics and Alarms
Leverage AWS
CloudWatch to set up custom metrics and alarms to trigger scaling actions based
on specific application needs.
- Application-Specific Metrics: Such as request
count per second, latency, etc.
- Automated Actions: Automatically trigger
scaling actions when certain thresholds are met.
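As an illustration of the idea (a hedged sketch, not a drop-in configuration), an alarm on an application-level metric can be created from the CLI; the metric name, namespace, and SNS topic ARN below are placeholders:
# Alarm when average request latency exceeds 500 ms for two consecutive minutes
aws cloudwatch put-metric-alarm \
  --alarm-name high-request-latency \
  --namespace "MyApp" \
  --metric-name RequestLatencyMs \
  --statistic Average \
  --period 60 \
  --evaluation-periods 2 \
  --threshold 500 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-west-2:123456789012:scaling-alerts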
Scaling
strategies in AWS EKS enable efficient management of containerized workloads by
ensuring optimal performance, cost efficiency, and reliability. By leveraging
tools like HPA, Cluster Autoscaler, Fargate, ASGs, and custom metrics,
organizations can dynamically adjust their resources to meet real-time demands,
ultimately enhancing application performance and user satisfaction.
Step-by-Step Guide to Scaling on AWS EKS
Scaling
containerized workloads on Amazon Elastic Kubernetes Service (EKS) ensures your
applications run efficiently, can handle increased loads, and remain
cost-effective. This guide will walk you through the essential steps for
implementing scaling strategies on AWS EKS, emphasizing automatic scaling
mechanisms like Horizontal Pod Autoscaler (HPA), Cluster Autoscaler, and AWS
Fargate.
Step 1: Preparing Your EKS Cluster
- Create an EKS Cluster:
eksctl create cluster --name my-cluster --region us-west-2 --nodegroup-name linux-nodes --node-type t3.medium --nodes 3 --nodes-min 1 --nodes-max 4 --managed
This command
creates an EKS cluster with a managed node group.
- Install Metrics Server:
Metrics Server is required for HPA to function. Install it with the following command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
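A quick way to confirm Metrics Server is working before moving on (an optional sanity check):
# The metrics-server deployment should report READY 1/1
kubectl get deployment metrics-server -n kube-system
# If metrics are flowing, this prints CPU/memory usage per node
kubectl top nodes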
Step 2: Implementing Horizontal Pod Autoscaler
- Deploy an Application:
Deploy a sample application, for example, an NGINX deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "250m"
            memory: "256Mi"
Apply the deployment:
kubectl apply -f nginx-deployment.yaml
- Create HPA:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Apply the HPA:
kubectl apply -f nginx-hpa.yaml
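To see the HPA react, generate some CPU load and watch the replica count. The sketch below assumes you first expose the deployment as a Service; the load-generator pod name is arbitrary:
# Expose the deployment inside the cluster
kubectl expose deployment nginx-deployment --port=80
# Run a simple load generator that hammers the service
kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx-deployment; done"
# Watch the HPA scale the deployment up and back down
kubectl get hpa nginx-hpa --watch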
Step 3: Setting Up Cluster Autoscaler
- Install Cluster Autoscaler:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
- Configure Cluster Autoscaler:
Modify the deployment to include your cluster name and the AWS region:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - image: k8s.gcr.io/autoscaler/cluster-autoscaler:v1.21.2
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --nodes=1:10:my-cluster-ng-a2e2db7f.k8s.local
Apply the configuration:
kubectl apply -f cluster-autoscaler.yaml
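Once deployed, the autoscaler's logs are the quickest way to confirm it can see your node groups; note that if you rely on auto-discovery, the node group Auto Scaling groups also need the tags k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name>:
# Check that the autoscaler is running and inspect its scaling decisions
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=50
# Pending pods that cannot be scheduled are what trigger a scale-up
kubectl get pods --all-namespaces --field-selector=status.phase=Pending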
Step 4: Utilizing AWS Fargate
- Create a Fargate Profile:
eksctl create fargateprofile --cluster my-cluster --name my-fargate-profile --namespace fargate
This command
creates a Fargate profile that specifies which pods should run on Fargate.
- Deploy a Pod on Fargate:
apiVersion: v1
kind: Pod
metadata:
  name: fargate-pod
  namespace: fargate
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
Apply the pod configuration:
kubectl apply -f fargate-pod.yaml
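If the pod matches the Fargate profile's namespace, it is scheduled onto Fargate-managed capacity rather than one of your EC2 nodes; a quick verification step:
# The NODE column should show a fargate-ip-... node for this pod
kubectl get pod fargate-pod -n fargate -o wide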
Step 5: Autoscaling with Custom Metrics using Prometheus and CloudWatch
- Set Up Prometheus:
Deploy Prometheus in your EKS cluster to collect custom metrics:
kubectl apply -f https://github.com/prometheus-operator/prometheus-operator/raw/main/bundle.yaml
- Configure CloudWatch Container Insights:
Install and configure CloudWatch Container Insights to send custom metrics to CloudWatch:
kubectl apply -f https://amazon-eks.s3.us-west-2.amazonaws.com/docs/eks-logging-quickstart.yaml
- Create a Custom Metric:
Define and collect a custom metric using Prometheus. For example, create a custom metric for request count:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-servicemonitor
spec:
  selector:
    matchLabels:
      app: example
  endpoints:
  - port: web
    path: /metrics
Apply the
ServiceMonitor:
kubectl apply -f example-servicemonitor.yaml
- Set Up HPA with Custom Metrics:
Create an HPA that uses custom metrics from Prometheus:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: custom_metric
      target:
        type: AverageValue
        averageValue: 100
Apply the HPA:
kubectl apply -f custom-metric-hpa.yaml
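Note that the HPA can only consume Prometheus metrics if something exposes them through the Kubernetes custom metrics API; a common choice is the Prometheus Adapter. A hedged install sketch using the community Helm chart (the namespace and release name are arbitrary, and the adapter still needs rules mapping your Prometheus series to custom_metric):
# Install the Prometheus Adapter so the HPA can query custom metrics
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-adapter prometheus-community/prometheus-adapter --namespace monitoring --create-namespace
# Confirm the custom metrics API is registered
kubectl get apiservices | grep custom.metrics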
Step 6: Using KEDA for Event-Driven Scaling
- Deploy KEDA:
KEDA (Kubernetes Event-Driven Autoscaling) allows scaling based on event sources. Install KEDA in your EKS cluster:
kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.4.0/keda-2.4.0.yaml
- Configure KEDA ScaledObject:
Create a ScaledObject that defines the scaling behavior based on event sources:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-scaledobject
spec:
  scaleTargetRef:
    name: nginx-deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL:
      queueLength: "5"
Apply the
ScaledObject:
kubectl apply -f queue-scaledobject.yaml
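Behind the scenes KEDA manages an HPA for the ScaledObject, so you can verify the setup as follows (a quick sanity check; the SQS trigger above also needs a queueURL and IAM permissions to read the queue length):
# Check the ScaledObject status and the HPA that KEDA created for it
kubectl get scaledobject queue-scaledobject
kubectl get hpa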
Step 7: Implementing Karpenter for Efficient Node Management
- Install Karpenter:
Karpenter is a Kubernetes cluster autoscaler built to work with EKS for better node provisioning. Install it with Helm:
helm repo add karpenter https://charts.karpenter.sh
helm repo update
helm install karpenter karpenter/karpenter --namespace karpenter --create-namespace
- Configure Provisioner:
Define a provisioner for Karpenter that specifies how to scale nodes:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  cluster:
    name: my-cluster
  constraints:
    labels:
      purpose: spot
  requirements:
  - key: "karpenter.k8s.aws/capacity-type"
    operator: In
    values: ["spot"]
  - key: "topology.kubernetes.io/zone"
    operator: In
    values: ["us-west-2a", "us-west-2b"]
  limits:
    resources:
      cpu: "1000"
      memory: "4000Gi"
Apply the
provisioner:
kubectl apply -f karpenter-provisioner.yaml
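A simple way to exercise Karpenter is to create more pods than the existing nodes can hold and watch new capacity appear (a rough sketch reusing the earlier nginx deployment):
# Force unschedulable pods so Karpenter has something to provision for
kubectl scale deployment nginx-deployment --replicas=20
# New nodes launched by Karpenter should join within a couple of minutes
kubectl get nodes --watch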
Step 8: Testing and Monitoring
- Load Testing: Perform load testing on your application to ensure that your scaling configurations work as expected. Tools like Apache JMeter or k6 can be useful.
- Monitor Scaling Events: Use CloudWatch dashboards to monitor scaling events, resource usage, and custom metrics. Ensure that scaling is occurring as intended and make adjustments as needed.
Step 9: Leveraging Spot Instances for Cost-Effective Scaling
- Integrate Spot Instances:
Spot Instances allow you to utilize spare AWS compute capacity at a reduced cost. Configure your EKS cluster to use Spot Instances for non-critical workloads:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-instance-provisioner
spec:
  requirements:
  - key: "kubernetes.io/instance-type"
    operator: In
    values: ["m5.large", "m5a.large"]
  - key: "karpenter.k8s.aws/capacity-type"
    operator: In
    values: ["spot"]
  limits:
    resources:
      cpu: "500"
      memory: "1000Gi"
Apply the
provisioner:
kubectl apply -f spot-instance-provisioner.yaml
- Monitor Spot Instance Usage:
Ensure you monitor the usage and performance of Spot Instances. Utilize AWS CloudWatch and other monitoring tools to keep track of any interruptions and cost savings.
Step 10: Implementing Multi-Region and Multi-AZ Scaling
- Set Up Multi-AZ Clusters:
Ensure your EKS cluster spans multiple Availability Zones (AZs) for high availability and fault tolerance. Configure your EKS cluster to deploy nodes across multiple AZs:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
availabilityZones: ["us-west-2a", "us-west-2b"]
nodeGroups:
- name: ng-1
  instanceType: m5.large
  desiredCapacity: 2
Apply the cluster
configuration using eksctl:
eksctl create cluster -f multi-az-cluster-config.yaml
- Multi-Region Deployment:
Set up a secondary EKS cluster in another AWS region to ensure disaster recovery and business continuity. Use AWS Global Accelerator or Route 53 for traffic routing.
Step 11: Advanced Monitoring and Logging
- Integrate Prometheus and Grafana:
Use Prometheus for monitoring and Grafana for visualization of metrics:
kubectl apply -f https://github.com/prometheus-operator/prometheus-operator/raw/main/bundle.yaml
helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana
- Set Up CloudWatch Logs and Metrics:
Ensure that your EKS cluster logs and metrics are being sent to AWS CloudWatch for centralized logging and monitoring:
kubectl apply -f https://amazon-eks.s3.us-west-2.amazonaws.com/docs/eks-logging-quickstart.yaml
Step 12: Scaling Stateful Workloads
- Use StatefulSets for Stateful Applications:
Deploy applications that require stable storage using StatefulSets. Ensure that your Persistent Volume Claims (PVCs) are correctly configured:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.7
        env:
        # The mysql image will not start without a root password; use a Secret in real deployments
        - name: MYSQL_ROOT_PASSWORD
          value: "example-password"
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-persistent-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
Apply the
StatefulSet:
kubectl apply -f mysql-statefulset.yaml
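StatefulSets scale with the same kubectl verbs as Deployments, but each new replica gets its own PersistentVolumeClaim from the volumeClaimTemplates above, and those PVCs are not deleted automatically on scale-down:
# Scale the StatefulSet; pods are added and removed one at a time, in order
kubectl scale statefulset mysql --replicas=5
# Each replica owns a PVC (mysql-persistent-storage-mysql-0, -1, ...)
kubectl get pvc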
Best Practices and Considerations for Scaling on AWS EKS
Scaling on AWS
Elastic Kubernetes Service (EKS) requires a blend of strategic planning and
tactical execution to ensure high availability, cost efficiency, and
performance. Here, we'll cover the best practices and considerations essential
for optimizing your EKS scaling strategy.
1. Use Cluster Autoscaler
Cluster
Autoscaler automatically adjusts the size of your Kubernetes cluster so
that all pods have a place to run and resources are optimized.
- Installation: Deploy the Cluster Autoscaler in
your EKS cluster.
- Configuration: Ensure it is properly
configured to add or remove nodes based on pod demand.
2. Implement Horizontal Pod Autoscaler (HPA)
HPA
automatically scales the number of pod replicas based on observed CPU
utilization (or other application-provided metrics).
- Metrics Server: Deploy a metrics server to
collect and provide metrics to the HPA.
- Configuration: Define HPA policies in your
deployment files.
3. Leverage Managed Node Groups
Managed Node
Groups simplify node management, including updates and scaling.
- Setup: Use managed node groups for easier node
lifecycle management.
- Scaling: Set policies to automatically adjust
the number of nodes in a node group based on demand (see the eksctl example below).
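For instance, the bounds of a managed node group can be adjusted at any time with eksctl (a sketch using the cluster and node group names from earlier in this post):
# Resize the managed node group and its autoscaling bounds
eksctl scale nodegroup --cluster my-cluster --name linux-nodes --nodes 3 --nodes-min 1 --nodes-max 6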
4. Use Spot Instances
Spot Instances
allow you to use spare AWS compute capacity at a reduced cost.
- Spot Instance Configuration: Configure your
EKS cluster to use Spot Instances for non-critical workloads.
- Cost Savings: Monitor and analyze cost savings
while ensuring critical workloads are on On-Demand or Reserved Instances.
5. Implement Multi-AZ Deployment
Deploy your EKS
clusters across multiple Availability Zones (AZs) for high availability and
fault tolerance.
- Configuration: Ensure node groups are
distributed across multiple AZs.
- Resilience: This ensures that the failure of a
single AZ does not affect the entire cluster.
6. Monitor and Optimize Resource Requests and Limits
Properly
configure resource requests and limits for your pods to ensure optimal
utilization and avoid resource contention.
- Requests and Limits: Define resource requests
and limits for each container in your deployment manifests.
- Optimization: Regularly review and adjust
based on actual usage.
7. Use Infrastructure as Code (IaC)
Manage your
Kubernetes clusters and infrastructure using IaC tools such as Terraform or AWS
CloudFormation.
- Consistency: Ensure consistent and
reproducible infrastructure setup.
- Automation: Automate the deployment and
scaling of resources.
Considerations for Effective Scaling
1. Performance and Load Testing
Regularly perform
scalability testing to understand the limits and performance characteristics of
your EKS cluster.
- SLIs and SLOs: Define and measure Service
Level Indicators (SLIs) and Service Level Objectives (SLOs) to guide
scaling decisions.
2. Cost Management
Balance
performance and cost by optimizing resource usage and leveraging cost-effective
solutions like Spot Instances.
- Cost Analysis: Use AWS Cost Explorer and other
tools to monitor and analyze costs.
3. Resilience and High Availability
Design your
applications and clusters for high availability and resilience to minimize
downtime and impact during failures.
- Multi-Region Deployment: Consider deploying in
multiple regions for disaster recovery.
4. Security Best Practices
Ensure that
scaling does not compromise security. Implement Kubernetes security best
practices.
- IAM Roles: Use AWS IAM roles and policies to
control access.
- Network Policies: Define and enforce network
policies to secure communication between pods.
5. Monitoring and Logging
Implement robust
monitoring and logging to gain insights into cluster performance and issues.
- Tools: Use tools like Prometheus, Grafana, and
AWS CloudWatch for monitoring and logging.
6. Continuous Learning and Adaptation
Continuously
review and adapt your scaling strategies based on performance data and changing
requirements.
- Feedback Loops: Establish feedback loops to
incorporate lessons learned and improve scaling practices.
Conclusion
Scaling
containerized workloads on AWS EKS based on real-time metrics is
a powerful way to ensure that your applications remain responsive and
cost-effective. By leveraging tools like HPA, VPA, Cluster Autoscaler, and
robust monitoring solutions like Prometheus and Grafana, you can automate
scaling effectively. Following the steps outlined in this guide, you can
achieve a dynamic, responsive, and efficient Kubernetes environment on AWS.
Additional Resources:
You might be interested in exploring the following additional resources:
- What is Amazon EKS and How Does It Work?
- What are the benefits of using Amazon EKS?
- What are the pricing models for Amazon EKS?
- What are the best alternatives to Amazon EKS?
- How to create, deploy, secure and manage Amazon EKS Clusters?
- Amazon EKS vs. Amazon ECS: Which one to choose?
- Migrate existing workloads to AWS EKS with minimal downtime
- Cost comparison: Running containerized applications on AWS EKS vs. on-premises Kubernetes
- Best practices for deploying serverless applications on AWS EKS
- Securing a multi-tenant Kubernetes cluster on AWS EKS
- Integrating CI/CD pipelines with AWS EKS for automated deployments
- How to implement GPU acceleration for machine learning workloads on Amazon EKS
- How to configure Amazon EKS cluster for HIPAA compliance
- How to troubleshoot network latency issues in Amazon EKS clusters
- How to automate Amazon EKS cluster deployments using CI/CD pipelines
- How to integrate Amazon EKS with serverless technologies like AWS Lambda
- How to optimize Amazon EKS cluster costs for large-scale deployments
- How to implement disaster recovery for Amazon EKS clusters
- How to create a private Amazon EKS cluster with VPC Endpoints
- How to configure AWS IAM roles for service accounts in Amazon EKS
- How to troubleshoot pod scheduling issues in Amazon EKS clusters
- How to monitor Amazon EKS cluster health using CloudWatch metrics
- How to deploy containerized applications with Helm charts on Amazon EKS
- How to enable logging for applications running on Amazon EKS clusters
- How to integrate Amazon EKS with Amazon EFS for persistent storage
- How to configure autoscaling for pods in Amazon EKS clusters
- How to enable ArgoCD for GitOps deployments on Amazon EKS