  • Agents:

    Selection based on suitability and configuration.

  • DaemonSet pod:

    Design, integration, and deployment.

  • Logs:

    Source log configuration changes, enrichment, and filtering.

  • Shipping:

    Agent-target integration and pipeline benchmark testing.
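
The enrichment and filtering step under Logs could look roughly like the sketch below. It is only an illustration: the `DROP_LEVELS` set, the `node` and `vpc_id` field names, and the function `enrich_and_filter` are hypothetical and are not taken from any particular agent's configuration.

```python
from typing import Optional

# Hypothetical drop list, chosen only for illustration.
DROP_LEVELS = {"DEBUG", "TRACE"}


def enrich_and_filter(record: dict, node: str, vpc_id: str) -> Optional[dict]:
    """Return an enriched copy of the record, or None if it is filtered out."""
    # Filtering: discard noisy levels before they are shipped.
    if record.get("level", "INFO").upper() in DROP_LEVELS:
        return None
    # Enrichment: attach source metadata (assumed field names).
    enriched = dict(record)
    enriched["node"] = node
    enriched["vpc_id"] = vpc_id
    return enriched
```

In practice most agents express this kind of rule in their own configuration format; the function only shows the intent.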


  • AWS ES Service vs. Self-Managed:

    Choose between them based on the team's ELK familiarity, cluster size, visibility and extensibility, uptime SLA requirements, and cost.


  • Bucket organization:

    Separation and storage of logs by customer VPC.

  • Flexible retrieval:

    Functions to retrieve logs at adjustable intervals for each customer.

  • ES snapshots:

    Keep the most recent month of data in ES; store the rest as snapshots in S3.

  • Archival:

    Keep the most recent year of data in S3 and move the rest to Glacier.
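
One way to picture the bucket organization, interval retrieval, and tiering items above is the sketch below. The key layout `logs/<vpc_id>/YYYY/MM/DD/`, the function names, and the 30-day and 365-day cut-offs (standing in for "month" and "year") are all assumptions made for illustration.

```python
from datetime import date, timedelta


def log_prefix(vpc_id: str, day: date) -> str:
    """Key prefix that separates logs by customer VPC and by day (assumed layout)."""
    return f"logs/{vpc_id}/{day:%Y/%m/%d}/"


def prefixes_for_interval(vpc_id: str, start: date, end: date) -> list:
    """Prefixes to read when retrieving one customer's logs for start..end inclusive."""
    days = (end - start).days
    return [log_prefix(vpc_id, start + timedelta(days=i)) for i in range(days + 1)]


def storage_tier(log_day: date, today: date) -> str:
    """Where data of a given age lives, using the assumed 30/365-day cut-offs."""
    age = (today - log_day).days
    if age <= 30:
        return "elasticsearch"
    if age <= 365:
        return "s3"
    return "glacier"
```

A retrieval function would list objects under each returned prefix; the S3-to-Glacier move itself is normally handled by a bucket lifecycle rule rather than application code.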


  • Cluster size:

    Size the ES and Kubernetes clusters based on data volume estimates.

  • Storage volume:

    Determine storage requirements for S3 and Glacier based on the data volume. How much S3 data should be configured as infrequent access vs. frequent access?

  • Costs:

    Estimate the total cost of ownership (TCO).
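
A rough storage estimate can be derived from the daily data volume and the retention period. The sketch below assumes non-overlapping tiers (first 30 days in ES, up to day 365 in S3, the remainder in Glacier) and ignores compression, replication, and index overhead; the function name and default cut-offs are hypothetical.

```python
def storage_estimate_gb(daily_gb: float, retention_days: int,
                        es_days: int = 30, s3_days: int = 365) -> dict:
    """Split the retained volume across ES, S3, and Glacier by data age."""
    es = daily_gb * min(retention_days, es_days)
    s3 = daily_gb * max(0, min(retention_days, s3_days) - es_days)
    glacier = daily_gb * max(0, retention_days - s3_days)
    return {"elasticsearch": es, "s3": s3, "glacier": glacier}
```

For example, an assumed 50 GB/day kept for 730 days works out to 1,500 GB in ES, 16,750 GB in S3, and 18,250 GB in Glacier. Multiplying each figure by the current per-GB price of the matching storage class gives the storage part of the TCO.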


  • Services integrations:

    CloudTrail with CloudWatch; Slack, Jira, and other notification channels with PagerDuty.

  • Solution integrations:

    Agents with CloudWatch, CloudWatch with Solution, Solution with S3 and Glacier, and Solution with PagerDuty.


  • CloudWatch vs. ELK:

    Determine which alerts will be raised from CloudWatch vs. ELK.

  • Custom Alerts:

    Alert customizations required for Slack messages and Jira tickets.

  • Monitor Solution and Services:

    Agents and the solution should be monitored with Prometheus for 24/7 uptime.
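
The custom alert item can be illustrated by turning one alert into a Slack message body and a Jira issue body. This is a minimal sketch: the alert fields (`source`, `severity`, `title`, `detail`), the project key `OPS`, and the issue type `Incident` are hypothetical, and the payload shapes assume a Slack incoming webhook and the Jira REST issue-creation format.

```python
def slack_payload(alert: dict) -> dict:
    """Body for a Slack incoming webhook; 'text' carries the message."""
    return {
        "text": f"[{alert['severity'].upper()}] {alert['title']} "
                f"(source: {alert['source']})"
    }


def jira_payload(alert: dict, project_key: str = "OPS") -> dict:
    """Body for creating a Jira issue; project key and issue type are assumed."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": alert["title"],
            "description": alert["detail"],
            "issuetype": {"name": "Incident"},
        }
    }
```

Keeping the formatting in one place means an alert raised from CloudWatch and one raised from ELK end up looking the same in Slack and Jira.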


  • Incident management:

    We act as the central system for all notification channel integrations.

  • Extensibility:

    Send alerts from CloudWatch and ELK to SNS so that other systems and tools, in addition to PagerDuty, can subscribe to them.


  • Agents:

    Set up the DaemonSet pod security policy.

  • Solution:

    Secure the connections from the customer VPC to the solution VPC, and from the solution to the ELK stack, S3, and PagerDuty.

  • Logs:

    Custom tamper-proof implementation using a sign-and-sequence approach.
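
One possible reading of the sign-and-sequence approach is sketched below: every log line gets a sequence number and an HMAC-SHA256 signature, and each signature also covers the previous one so that edits, deletions, and reordering break verification. The chaining, the record layout, and the function names are assumptions, not the solution's actual design.

```python
import hashlib
import hmac


def sign_logs(lines, key: bytes) -> list:
    """Attach a sequence number and a chained HMAC signature to each line."""
    signed = []
    prev_sig = ""
    for seq, line in enumerate(lines, start=1):
        payload = f"{seq}|{prev_sig}|{line}".encode("utf-8")
        sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
        signed.append({"seq": seq, "line": line, "sig": sig})
        prev_sig = sig
    return signed


def verify_logs(signed, key: bytes) -> bool:
    """Return True only if sequence numbers are contiguous and every signature matches."""
    prev_sig = ""
    for expected_seq, rec in enumerate(signed, start=1):
        if rec["seq"] != expected_seq:
            return False
        payload = f"{rec['seq']}|{prev_sig}|{rec['line']}".encode("utf-8")
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, rec["sig"]):
            return False
        prev_sig = rec["sig"]
    return True
```

This sketch does not detect truncation at the end of the stream; a real design would also need to record the last expected sequence number somewhere the writer cannot alter, and to keep the signing key away from the hosts that produce the logs.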


  • Agents:

    Integrate the DaemonSet pod deployment with the Kubernetes cluster running in the customer VPC.

  • Solution:

    Deploy the ELK stack on Kubernetes using Helm charts or Rancher, set up the S3 buckets, configure archival from S3 to Glacier, and set up the PagerDuty services.


  • Create a test environment in the client network.
  • Continuous deployment and testing in the target environment are critical.