COLLECTIONS:

  • Agents:

    Selection based on suitability and configuration.

  • Daemon Set Pod:

    Design, integration, and deployment.

  • Logs:

    Source log configuration changes, enrichment, and filtering.

  • Shipping:

    Agent-target integration and pipeline benchmark testing.

MANAGED AWS ES VS ELK:

  • AWS ES Service vs. Self-Managed:

    Choose between them based on the team's ELK familiarity, cluster size, visibility and extensibility, uptime SLA requirements, and cost.

STORAGE/RETRIEVAL:

  • Bucket organization:

    Separation and storage of logs by customer VPC.

  • Flexible retrieval:

    Functions to retrieve logs by adjustable intervals for each customer.

  • ES snapshots:

    Keep the most recent month of data in ES; store the remainder as snapshots in S3.

  • Archival:

    Store the most recent year of data in S3 and the remainder in Glacier.
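
  One way the bucket organization and interval-based retrieval above could fit together is a key layout that starts with the customer VPC and then the UTC hour. The sketch below is illustrative only: the `logs/` root, the hourly granularity, and the function names `log_key` and `hour_prefixes` are assumptions, not part of the plan. Timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def log_key(vpc_id: str, ts: datetime, name: str) -> str:
    """Build an object key grouped by customer VPC, then by UTC hour.

    The "logs/<vpc>/<YYYY>/<MM>/<DD>/<HH>/" layout is a hypothetical choice.
    """
    ts = ts.astimezone(timezone.utc)
    return f"logs/{vpc_id}/{ts:%Y/%m/%d/%H}/{name}"


def hour_prefixes(vpc_id: str, start: datetime, end: datetime) -> list[str]:
    """List the hourly key prefixes covering [start, end) for one customer VPC."""
    # Round the start down to the top of its hour so partial hours are included.
    cur = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    prefixes = []
    while cur < end:
        prefixes.append(f"logs/{vpc_id}/{cur:%Y/%m/%d/%H}/")
        cur += timedelta(hours=1)
    return prefixes
```

  A retrieval function would list the objects under each returned prefix, so the interval can be adjusted per customer without scanning the whole bucket.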

SIZING/COST:

  • Cluster size:

    Configure ES and Kubernetes cluster size based on the data volume estimates.

  • Storage volume:

    Determine storage requirements for S3 and Glacier based on the data volume. How much should be configured as infrequent vs. frequent access in S3?

  • Costs:

    Estimate TCO.
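
  As a starting point for the storage-volume estimate, the arithmetic can be sketched as below. All inputs are placeholders to be replaced with measured values. The sketch assumes the retention windows described under STORAGE/RETRIEVAL (one month in ES, one year in S3, the rest in Glacier), assumes S3 also holds the month that is live in ES, and ignores compression and index overhead. The function name `storage_tiers_gb` and the single-replica default are hypothetical.

```python
def storage_tiers_gb(daily_gb: float, retention_days: int,
                     es_days: int = 30, s3_days: int = 365,
                     es_replicas: int = 1) -> dict[str, float]:
    """Split retained log volume (in GB) across ES, S3, and Glacier by age."""
    es_window = min(es_days, retention_days)
    s3_window = min(s3_days, retention_days)
    glacier_window = max(retention_days - s3_days, 0)
    return {
        # Primary copy plus replicas held on the ES data nodes.
        "es": daily_gb * es_window * (1 + es_replicas),
        # Assumption: S3 holds the full recent year, including the ES month.
        "s3": daily_gb * s3_window,
        # Everything older than the S3 window.
        "glacier": daily_gb * glacier_window,
    }
```

  For example, `storage_tiers_gb(10, 730)` models 10 GB/day kept for two years. Multiplying each tier by its current per-GB price gives the storage portion of the TCO.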

INTEGRATIONS:

  • Services integrations:

    CloudTrail, CloudWatch, Slack, Jira, and other notification channels with PagerDuty.

  • Solution integrations:

    Agents to CloudWatch, CloudWatch to Solution, Solution to S3 and Glacier, and Solution to PagerDuty.

MONITORING/ALERTS:

  • CloudWatch vs. ELK:

    Determine which alerts will be raised from CloudWatch vs. ELK.

  • Custom Alerts:

    Alert customizations required for Slack messages and Jira tickets.

  • Monitor Solution and Services:

    Agents and the solution should be monitored for 24/7 uptime with Prometheus.

DIGITAL OPERATIONS:

  • Incident management:

    We act as a central system for all notification channel integrations.

  • Extensibility:

    Send alerts from CloudWatch and ELK to SNS so that other systems and tools can subscribe to them in addition to PagerDuty.

SECURITY:

  • Agents:

    Set up the Daemon Set Pod security policy.

  • Solution:

    Secure the path from the customer VPC to the solution VPC, and from the solution to the ELK stack, S3, and PagerDuty.

  • Logs:

    Custom tamper-proof implementation using a sign-and-sequence approach.
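
  The sign-and-sequence idea for logs could look roughly like the sketch below: each record carries a sequence number and an HMAC-SHA256 signature that also covers the previous record's signature, so edits, deletions in the middle, and reordering break verification. This is one possible interpretation, not the confirmed design. The record fields (`seq`, `msg`, `sig`) and the function names are hypothetical, and key management is out of scope here.

```python
import hashlib
import hmac
import json


def _sign(key: bytes, seq: int, msg: str, prev_sig: str) -> str:
    """HMAC-SHA256 over the sequence number, message, and previous signature."""
    payload = json.dumps({"seq": seq, "msg": msg, "prev": prev_sig}, sort_keys=True)
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_records(lines, key: bytes):
    """Attach a sequence number and a chained signature to each log line."""
    prev_sig = ""
    records = []
    for seq, line in enumerate(lines):
        sig = _sign(key, seq, line, prev_sig)
        records.append({"seq": seq, "msg": line, "sig": sig})
        prev_sig = sig
    return records


def verify_records(records, key: bytes) -> bool:
    """Return True only if sequence numbers are contiguous and every signature matches."""
    prev_sig = ""
    for expected_seq, rec in enumerate(records):
        if rec["seq"] != expected_seq:
            return False  # gap or reordering
        expected_sig = _sign(key, rec["seq"], rec["msg"], prev_sig)
        if not hmac.compare_digest(expected_sig, rec["sig"]):
            return False  # content or chain was altered
        prev_sig = rec["sig"]
    return True
```

  Note that a chain like this does not by itself detect records dropped from the end; recording the last sequence number and signature somewhere separate would be needed for that.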

DEPLOYMENT:

  • Agents:

    Integrate the Daemon Set Pod deployment with the Kubernetes cluster running in the customer VPC.

  • Solution:

    Deploy the ELK stack on Kubernetes using Helm charts or Rancher, set up the S3 buckets, configure archival from S3 to Glacier, and set up the PagerDuty services.
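
  The archival from S3 to Glacier is typically expressed as a bucket lifecycle configuration. The sketch below only builds the configuration document; the rule ID, the `logs/` prefix, and the 30-day move to infrequent access are assumptions (the frequent vs. infrequent split is still an open sizing question), while the 365-day Glacier transition follows the one-year S3 retention stated under STORAGE/RETRIEVAL.

```python
def archival_lifecycle(prefix: str = "logs/", ia_days: int = 30,
                       glacier_days: int = 365) -> dict:
    """Build an S3 lifecycle configuration that ages logs into cheaper tiers."""
    return {
        "Rules": [
            {
                "ID": "archive-logs",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Assumed: move to infrequent access after one month.
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    # Move to Glacier once objects are older than one year.
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }
```

  With boto3, a dictionary of this shape would be passed as `LifecycleConfiguration` to the S3 client's `put_bucket_lifecycle_configuration` call, along with the target `Bucket` name.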

TESTING AND STABILIZATION:

  • Create a test environment in the client network.
  • Continuous deployment and testing in the target environment are critical.