Dear team,
We had an incident with unplanned downtime: an AWS node restarted because of excessive Loki resource usage, and the CDP Data Processor was also failing.
Incident Details:
Date: 10.02.2025
Status: Resolved
Time: 12:50 AM–3:40 AM GMT+2
Duration: Approximately 170 minutes
Cause: The node went down because of Loki resource overuse, and the CDP Data Processor was failing with an NPE.
Impact on Services:
During the node failure, most of the services restarted. The NPE affected only the CDP Data Processor.
...