System & Data Services Downtime
Resolved
Oct 02 at 01:00pm EEST
Dear team,
We experienced an incident and unplanned downtime involving an AWS node restart tied to Loki resource overuse, along with a CDP Data Processor issue.
Incident Details:
Date: October 2, 2025
Status: Resolved
Time: 12:50 AM–3:40 AM GMT+2
Duration: Approximately 170 minutes
Cause: The node went down because Loki overused its resources, and the CDP Data Processor was failing with a null pointer exception (NPE).
Impact on Services:
During the node failures, most services restarted. The NPE affected only the CDP Data Processor.
SLA Status:
As part of our 99.9% weekly SLA, we allow up to 10 minutes and 5 seconds of downtime per week.
Downtime This Week: >5 hours
SLA Compliance: We exceeded the SLA allowance this week.
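For reference, the weekly allowance above follows from simple arithmetic: 0.1% of a 7-day week. The sketch below shows one way to reproduce it. The function names are illustrative only and are not part of any Intempt tooling, and the 5-hour figure is taken as the lower bound of this week's reported downtime.

```python
# Minimal sketch of the SLA arithmetic; names here are hypothetical.
WEEK_SECONDS = 7 * 24 * 60 * 60  # 604,800 seconds in a week


def downtime_budget_seconds(sla_percent, period_seconds=WEEK_SECONDS):
    """Return the downtime allowed by an availability target over a period."""
    return period_seconds * (100 - sla_percent) / 100


def within_sla(downtime_seconds, sla_percent, period_seconds=WEEK_SECONDS):
    """Return True if the observed downtime fits inside the budget."""
    return downtime_seconds <= downtime_budget_seconds(sla_percent, period_seconds)


budget = downtime_budget_seconds(99.9)
minutes, seconds = divmod(budget, 60)
print(f"Weekly budget: {int(minutes)} min {seconds:.1f} s")  # 10 min 4.8 s

# Assumes 5 hours as the lower bound of this week's downtime (">5 hours").
print(within_sla(5 * 3600, 99.9))  # False
```

The exact budget is 10 minutes 4.8 seconds, which rounds to the 10 minutes and 5 seconds stated above.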
We sincerely apologize for the disruption this caused. Thank you for your patience and understanding.
Warm regards,
The Intempt Engineering Team