Production AWS node restart

Apr 17 at 09:24am EEST
Affected services
Experience Services
Messaging Services
Data Services
Audience Services
Analytics Services
Console (app.intempt.com)

Resolved
Apr 17 at 09:24am EEST

On April 17, 2025, from 9:05 AM to 9:25 AM (UTC+3), the production environment was unavailable for approximately 20 minutes due to an unplanned restart of one of the AWS nodes.
On discovering the problem, the on-call engineer immediately contacted the solution architect and informed him of the situation. At that time, there was no way to speed up the recovery, since bringing the node back up was handled entirely on the AWS side. It was decided to monitor the process and wait for the infrastructure to be restored. A ticket was also created to analyze the causes of the restart and prevent similar incidents in the future.
Actions taken:
The incident was recorded and documented;
The solution architect was contacted immediately after the incident was detected;
The infrastructure recovery status was monitored until complete stabilization.