Production AWS node restart

Apr 14 at 02:20pm EEST
Affected services
Experience Services
Messaging Services
Data Services
Audience Services
Analytics Services
Console (app.intempt.com)

Resolved
Apr 14 at 02:20pm EEST

On April 14, 2025, from 1:20 PM to 2:15 PM (UTC+3), the production environment was unavailable for about 55 minutes because one of the AWS nodes restarted unexpectedly.
As soon as the problem was discovered, the on-call engineer contacted the solution architect and informed him of the situation. At that time, the recovery could not be sped up, because bringing the node back up was handled entirely on the AWS side. The team decided to monitor the process and wait for the infrastructure to be restored. A ticket was also created to analyze the cause of the restart and prevent similar incidents in the future.
Actions taken:
The incident was recorded and documented;
The solution architect was contacted immediately after the incident was discovered;
A ticket was created regarding the node failure;
The infrastructure recovery was monitored until the environment fully stabilized.