Date: September 9, 2025
Incident Reference: CEPH-2025-009
Status: RESOLVED
On September 9, 2025, our storage cluster experienced a service disruption while implementing a planned infrastructure upgrade to enable multi-datacenter operations. During the activation of "stretch mode" (a feature that allows data to be synchronized across multiple datacenters for improved disaster recovery), the cluster's internal authentication system encountered an unexpected failure.
Duration: 12:30 → 15:00 (2hr30m)
Affected Services: Data Processing
Data Safety: All customer data remained secure and intact throughout the incident. No data was lost or corrupted.
During the incident period, customers may have experienced:
The issue occurred when our storage system automatically reconfigured all data pools during the multi-datacenter setup process. This reconfiguration inadvertently affected critical system components responsible for user authentication, preventing normal access to the cluster even though the underlying data remained safe and accessible.
Our engineering team worked to restore service by upgrading the storage cluster software to a newer version that provided additional recovery options not available in the previous version. This upgrade allowed us to safely reverse the multi-datacenter configuration and restore normal cluster operations.
We sincerely apologize for any inconvenience this incident may have caused. Data security and service reliability are our highest priorities. We are committed to learning from this experience and implementing the necessary improvements to prevent similar incidents in the future.
If you have any questions or concerns about this incident, please contact our support team.