Intermittent keyword/topic extraction failures on the new and legacy platforms
Incident Report for Voicebase
Our logs show several brief periods this afternoon during which our keyword/topic extraction service experienced delays and, in some cases, failures.

The root cause was an intermittent outage in Amazon STS (AWS Security Token Service), combined with insufficient resiliency in our system to that condition. Most affected jobs were only delayed, but some hit a series of consecutive failures, which caused the job to fail.

Amazon STS is healthy again, and we are working on a patch to make our system more resilient to this type of outage.
Posted Aug 02, 2017 - 18:28 PDT