Microsoft has published a Preliminary Post Incident Report on last week’s incident, which broke Outlook on Windows for millions of users and made emails impossible to view or create.
It was on the evening of May 11th, UK time, and in the middle of the working day in North America (18:24 UTC, 19:24 BST, 11:24 Pacific), that many Office 365 customers using Outlook on Windows observed a frustrating problem: emails that appeared blank or showed just one line, even though a few lines of each could be previewed in the list of messages. Attempts to create or reply to an email allowed a line of text to be typed, but when the user pressed Enter, the text disappeared.
Admins steeped in the strange ways of Office and Windows tried the usual range of fixes: rebooting PCs, reinstalling Office, poking through the event viewer, and digging deep into Outlook options to disable hardware graphics acceleration, but generally without success. The only things that typically worked were rolling back the last Office update or running Outlook in safe mode. It was a huge waste of time.
“Nice job, MS!! I just spent 3 hours on a machine trying to figure this out!” said one admin. “Thanks I had nothing else to do today… roll back all my clients’ outlook.. and who do i bill for my time?” said another.
Many questioned how the company could roll out an update to the world (users across the globe were impacted) which had such an obvious and catastrophic impact. “Do you not run even basic tests before releasing these shoddy updates?” said a user.
The only consolation is that the fix came relatively quickly and was rolled out as if by magic. “Ok so you fixed it… I am not sure how… when I have disabled both windows update and office update?” said a perplexed user.
Microsoft recorded the problem as starting at 18:24 UTC on May 11th and as fixed by 02:00 UTC on May 12th (03:00 BST on May 12th, 19:00 Pacific on May 11th).
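For readers who want to check the timeline arithmetic, here is a minimal sketch. The year 2021 is an assumption (the times quoted above give only the day and month), and the fixed offsets of UTC+1 for BST and UTC-7 for Pacific daylight time are hard-coded for May rather than looked up from a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the year is 2021; the article quotes only day and month.
start = datetime(2021, 5, 11, 18, 24, tzinfo=timezone.utc)
end = datetime(2021, 5, 12, 2, 0, tzinfo=timezone.utc)

# Fixed daylight-saving offsets that apply in May (hard-coded for simplicity).
BST = timezone(timedelta(hours=1), "BST")
PDT = timezone(timedelta(hours=-7), "PDT")

# Length of the incident as recorded by Microsoft.
duration = end - start
print("Duration:", duration)

# The same two instants expressed in each local zone.
for zone in (BST, PDT):
    local_start = start.astimezone(zone).strftime("%d %b %H:%M")
    local_end = end.astimezone(zone).strftime("%d %b %H:%M")
    print(zone.tzname(None), local_start, "->", local_end)
```

The incident as recorded lasted seven hours and 36 minutes, and for Pacific users it both began and ended on May 11th.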
The technical fault was assigned the number EX255650 and categorised as an issue with Exchange Online – odd, since it also impacted Outlook when used with Exchange on-premises. Microsoft’s practice of putting the issue details into the Office 365 admin center also annoyed those who could not access it.
What was the problem and how was it fixed?
Microsoft has revealed some details. Initially the company said it was “a recent change to systems that facilitate text display management for content within the Outlook client.”
The recently released post incident report states: “While mitigating another problem in Microsoft Word, a configuration rollout was deployed for specific 16.0.13929.xxxxx build versions. However, this had an unexpected side-effect on the Outlook client and Web Layout View used by Microsoft Word.”
Word’s Web Layout view is little used, and users are only likely to encounter it if they open an HTML document in Word, or foolishly attempt to author web content there and save as HTML.
Why did this impact Outlook?
The reasons go back to the days of Outlook 2003 and earlier, when HTML email content was rendered by embedded Internet Explorer.
In Outlook 2007, embedded IE was replaced by embedded Word, using Word’s HTML support, upsetting authors of marketing emails who had to tailor their content to suit those limited capabilities, but making Outlook significantly more secure.
The same is true for authoring emails: Word is wheeled in, though with some strange Outlook-specific behaviour. For those who wonder why Outlook makes it so hard to format emails sensibly, whether when quoting parts of a received email in the reply and typing in-line comments, or when it puts the background of the incoming email behind the reply: it is the interaction with Word that is to blame.
This interaction turned from annoyance to disaster in the faulty configuration that gave rise to EX255650.
How the heck was that done?
Microsoft’s post incident report is far from comprehensive, but there are clues about what happened. A configuration, in this context, is something “which Microsoft 365 apps use to determine which features, components, or specific code paths are enabled or disabled. We can target these rollouts to specific builds, apps, platforms, or audiences. Configurations can be synchronized up to every four hours by Microsoft 365 apps,” the company said.
That four-hourly call home presumably explains why Microsoft was able both to introduce and to fix the issue with so little sign, from a user perspective, that anything was changing.
Windows just did some background stuff and bam, Outlook breaks; then bam, Outlook is fixed. Microsoft has not revealed exactly which setting was responsible, nor has it yet said what problem in Word it was trying to fix.
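To illustrate the general idea of a server-side configuration targeted at particular builds, here is a minimal sketch. It is not Microsoft’s implementation, which has not been published: the class names, the flag name hypothetical_layout_fix, the wildcard matching and the example build numbers are all assumptions made for illustration. Only the “16.0.13929” build prefix and the four-hour sync interval come from the report.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# From the report: configurations sync "up to every four hours".
SYNC_INTERVAL_HOURS = 4


@dataclass(frozen=True)
class Client:
    """An installed app instance (hypothetical model)."""
    app: str
    build: str
    platform: str


@dataclass(frozen=True)
class Rollout:
    """A flag change targeted at certain builds, apps and platforms (hypothetical model)."""
    flag: str
    enabled: bool
    build_pattern: str
    apps: frozenset
    platforms: frozenset

    def applies_to(self, client: Client) -> bool:
        return (
            client.app in self.apps
            and client.platform in self.platforms
            and fnmatchcase(client.build, self.build_pattern)
        )


def sync(client: Client, rollouts: list) -> dict:
    """Return the flags a client would pick up at its next sync."""
    return {r.flag: r.enabled for r in rollouts if r.applies_to(client)}


# Hypothetical rollout; the flag name and targeting are invented for illustration.
rollout = Rollout(
    flag="hypothetical_layout_fix",
    enabled=True,
    build_pattern="16.0.13929.*",
    apps=frozenset({"Word", "Outlook"}),
    platforms=frozenset({"Windows"}),
)

outlook = Client(app="Outlook", build="16.0.13929.20296", platform="Windows")
older = Client(app="Outlook", build="16.0.13901.20400", platform="Windows")

print(sync(outlook, [rollout]))  # flag arrives with no software update installed
print(sync(older, [rollout]))    # a build outside the pattern is untouched
print(sync(outlook, []))         # rollout withdrawn: the flag is gone at the next sync
```

The point of the sketch is that both the change and its withdrawal reach the client through the periodic sync alone, which is consistent with users seeing Outlook break and recover even with Windows and Office updates disabled.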
How was it not noticed during testing?
Microsoft said that it is “performing an extensive review of our coding to understand how the issue occurred and why it was missed during the testing and early deployment cycle,” and there may be a further update to follow on this matter, when the preliminary report is replaced with a final one.
We can speculate, though, that some small corner of Microsoft did not take account of the key role Word’s Web Layout plays in Outlook.
In the report, the company noted that for Word, “the default view is not Web View, so only a small percentage of users use it. Other views, such as Page View and Reading View, were not impacted.”
In Outlook though, “web Layout is the default and only view available.”
That said, even breaking Web Layout in Word is something that should have been spotted – though combining rapid response to security problems with thorough testing of updates for unexpected side-effects is a challenge.
Some of the less appealing aspects of Microsoft’s platform were exposed by this bug. One is that Outlook, despite its high value in integrating email, calendar, contacts and tasks, remains full of legacy code that can cause problems.
Another is that today’s Windows and Office 365 users are one bad update away from hours of lost productivity.
One thing to note though: users who followed a full cloud model, using the web browser rather than desktop Outlook, were not impacted at all. ®