Y2K-type Failure Causes Exchange Server to Stop Processing Email

The New Year came in with a snarl for Exchange Server administrators when Exchange 2016 and 2019 servers stopped processing inbound email because the transport service couldn’t check messages for malware. Microsoft fixed the problem by releasing a script that cleans up the files used by the malware engine and then downloads and applies the latest malware signature file. Administrators must apply the fix to every Exchange 2016 and Exchange 2019 server in an organization. Make sure that you run the script with an account that has permissions to update Exchange.

Reminiscent of the Y2K problem, the root cause of the issue is a date check problem. Exchange Server downloads daily updates to ensure that the malware scanning engine can detect recent malware. After downloading an update, Exchange performs a version check using the date. After the clock ticked over to 2022, the date check failed and Exchange logged event 5300.

Log Name: Application 
Source: FIPFS 
Logged: 1/1/2022 1:03:42 AM 
Event ID: 5300 
Level: Error 
Computer: server1.contoso.com
Description: The FIP-FS "Microsoft" Scan Engine failed to load. PID: 23092, Error Code: 0x80004005. Error Description: Can't convert "2201010001" to long.

From the description, it looks like the failure occurred when a routine attempted to convert the value 2201010001 (January 1, 2022, 00:01?) to a long value. As this Reddit thread points out, the value is too large to store in a long: a signed 32-bit long tops out at 2,147,483,647.
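To see why the rollover mattered, here is a minimal Python sketch of the arithmetic. It assumes the version value is a YYMMDDHHMM stamp (inferred from the event text, not confirmed by Microsoft) and that "long" means a signed 32-bit integer. The names version_stamp and fits_in_int32 are hypothetical and are not taken from Exchange code.

```python
from datetime import datetime

# Largest value a signed 32-bit integer can hold: 2,147,483,647.
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31


def version_stamp(moment: datetime) -> str:
    """Build a YYMMDDHHMM stamp (assumed format, based on the event text)."""
    return moment.strftime("%y%m%d%H%M")


def fits_in_int32(stamp: str) -> bool:
    """Return True if the decimal stamp fits in a signed 32-bit integer."""
    return INT32_MIN <= int(stamp) <= INT32_MAX


for moment in (datetime(2021, 12, 31, 23, 59), datetime(2022, 1, 1, 0, 1)):
    stamp = version_stamp(moment)
    print(stamp, "fits" if fits_in_int32(stamp) else "overflows")
# prints:
# 2112312359 fits
# 2201010001 overflows
```

Under these assumptions, every stamp generated in 2021 starts with 21 and stays below the limit, while every stamp generated from 2022 onward starts with 22 or higher and exceeds it, which matches the timing of the failure.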

Microsoft emphasizes that this is not a security problem or a flaw with either malware scanning or the malware engine. However, as it stops mail flow, it’s a serious operational issue.

It seems like the fix works and Exchange servers begin clearing inbound mail queues after its application. Some people report that they have had to restart servers, but this shouldn’t be necessary. The problem doesn’t appear to affect earlier versions of Exchange Server.

In a hybrid deployment that routes messages through the on-premises infrastructure, the problem also affected inbound email to Exchange Online mailboxes.

Testing Anyone?

Now that the situation has calmed, the inevitable question is why Microsoft allowed such a problem to happen. Even viewed in the most benign light, this is an inexplicable catastrophic testing failure. It might be further evidence of the increasing lack of attention Microsoft pays to the on-premises version of Exchange in its desire to move seats to Exchange Online. Although some on-premises deployments do themselves no favors through lack of maintenance, especially in the timely application of security and other updates, this problem had nothing to do with on-premises administration: it is entirely due to Microsoft incompetence.

No disruption appears to have happened to Exchange Online. This could be because the code bases used by the on-premises and cloud versions separated some years ago; it could also be due to better testing and maintenance in the code. Or it’s because the on-call Microsoft engineers noticed the problem immediately after the New Year and fixed the issue without anyone noticing.

I still think many on-premises deployments would be better off in the cloud, but only because of the better feature set available there. Microsoft errors in on-premises software shouldn’t be a motivation for customers to move their mail traffic to Exchange Online.

Comments

Bob Robinson 13 Jan 2022

We did not suffer a mail flow stoppage on our 2016 servers due to this issue, but only because we use Symantec Mail Security AND select the option to disable malware scanning when installing Exchange. We did, however, receive a slew of SCOM alerts caused by the 5300 and other associated events. The fix was applied, but it did require a restart of the servers to get the errors to stop. We are in a Hybrid environment, and use on-premises Exchange to handle outbound mail flow, and to relay messages from the multitude of on-premise systems. It is disappointing that Microsoft has decided to punish organizations that chose to continue to operate Exchange in a solely on-premises environment by reserving a number of very useful features for the cloud implementation. Even Hybrid deployments could benefit from things like the ability to see transport rule hits easily.

About the Author

Tony Redmond
