I worked on a support case recently that began with reports of an unusually high volume of outbound email from one email address. We soon received further reports of external systems receiving a high volume of emails from that email address, in some cases leading to performance and stability issues.

From the information provided to us in the reports, and subsequent conversations with the third parties that had so far contacted us, we found what appeared to be an infinite loop occurring for a specific distribution list.

This distribution list was not a typical one for the organization, as it contained mostly external contacts rather than internal recipients. These external contacts were the ones receiving thousands of email messages as the infinite loop was occurring.

Performing a message tracking log search for that distribution list showed some alarming results. Across multiple Hub Transport servers in the site the total message traffic was very high.

In fact, several days after the incident I used Log Parser to generate this graph, which shows just how much additional message traffic the loop condition generated. Normally the site would process 200,000-300,000 messages a day. On this particular day just over 1,000,000 messages were processed.
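For readers without Log Parser, a per-day count like the one behind that graph can be roughed out in a few lines of Python. This is only a sketch under assumptions: it expects comma-separated tracking data with a header row containing "date-time" and "event-id" columns, it treats each RECEIVE row as one processed message, and the sample rows are invented for illustration rather than taken from the incident.

```python
import csv
from collections import Counter
from io import StringIO


def messages_per_day(log_text, event_id="RECEIVE"):
    """Count tracking-log rows per calendar day for a single event type."""
    counts = Counter()
    for row in csv.DictReader(StringIO(log_text)):
        if row["event-id"] == event_id:
            # Assumption: date-time starts with YYYY-MM-DD.
            counts[row["date-time"][:10]] += 1
    return dict(counts)


# Invented sample rows, not real log data.
SAMPLE_LOG = (
    "date-time,event-id,recipient-address\n"
    "2000-01-01T08:00:00.000Z,RECEIVE,list@example.com\n"
    "2000-01-01T08:00:01.000Z,SEND,user@example.net\n"
    "2000-01-02T09:00:00.000Z,RECEIVE,list@example.com\n"
    "2000-01-02T09:00:05.000Z,RECEIVE,list@example.com\n"
)

if __name__ == "__main__":
    for day, total in sorted(messages_per_day(SAMPLE_LOG).items()):
        print(day, total)
```

A sudden jump in the daily total for one day is the kind of spike the graph made obvious.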

The Case of the 739,254 NDR Message Loop

With the help of one of the third parties who was, unfortunately, also receiving a deluge of email because of this infinite loop, we located the original message that triggered it.

The original email message was sent to the distribution list by an external sender. However, it was not that original message that was looping continuously. Rather, it was a series of non-delivery reports (NDRs) for a few members of the distribution list whose addresses were apparently no longer valid.

The NDRs were being sent to the distribution list’s email address, which distributed them to all members of the group (including the non-existent ones), triggering a new set of NDRs back to the distribution list, and so on.
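To make the feedback loop concrete, here is a minimal simulation of it. It is a toy model, not a reproduction of the incident: the member addresses are made up, every invalid member is assumed to produce exactly one NDR per copy it is sent, and a cap is needed because the loop never ends on its own.

```python
from collections import deque


def simulate_ndr_loop(members, invalid, max_messages=10_000):
    """Return (copies_sent, backlog) for a list whose NDRs return to the list.

    members: every address on the list (hypothetical data)
    invalid: the members whose mail servers answer with an NDR
    max_messages: safety cap on copies sent
    """
    pending = deque(["original message"])  # mail waiting at the list address
    sent = 0
    while pending and sent < max_messages:
        pending.popleft()  # the list expands one waiting message
        for address in members:
            sent += 1  # one copy goes to every member
            if address in invalid:
                # The NDR is addressed to the list, so it queues up
                # for expansion just like the original message did.
                pending.append(f"NDR for {address}")
    return sent, len(pending)


if __name__ == "__main__":
    members = [f"user{i}@example.com" for i in range(5)]
    print(simulate_ndr_loop(members, invalid=set()))
    print(simulate_ndr_loop(members, invalid={members[0]}, max_messages=100))
    print(simulate_ndr_loop(members, invalid=set(members[:2]), max_messages=100))
```

With no invalid members the message is delivered once and the queue drains. With one invalid member the loop runs until the cap is hit, and with two or more the backlog of waiting NDRs grows on every pass, which is why the volume climbs so quickly.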

Looking at the original message as it entered the organization and was sent out to all of the distribution list members, I noticed this information in the message tracking log data.

For the purposes of this article I’ve cut out the log details that aren’t relevant, and changed the addresses. So the characteristics of the message are:

  1. Email sent by external sender “sender@externalcompany.com”
  2. Email sent to distribution list “distributionlist@internal.com”
  3. A member of the distribution list is external contact “paul@practical365.com”

So, first the message is received on the Transport server, and everything looks as you’d expect it to look.

EventId        : RECEIVE
ServerHostname : HTSERVER1
Sender         : sender@externalcompany.com
ReturnPath     : sender@externalcompany.com
Recipients     : {distributionlist@internal.com}

Next, the distribution list is expanded and the specific contact “paul@practical365.com” is visible as the recipient of the message.

EventId        : EXPAND
ServerHostname : HTSERVER1
Sender         : sender@externalcompany.com
ReturnPath     : sender@externalcompany.com
Recipients     : {paul@practical365.com}

Next, the ReturnPath is changed from the original sender to the address of the distribution list. This is where the problem becomes apparent.

EventId        : TRANSFER
ServerHostname : HTSERVER1
Sender         : sender@externalcompany.com
ReturnPath     : distributionlist@internal.com
Recipients     : {paul@practical365.com}

The message exits the organization to be delivered to the external contact “paul@practical365.com”. The sender is still the original sender, but the return path has been modified.

EventId        : SEND
ServerHostname : ETSERVER1
Sender         : sender@externalcompany.com
ReturnPath     : distributionlist@internal.com
Recipients     : {paul@practical365.com}

A normal, valid recipient will see that the sender is “sender@externalcompany.com”, and if they reply to the message it will go to “sender@externalcompany.com”.

But if the recipient is not valid, the receiving server will generate an NDR, and that NDR will be addressed to the ReturnPath, not the sender.
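The four tracking events above can be checked mechanically. The sketch below hard-codes them as Python dictionaries (the field names mirror the log output, and the addresses are the same altered ones used in this article) and looks for the first event where the ReturnPath no longer matches the Sender. The helper names are my own, for illustration only.

```python
SENDER = "sender@externalcompany.com"
LIST = "distributionlist@internal.com"
CONTACT = "paul@practical365.com"

# The four events from the walkthrough, reduced to the relevant fields.
EVENTS = [
    {"EventId": "RECEIVE", "Sender": SENDER, "ReturnPath": SENDER, "Recipients": [LIST]},
    {"EventId": "EXPAND", "Sender": SENDER, "ReturnPath": SENDER, "Recipients": [CONTACT]},
    {"EventId": "TRANSFER", "Sender": SENDER, "ReturnPath": LIST, "Recipients": [CONTACT]},
    {"EventId": "SEND", "Sender": SENDER, "ReturnPath": LIST, "Recipients": [CONTACT]},
]


def first_return_path_rewrite(events):
    """Return the first event whose ReturnPath differs from its Sender, or None."""
    for event in events:
        if event["ReturnPath"].lower() != event["Sender"].lower():
            return event
    return None


def ndr_destination(event):
    """An NDR is addressed to the envelope return path, not the visible sender."""
    return event["ReturnPath"]


if __name__ == "__main__":
    rewrite = first_return_path_rewrite(EVENTS)
    print("ReturnPath first changes at:", rewrite["EventId"])
    print("An NDR for the SEND event goes to:", ndr_destination(EVENTS[-1]))
```

Running it points at the TRANSFER event, and shows that an NDR for the outbound copy is addressed to the distribution list rather than to the original sender.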

If the original sender had been internal to the organization, the ReturnPath would not have been modified. So why the different behaviour when the sender is external?

The reason is the Sender Policy Framework (SPF). The Exchange servers are simply complying with RFC 4408 for SPF, which states:

“Mailing lists must be aware of how they re-inject mail that is sent to the list. Mailing lists MUST comply with the requirements in [RFC2821], Section 3.10, and [RFC1123], Section 5.3.6, that say that the reverse-path MUST be changed to be the mailbox of a person or other entity who administers the list. Whereas the reasons for changing the reverse-path are many and long-standing, SPF adds enforcement to this requirement.”

RFC 1123 explains the reasoning a little better:

“The return address in the envelope is changed so that all error messages generated by the final deliveries will be returned to a list administrator, not to the message originator, who generally has no control over the contents of the list and will typically find error messages annoying.”

The general idea is that a distribution list administrator (whether that is a human or an automated agent, as is the case in a lot of email marketing platforms) receives the NDRs and removes the invalid recipients from the distribution list, rather than the original sender (who is unlikely to be an administrator) receiving them and being unable to take any action on them.

Prior to SPF, your email server could get away with not following that requirement in the RFC. However, SPF enforces the requirement, and Exchange therefore behaves in a compliant manner so that the mail will be delivered successfully.

So, is it possible that others are exposed to this looping condition? Yes, if they have distribution lists that have the following characteristics.

  1. The group allows unauthenticated senders (which is the default for groups created pre-Exchange 2007)
  2. The sender of the original email is external to the organization (for internal senders the ReturnPath of the message is not modified)
  3. The group contains one or more external contacts that are not valid addresses and result in an NDR
  4. The group is configured to send delivery reports to the message originator (which is the default setting)
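The characteristics above can be turned into a simple yes/no check per group. The sketch below is an illustration only: the field names are hypothetical labels of mine, not Exchange attribute names, and you would need to populate them from your own export of group settings. Because the validity of external contacts (characteristic 3) is hard to know in advance, the check treats the mere presence of external contacts as enough to flag the group. Characteristic 2 depends on the message rather than the group, and is possible whenever characteristic 1 holds.

```python
from dataclasses import dataclass


@dataclass
class GroupSettings:
    # Hypothetical field names for illustration; not Exchange attributes.
    name: str
    accepts_unauthenticated_senders: bool  # characteristic 1
    has_external_contacts: bool            # prerequisite for characteristic 3
    reports_to_originator: bool            # characteristic 4


def exposed_to_ndr_loop(group):
    """True when the group settings would allow the loop described above."""
    return (
        group.accepts_unauthenticated_senders
        and group.has_external_contacts
        and group.reports_to_originator
    )


if __name__ == "__main__":
    groups = [
        GroupSettings("Partners", True, True, True),
        GroupSettings("All Staff", False, False, True),
    ]
    for group in groups:
        print(group.name, "exposed" if exposed_to_ndr_loop(group) else "ok")
```

Any group the check flags is worth a closer look, even if all of its external contacts are valid today.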

The delivery report configuration can be set in the properties of the distribution group.

If the group is configured to send no delivery reports, the ReturnPath is not modified by Exchange and a loop would not occur. Suppressing delivery reports in this way affects only NDRs, not other auto-replies such as “out of office”.

Similarly, if the group is configured to send delivery reports to the group manager, the ReturnPath is set to the group manager rather than the group’s email address so the loop also would not occur.
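The effect of the three delivery report settings on a message from an external sender can be summarized as a small lookup. This is a sketch of the behaviour as described in this article, not of Exchange internals: the setting labels "originator", "manager" and "none" are hypothetical names of mine, and the manager address is made up.

```python
def return_path_for_external_sender(original_sender, list_address,
                                    manager_address, report_to):
    """ReturnPath on the outbound copy, per the behaviour described above.

    report_to is one of "originator", "manager" or "none"
    (hypothetical labels for the three delivery report options).
    """
    if report_to == "originator":
        return list_address      # default: NDRs come back to the list
    if report_to == "manager":
        return manager_address   # NDRs go to the group manager
    if report_to == "none":
        return original_sender   # ReturnPath is left unmodified
    raise ValueError(f"unknown delivery report setting: {report_to!r}")


def can_loop(return_path, list_address):
    """The loop needs NDRs to be addressed to the list itself."""
    return return_path.lower() == list_address.lower()


if __name__ == "__main__":
    sender = "sender@externalcompany.com"
    dl = "distributionlist@internal.com"
    manager = "manager@internal.com"  # made-up address
    for setting in ("originator", "manager", "none"):
        path = return_path_for_external_sender(sender, dl, manager, setting)
        print(setting, "->", path, "| loop possible:", can_loop(path, dl))
```

Only the default setting points NDRs back at the list address, which is why changing it breaks the loop.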

So, the lesson learned here is to review your distribution groups to see whether they meet characteristics 1, 2, and 4 above (3 is harder to test on its own) and either adjust your group’s authentication requirement (to lock them down from external senders), or adjust the delivery report configuration to suppress or redirect the NDRs to a person who can take action on them.

About the Author

Paul Cunningham

Paul is a former Microsoft MVP for Office Apps and Services. He works as a consultant, writer, and trainer specializing in Office 365 and Exchange Server. Paul no longer writes for Practical365.com.

Comments

  1. ara

    Hi Paul, could you please let me know how you checked the RECEIVE, EXPAND, TRANSFER and SEND events above using PowerShell?
    thanks in advance

  2. Aaron Mason

    This reminds me of a situation we had with our Service Desk software. We resolved a call for someone who was on leave, but they’d set a mailbox rule instead of the auto-responder so every time our software sent them an email informing them that their call was resolved, they’d send an email back saying they were unavailable, triggering an email acknowledging the response they sent, which would in turn trigger the mailbox rule… et cetera, et cetera. We had to log onto the service desk mailbox to delete the email before the service desk software could pick it up and start the process over again.

  3. fadi chebli

    hi all
    The same case happened on Exchange 2013.

    I didn’t find the NDR options on the group. Are they available in Exchange 2013?

    1. simm

      Did you find a solution for Exchange 2013?

  4. Minty

    Just had this happen to us last week and our parent company is freaking out, wanting to know precisely the external recipients involved. We met a few of the conditions for this to happen, including DLs with external contacts which also happen to be DLs. The external recipient DLs contained further DLs. Also, sender authentication was not enabled. Message Tracking Logs give me the host name for server generating the NDR, and this gives us some idea of the external companies involved, but am I right or wrong to assume there is no way to enumerate individual recipients when so many downstream external Distribution Lists are involved. My company wants to get an expert in from Microsoft.

  5. Gopinath T

    Paul

    Scenario is as below

    – Automated job sends a mail to a DL, all internal recipients, and one recipient’s mailbox is full.
    – The DL was set to send delivery reports to group owner, and the NDR was sent to one of the group managers.
    – The group manager was confused as the NDR said “person X mailbox is full” and we clarified it why.

    The DL doesn’t have any external recipients, but the automated mail comes from an address that is non-existent. Will changing the setting to send delivery reports to the message originator create NDRs between the non-existent sender and the recipient whose mailbox is full?
    We can test this, just wanted to hear it from the master 🙂

    Thanks
    Gopinath T

  6. Tim Nielsen

    Paul, the issue I am experiencing is very similar to the one you described but there are some differences. In my case, I am currently in a transition/co-existence period from 2003 to 2010 (I purchased and used your Exchange Server 2003 to 2010 guide by the way and it was a fantastic resource, big thank you!).

    An external contact sent a message to a distribution group, which has a large number of external recipients. And this resulted in the out-of-office messages from those external recipients to be sent back to the distribution list which in turn distributed it to everyone else. I am seeing the same behavior in the message tracking as you described above in that when the message is “TRANSFER” the return path is changed from the external sender to the address of the distribution group and causing this behavior.

    You mention above the following which is where my confusion starts.

    “If the group is configured to send no delivery reports, the ReturnPath is not modified by Exchange and a loop would not occur. Suppressing delivery reports in this way only impacts NDRs, but not other auto-replies such as “out of office”.”

    My understanding from the above is that clicking the button for “Do not send delivery reports” will not prevent Out-of-Office replies but in my case that is the only thing I am seeing. Am I misunderstanding this or do you think that my situation is unique to the one you described?

    And lastly, do you know why this behavior would start now? I am only about 1 week into my co-existence period between 2003 and 2010. My guess is that the RFC you mentioned above was put into effect with the installation of Exchange 2010 but I cannot find anything specific to confirm that assumption.

  7. Cindy Filler

    I experienced this same thing this week and I’m still trying to resolve/prevent this from happening again. In my case an internal user sent the original message. One of the outside people on the distribution list did a reply all and that started an NDR going out to everyone every minute. Does this sound like the same issue as above?

  8. Daniel

    I had a very similar problem lately with an NDR-NDR loop for a user who had left the organisation. Strangely, I think it had something to do with an out of office reply as well.

    There were so many emails flowing out because of it that I couldn’t even view the last two hours’ worth in Messagelabs Track & Trace, as there were too many emails to be displayed. Several hundred were being sent per minute.

    1. Avatar photo

      Messagelabs was involved in this case as well, they generated the highest number of NDRs while the loop was occurring. Not because they were doing anything wrong, presumably just because they have so much more server power than any of the other systems that were also adding NDRs into the mix.
