SharePoint’s Sasquatch Memory Leak

Background

I have been working with SharePoint for a few years now and have run into many nasty high-memory and Out of Memory (OOM) issues. To date, most of SharePoint’s memory problems have been attributed to developers not using the Dispose pattern properly with the SharePoint object model (OM). While I have found that to be the case, I always had a feeling there was something more: another, larger leak that could not be explained simply by missing Dispose calls. Sure, not calling Dispose keeps the SPRequest COM object (which is fairly large) around longer than necessary, but ref-counting will clean these up eventually. Yet even with the correct coding patterns and practices, the w3wp processes still seem larger than necessary, even on my development machine when I run stress against an out-of-the-box installation. So there must be something else going on here, and so started my search for Big Foot.

Big Foot Makes an Appearance

Recently I was contacted by a friend who works on the IIS team to help troubleshoot a memory issue with SharePoint. This is a pretty common request, but what really piqued my interest was when he told me he had a 9 GB memory dump. For those not familiar with memory dumps, the size of the dump reflects the amount of memory the process was using, so this w3wp process was using 9 GB of memory. To make matters worse, the process had only been up for about 56 minutes. This seemed like the best opportunity to find a nasty and elusive memory leak!

Cracking open the memory dump I found something very interesting: the process had more than 24,600 ASP.global_asax objects. So why is this important? The global_asax class represents the file-based HttpApplication which is used during request processing. You can think of these guys as tour guides which lead each user’s request through ASP.NET’s request processing pipeline. The HttpApplicationFactory keeps a free list of HttpApplication instances. As a new request reaches the system, one instance is pulled from the free list and assigned to guide the request through the pipeline, or, when none is available, a new HttpApplication is instantiated. Once the request completes, the HttpApplication is either returned to the free list or discarded, depending on the number of HttpApplications already in the free list. In addition, the HttpApplicationFactory attempts to trim unused or unneeded HttpApplications from the free list every 30 seconds. On each pass in which the free list is not empty, a single HttpApplication is disposed.
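
The free-list behavior described above can be sketched roughly as follows. This is only an illustration of the pooling idea, not the actual HttpApplicationFactory (which is internal to System.Web and differs in detail); PooledApp, AppPool and the maxFree parameter are hypothetical names I made up for the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for HttpApplication; not the ASP.NET type.
public sealed class PooledApp : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() { Disposed = true; }
}

// Simplified free-list pool. maxFree is an assumed limit for illustration.
public sealed class AppPool
{
    private readonly Stack<PooledApp> _free = new Stack<PooledApp>();
    private readonly object _sync = new object();
    private readonly int _maxFree;

    public AppPool(int maxFree) { _maxFree = maxFree; }

    public int FreeCount { get { lock (_sync) { return _free.Count; } } }

    // Take an instance from the free list, or create one if none is available.
    public PooledApp Acquire()
    {
        lock (_sync)
        {
            if (_free.Count > 0) return _free.Pop();
        }
        return new PooledApp();
    }

    // Return the instance to the free list, or discard it if the list is full.
    public void Release(PooledApp app)
    {
        lock (_sync)
        {
            if (_free.Count < _maxFree) { _free.Push(app); return; }
        }
        app.Dispose();
    }

    // One trim pass: dispose at most one idle instance.
    // In a real pool this would be driven by a timer.
    public void TrimOnce()
    {
        PooledApp victim = null;
        lock (_sync)
        {
            if (_free.Count > 0) victim = _free.Pop();
        }
        if (victim != null) victim.Dispose();
    }
}
```

The point to take away is that a pool like this only ever holds a bounded number of idle instances, so a healthy process should never need tens of thousands of them alive at once.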

So now that we understand HttpApplication lifetime, why does this memory dump have so many of them? Looking at the HttpApplicationFactory, I found only about 82 HttpApplications in the free list. Combined with the number which were processing requests at the time of the memory dump, we can account for fewer than 90. This means the other 24,000+ were either being leaked or had yet to be collected by the GC. Since I did not believe a lack of GC activity to be the problem, I started to look for what was keeping these objects rooted such that the GC could not collect them. I soon came across this object:

0:000> !do 00000002435a8290
Name: System.UnhandledExceptionEventHandler
MethodTable: 0000064278467860
EEClass: 0000064278005670
Size: 64(0x40) bytes
GC Generation: 2
(C:\WINNT\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
              MT            Field           Offset                 Type VT             Attr            Value Name
0000064278435ed8  40000ff        8        System.Object  0 instance 00000002435a8290 _target
0000064278434990  4000100       10 ...ection.MethodBase  0 instance 0000000000000000 _methodBase
0000064278438ff0  4000101       18        System.IntPtr  1 instance 6882685329544 _methodPtr
0000064278438ff0  4000102       20        System.IntPtr  1 instance 6882550903584 _methodPtrAux
0000064278435ed8  400010c       28        System.Object  0 instance 0000000193373b88 _invocationList
0000064278438ff0  400010d       30        System.IntPtr  1 instance 24686 _invocationCount


This object is the delegate behind the AppDomain’s UnhandledException event (its type is UnhandledExceptionEventHandler). Note the _invocationCount member: its value is 24,686. This means that over 24,000 handlers have registered to be notified when an unhandled exception occurs in this AppDomain. Digging further, I found that the type the _target was referencing is SPHttpApplication. This type derives from HttpApplication and is itself the base class of ASP.NET’s generated global_asax class. In its Init() method, SPHttpApplication hooks the AppDomain’s UnhandledException event. While the callback for this event logs what may be important information to the ULS log when the process exits unexpectedly, it does not need to do this 24,000+ times.
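
To make the _invocationCount number more concrete, here is a minimal sketch of how an invocation list grows when every instance subscribes in Init(). Since the subscribers of the real AppDomain event cannot be counted from outside, I use a hypothetical FakeDomain class with its own static event; LeakyApp is likewise a made-up stand-in and not SPHttpApplication.

```csharp
using System;

// Hypothetical long-lived publisher standing in for the AppDomain.
public static class FakeDomain
{
    public static event UnhandledExceptionEventHandler UnhandledException;

    // Number of handlers currently in the delegate's invocation list.
    public static int SubscriberCount
    {
        get
        {
            UnhandledExceptionEventHandler handlers = UnhandledException;
            return handlers == null ? 0 : handlers.GetInvocationList().Length;
        }
    }
}

// Hypothetical stand-in for an application class that hooks the event
// in Init() and never unhooks it.
public class LeakyApp
{
    public void Init()
    {
        FakeDomain.UnhandledException += OnUnhandledException;
    }

    private void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        // A real handler would log something here.
    }
}

public static class InvocationDemo
{
    // Creates 'count' instances, initializes each one, and returns
    // the resulting number of subscribers.
    public static int CreateAndInit(int count)
    {
        for (int i = 0; i < count; i++)
        {
            new LeakyApp().Init();
        }
        return FakeDomain.SubscriberCount;
    }
}
```

Each call to Init() adds one entry to the invocation list, so the count tracks the number of instances ever initialized rather than the number currently in use.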

As most readers will already know, static objects within the .NET Framework live for the life of the AppDomain. As such, when an object registers for event notification from a publisher which is static (or, in this case, the AppDomain itself), the registering object is rooted for the lifetime of the AppDomain. The GC cannot collect this object until either the AppDomain restarts or the object unregisters itself from the event. Unfortunately, SPHttpApplication does not unregister itself from the AppDomain’s UnhandledException event, so each instance will live as long as the AppDomain. Even after the HttpApplicationFactory releases the SPHttpApplication, the object remains in memory until the AppDomain restarts.
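
The rooting effect can be observed with a WeakReference. The sketch below is an illustration under my own assumptions: Publisher, Subscriber and RootingDemo are hypothetical names, and the static event stands in for the AppDomain event.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical long-lived event source (static, so it lives as long as the AppDomain).
public static class Publisher
{
    public static event EventHandler SomethingHappened;

    public static void Raise()
    {
        EventHandler handlers = SomethingHappened;
        if (handlers != null) handlers(null, EventArgs.Empty);
    }
}

public sealed class Subscriber
{
    public void Hook() { Publisher.SomethingHappened += OnEvent; }
    public void Unhook() { Publisher.SomethingHappened -= OnEvent; }
    private void OnEvent(object sender, EventArgs e) { }
}

public static class RootingDemo
{
    // Created in a separate, non-inlined method so that no local variable
    // in the caller keeps the subscriber alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference Create(bool unhook)
    {
        Subscriber subscriber = new Subscriber();
        subscriber.Hook();
        if (unhook) subscriber.Unhook();
        return new WeakReference(subscriber);
    }

    // Returns true if the subscriber is still reachable after a full GC.
    public static bool SurvivesCollection(bool unhook)
    {
        WeakReference weak = Create(unhook);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive;
    }
}
```

A subscriber which stays hooked is still reachable through the static event's delegate, so SurvivesCollection(false) reports it as alive. A subscriber which unhooks first has nothing left referencing it and would typically be collected, although the exact timing of collection is up to the GC.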

So we have a leak; however, it gets worse. The ASP.NET runtime creates a new HttpContext for each request it processes. It’s this object which holds references to the HttpRequest, the HttpResponse, and most importantly the handler which will service the request. When this handler is a page, the entire control hierarchy, member variables, etc. are referenced as well. Since the HttpApplication indirectly holds a reference to the HttpContext, the amount of memory being kept alive can be huge. Take a look at these objects which I found in the memory dump, and note that for each HttpApplication which is created, an HttpContext, HttpResponse, etc. are created too.

| Number of instances | Total Size (bytes) | Type |
| --- | --- | --- |
| 35,449 | 1,227,120 | System.String[][] |
| 35,449 | 1,417,960 | System.Web.Security.CookielessHelperClass |
| 35,449 | 2,268,736 | System.Web.HttpResponseStream |
| 35,449 | 5,104,656 | System.Web.HttpWriter |
| 35,449 | 11,343,680 | System.Web.HttpResponse |
| 35,449 | 11,910,864 | System.Web.HttpRequest |
| 35,449 | 11,910,864 | System.Web.HttpContext |

So what’s the bottom line? For each HttpApplication which is leaked, a potentially huge number of child references are rooted too, and therefore leaked. While the table above would seem to indicate this is a small leak, its numbers do not reflect the size of all the child objects which have been rooted.
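
To put the table in perspective, the Total Size column can simply be added up. The sketch below does only that arithmetic; it assumes the sizes are in bytes, and ShallowSizeEstimate is just a name I picked for the example.

```csharp
using System;

public static class ShallowSizeEstimate
{
    // Total Size column from the table above (assumed to be bytes).
    private static readonly long[] TotalSizes =
    {
        1227120,   // System.String[][]
        1417960,   // System.Web.Security.CookielessHelperClass
        2268736,   // System.Web.HttpResponseStream
        5104656,   // System.Web.HttpWriter
        11343680,  // System.Web.HttpResponse
        11910864,  // System.Web.HttpRequest
        11910864   // System.Web.HttpContext
    };

    // Instance count shared by every row of the table.
    public const long InstanceCount = 35449;

    public static long TotalBytes()
    {
        long sum = 0;
        foreach (long size in TotalSizes) sum += size;
        return sum;
    }

    public static double TotalMiB()
    {
        return TotalBytes() / (1024.0 * 1024.0);
    }

    public static double BytesPerRequest()
    {
        return (double)TotalBytes() / InstanceCount;
    }
}
```

The seven rows add up to 45,183,880 bytes, roughly 43 MiB, or about 1,275 bytes per leaked request. Against a 9 GB dump that is tiny, which is why the bulk of the memory has to be sitting in the child objects these types reference.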

So What is The Fix?

As of today we have just begun the process of having this issue addressed by the SharePoint 2007 servicing team, so no fix is available at the time this blog was published. Once the fix is available, I strongly advise everyone to install it and, if you have implemented the workaround I provide below, to remove it.

Workaround Please

For those who do not want to wait for a fix, I have a workaround. It will require a bit of coding on your part, but it’s really not that difficult. What we want to do here is ensure that SPHttpApplication registers only once for the AppDomain’s UnhandledException event. Since SPHttpApplication does this from within its virtual Init() method, we can ensure the base method gets called only once, so that only one HttpApplication instance is rooted.

  1. Within VS2008 create a new class library and add a reference to the Microsoft.SharePoint library. Ensure you also strong name the assembly because we want to place this assembly in the GAC.
  2. Add a new class and paste the code below into this file.
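    using System;
    using Microsoft.SharePoint.ApplicationRuntime;

    namespace ToddBlog.Sample
    {
        public class CustomHttpApplication : SPHttpApplication
        {
            // Private lock object; locking on typeof(...) can contend with
            // unrelated code which locks on the same Type.
            private static readonly object s_lock = new object();

            // volatile so the unlocked first check sees the latest value.
            private static volatile bool s_initialized = false;

            public override void Init()
            {
                // Let only the first instance run SPHttpApplication.Init(), so
                // only one instance ever hooks the AppDomain's UnhandledException event.
                if (!s_initialized)
                {
                    lock (s_lock)
                    {
                        if (!s_initialized)
                        {
                            base.Init();
                            s_initialized = true;
                        }
                    }
                }
            }
        }
    }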
  3. Compile the project and deploy the resulting assembly into the GAC on every SharePoint WFE server within your farm.
  4. For each web application, edit the global.asax file: change the assembly reference to point to the custom assembly you created in step 1, and change the Inherits attribute to reference the class you created in step 2.
    <%@ Assembly Name="[name of your assembly]" %>
    <%@ Application Language="C#" Inherits="ToddBlog.Sample.CustomHttpApplication" %>
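
If you want to convince yourself that the run-once guard from step 2 behaves as intended before deploying anything, it can be exercised outside SharePoint. The sketch below is only an approximation: BaseApp is a hypothetical stand-in for SPHttpApplication which counts how often its Init() runs, and GuardedApp and GuardDemo are made-up names.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for SPHttpApplication; counts calls to Init().
public class BaseApp
{
    public static int BaseInitCalls;

    public virtual void Init()
    {
        Interlocked.Increment(ref BaseInitCalls);
    }
}

// Same double-checked guard as the workaround class.
public class GuardedApp : BaseApp
{
    private static readonly object s_lock = new object();
    private static volatile bool s_initialized;

    public override void Init()
    {
        if (s_initialized) return;
        lock (s_lock)
        {
            if (s_initialized) return;
            base.Init();
            s_initialized = true;
        }
    }
}

public static class GuardDemo
{
    // Initializes one GuardedApp per thread and returns how many times
    // the base Init() actually ran.
    public static int Run(int threadCount)
    {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() => new GuardedApp().Init());
            threads[i].Start();
        }
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        return BaseApp.BaseInitCalls;
    }
}
```

However many instances are created and however many threads race on Init(), the base method runs once, which is the property the workaround relies on to root a single instance.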

 

 

 

 

Results

After the customer installed the fix I have described here, the process, which would normally grow to 10 GB within an hour and then recycle, never went above 2.5 GB! And even after taking all other servers out of the load balancer and running the entire farm off one WFE, the memory never went above 3.3 GB!!

Who’s Impacted

This issue applies to MOSS 2007 and WSS 3.0. SharePoint 2010 is not affected by this issue. The impact on your application depends on many factors, such as concurrency, traffic and the amount of uptime. For most decent-sized installations, however, this could greatly reduce the amount of memory needed by the IIS application pool process.

Feedback

At the time of this post I have only worked with one customer, for whom this fix was huge. I would love to hear from others who implement this fix how much impact it had in their environments, so please leave a comment with your results.


10 thoughts on “SharePoint’s Sasquatch Memory Leak”

  1. Jeremy Thake

    Great detailed post!

    So just to clarify this, how do we know when we need to use this workaround? Only when we reference the SPHttpApplication object in our managed code? Or is this just something that occurs when using the SharePoint 2007 object model, e.g. SPContext etc.?
    Clearly the SPDisposeCheck tool does not pick these things up either from looking at the defined rules.

  2. Rob Toro

    Todd, great post and technical deep dive into the root cause.

    I have a question though. This seems to be only a big issue on x64 deployments since on a x86 install the w3wp process would be recycled once it hits the 2GB ceiling. Is my line of thinking correct here, or am I missing something obvious?

  3. Søren Nielsen

    Excellent article!
    Nice find, We’ll try to implement it on some of our sites that are in trouble today.

    It will take some days to get a decent baseline and then implement the fix. Can’t wait to see the effect.

    :-)

  4. Todd Carter

    Tristan, Jeremy, Jomit: Every SharePoint environment is impacted by this bug. What you consider normal functioning is impacted here, but to what degree depends on how much traffic you are running through your servers. Obviously a single-server dev environment where one developer is making requests is not going to see much of an impact, but when you move to sites where traffic and concurrency are fairly common you are going to see a higher rate of memory use.

  5. Martin

    MS Sharepoint is indeed impacted by this issue. I’ve been testing it for a while on a fresh installation and w3wp.exe continues to leak memory.
    In my case it takes up 16 gb of RAM in about an hour.
    Funny thing is. No one is connecting to this sharepoint since it’s not in production!
    Another great demonstration of how MS continues to do things worse release after release.
    I’m glad I don’t use MS products outside of my main job :-)
