The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Avoiding contention for the heap

I was wondering if anyone could recommend any books, articles, or web pages that address the topic of heap management in the context of a multithreaded application.

Suppose a server application runs a few worker threads and this pool of threads manages a large number of concurrent client connections.  Each client might periodically send requests that need to be buffered as they come, which suggests dynamic memory allocation of some kind.

On Windows, the default process heap is guarded with a critical section, leading me to believe that any such application that made frequent heap calls could effectively bottleneck on the heap. 

I have some ideas on how to mitigate this, but I'm really more interested in reading about existing approaches first.
Meganonymous Rex
Friday, June 15, 2007
 
 
Only heap allocation/deallocation need guarding and they don't usually take long and generally don't hang.

A common problem on windows is not getting back freed memory in a timely manner so a lot of high performance multithreaded apps just allocate a block at startup and do their own management.
Martin Send private email
Friday, June 15, 2007
 
 
It's a bigger problem than most people realize until they do some serious profiling.

A couple of solutions:
1. Preallocate. Use as little dynamic memory as possible.
2. Use a heap allocator specifically written for this purpose: http://freshmeat.net/projects/libhoard/?branch_id=4119&release_id=254349
3. Use task specific allocators so memory doesn't cross task boundaries.
4. Prevent priority inversion around the low priority threads doing memory ops.
son of parnas
Friday, June 15, 2007
 
 
> they don't usually take long and generally don't hang.

It really depends on the allocator, contention, and the priority of the tasks accessing the allocator. Memory allocation overhead can be one of the biggest reasons for poor performance depending on your app.
son of parnas
Friday, June 15, 2007
 
 
The default Windows process heap is protected by a critical section (initialized with a spin count).  The worst case scenario here is:
 1. spin count "expires"
 2. context switch to block on critical section

My thinking is that avoiding the context switch could be important in a multithreaded app.

Thanks, s.o.p
Meganonymous Rex
Friday, June 15, 2007
 
 
I've tried Hoard in the past, and was disappointed -- it did not appear to give much gain in terms of speed, and it used a LOT more memory.  YMMV...

I've not yet tried tcmalloc, which is from google, but it looks promising...

http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html

If you get some results, please post back or drop me an email .
BillAtHRST Send private email
Friday, June 15, 2007
 
 
If you have measured this to be a significant problem for you, the easiest thing to try is to just license SmartHeap (http://www.microquill.com/) - it has a drop-in replacement heap that among other things has been tuned for multithreaded perf.
Michael G
Friday, June 15, 2007
 
 
After I came across with this solution, my functions started taking microseconds where before it was taking milliseconds.

This is the best solution that I have found so far, I didn't invent this solution, but read it somewhere, forgot where:

Create 32 heaps.
Divide them into 2 groups of 16 each.
Now you have 16 pairs of heaps.

Now when a thread is created assign a thread number (or you can hash the thread id) from 1 to 16 one by one.

Now the rule is a thread can allocate only on its pair of heaps but it can deallocate pointers passed to it from any heap.  Most allocations and deallocations occur on the same thread so the thread contention for the heaps will be minimum.

The purpose of the heap pair:

Keep a counter for each pair that will count the number of allocations done on that heap.  Now when that number reaches 100000 or so switch the heap with its corresponding pair.  The purpose of this is for automatic defragmentation of the heaps.  After another 100000 allocations almost all pointers allocated on the first heap would have been freed giving you a pristine heap from which to start allocating.
Donald Duck
Friday, June 15, 2007
 
 
I forgot to add you must add a struct of bytes to all allocations to tell you from which heap to deallocate a pointer.

struct MYHEAPSTRUCT{
  int m_heapno; // 1 to 32
  int m_paddingfor8bytes;
};

void* myallocate( size_t size )
{
  ThisThreadData* td = GetThreadData();
  int heapno = td->m_currentheapno;
  int allocount = td->m_alloccount++;
  if( allocount>100000 ){
    td->m_currentheapno = heapno = (heapno+16)%32;
    td->m_alloccount = 0;
  }
  void* ptr = AllocateFromHeap( heapno, size+sizeof(MYHEAPSTRUCT) );
  ((MYHEAPSTRUCT*)ptr)->m_heapno = heapno;
  return ((BYTE*)ptr)+sizeof(MYHEAPSTRUCT);
}

void myfree( void* ptr )
{
    MYHEAPSTRUCT* mhs = ((BYTE*)ptr)-sizeof(MYHEAPSTRUCT);
    FreeFromHeap( mhs->m_heapno, mhs );
}
Donald Duck
Friday, June 15, 2007
 
 
I think I read the technique in this blog, but I am not locate the post now:

 http://blogs.msdn.com/ricom/
Donald Duck
Saturday, June 16, 2007
 
 
Donald Duck
Saturday, June 16, 2007
 
 
Donald Duck: Thanks for that post.  It's sort of like what I had come up with on my own, but I hadn't done the homework to compare peformance yet.  Hopefully I can do some experiments this weekend and report back with what I find.
Meganonymous Rex Send private email
Saturday, June 16, 2007
 
 
My initial tests with the method described by Donald Duck had promising results.  A program running with 8 threads and 8 pairs of heaps was roughly twice as fast as a program running against the standard heap.
Meganonymous Rex Send private email
Monday, June 18, 2007
 
 
Try this paper:
http://www.cs.utah.edu/~eeide/compilers/papers/pldi04-michael.pdf

I believe there is an implementation named NBMalloc.  Perhaps this would be a more scalable choice for the future.

Monday, June 18, 2007
 
 
If I remember correctly, the .net server GC assigns a process a heap section per CPU. This both speeds up memory allocations, but also GCs, since that's also performed by each CPU.
IIRC, the current Delphi memory manager is thread aware, and each thread gets its own local heap to allocate memory from. You can still get contention if thread a releases memory allocated by thread b, but that's relatively rare.
Mike Swaim Send private email
Monday, June 18, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz