The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Memory fragmentation

I have a graphics processing utility (MFC-based) that, if used all day long, starts finding itself unable to allocate memory even when there's plenty available. Since there are no memory leaks and total used memory isn't going up, I can only assume it's memory fragmentation leaving me unable to find a large enough chunk of contiguous memory.

Anyone had any experience of this before? What did you do to fix it?
Konfuzed Koder
Thursday, February 14, 2008
What call are you making to allocate memory?
Don Dickinson Send private email
Thursday, February 14, 2008
Are you using STL? That is notorious for fragmentation, try switching from vector to deque. Look at the swap trick to free extra memory. You can also write custom allocators.

If you are managing memory yourself, the normal way is to allocate space at the start for all the objects you will need, and manage the collections yourself.
An easy way to do this is to have a flag inside the object for active/unused and use this to re-use objects rather than creating/deleting them.

If you are allocating/deleting large collections of objects, look at the sequence - is it possible to force old or temp data to be deleted and go out of scope before you allocate the next block?

ps. If you find a good memory fragmentation viewer let me know!
Martin Send private email
Thursday, February 14, 2008
Your program likely has a memory leak or resource leak. Memory fragmentation will cause the program to run slowly, but usually won't fail the allocation.
Thursday, February 14, 2008
What are you using to measure used memory?

To check for fragmentation, probably the easiest thing is to use a debug build. When you have run out of memory, make the program call _CrtDumpMemoryLeaks. This outputs a bunch of stuff to the debugger; save it to a file, then write a program to analyze the result.

Another alternative is to use _CrtMemCheckpoint. The _CrtMemState structure includes a pointer to the first block in the heap; combined with a quick peek at the corresponding library source code (dbgheap.c) you should be able to work out how to walk this list. The analysis you are performing on the file above would then be done in situ.

In any case, you'd simply look at the used blocks, make a note of their address and size, then (taking into account sizes of block headers etc.) infer the size and location of the free areas from that. (This doesn't handle the case where there are holes in the address space, but if you've fallen foul of fragmentation then there should be none anyway.)

I haven't used this exact technique myself, as I've never fallen foul of fragmentation on PC. However the above may at least provide a starting point, and, if not, you can just ignore it.
Thursday, February 14, 2008
I also agree with Glitch in that fragmentation is unlikely and that it is probably a memory leak. However that said MFC does heap-allocate an awful lot, so the likelihood of creating a heap that is fragmented in a way that the virtualized addressing can't solve is marginally higher.

Without more info about the program it's difficult to say though.
Thursday, February 14, 2008
It depends what you are doing - if you are creating and deleting a few database handles in a CRUD app then you don't need to worry about fragmentation. If you are writing number-crunching apps it's a major problem.

Say you allocate a vector of 10M objects, then need to expand it to 10M+1, but you have since created a new object at the end of the first block of memory. The vector will allocate a new 20M block, leaving the space of the first 10M unused.
Martin Send private email
Thursday, February 14, 2008
Memory fragmentation is possible under C++, and is also possible under managed code (pinned objects stop the GC compaction process from completing).

I would look at creating your own storage manager, or better, check around to see if someone sells one for MFC apps.
Thursday, February 14, 2008
Try making resource pools. Rather than deleting an allocated object, put it in the pool and reuse the next time such an object is required. Memory fragmentation should only really bite you for attempts to allocate large blocks of memory, so you only need to make pools for large objects.

As for how to do this specifically in C++/MFC, I'm not certain: you could make factory objects for each class you want to pool and then call a create-object method on the factory rather than call new for the desired class. The factory would first see if there are any objects in the pool, and only resort to memory allocation (via new) if the pool is empty. You would also need to deallocate objects via the factory object, rather than use delete. There may, however, be a better way to do this in C++ that I am not aware of.
Jeffrey Dutky Send private email
Thursday, February 14, 2008
"Your program likely has a memory leak or resource leak. Memory fragmentation will cause the program run slowly, but usually won't fail the allocation."

Not sure I agree with this. A memory leak or a resource leak should be easily detectable via Task Manager, right? If your app has a memory leak the available memory will fall as the amount of memory allocated to your process goes up. Same thing for resources except you look at the number of handles. I also disagree that fragmented memory will just cause the app to run slower. It is true that there will be more overhead to look for an adequately sized block to satisfy the request, but if there isn't a large enough block the allocation will fail and bring down the app.
Thursday, February 14, 2008
Recommend getting the paper:

"The Memory Fragmentation Problem: Solved?" by Mark S. Johnstone and Paul R. Wilson.

Basically, implement your own memory pool. I think in Windows XP you can reserve up to 1 GB of address space. From there, implement one of the common allocation policies (first fit, best fit, etc.). Do that and you shouldn't have any problem with fragmentation in your program.

Most of the time, for a large class of programs, the fragmentation "problem" is really a problem of poor allocator implementations; for those programs, well-known policies suffer from almost no true fragmentation.
Entity Send private email
Thursday, February 14, 2008
One solution is to use the Windows "Low Fragmentation Heap" (LFH); it reduces memory fragmentation at the expense of using more virtual memory space.

Note that unless you set an environment variable, the LF heap is disabled when you're in the debugger. It took me forever and a day to figure that one out.
Friday, February 15, 2008
Do something like this at startup (debug build):

#if defined(_WIN32) && defined(_DEBUG)
    // set debug heap mgr to never actually delete memory, and to check heap on every call
    int tmp = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);
    tmp |= _CRTDBG_DELAY_FREE_MEM_DF | _CRTDBG_CHECK_ALWAYS_DF;
    _CrtSetDbgFlag(tmp);
#endif
BillAtHRST Send private email
Friday, February 15, 2008
See HeapSetInformation for a way to programmatically turn on the Low Fragmentation Heap.

My code sets the CRT heap to use the Low Fragmentation stuff:

    typedef BOOL (WINAPI *HeapSetInformationProc)(HANDLE, HEAP_INFORMATION_CLASS, PVOID, SIZE_T);

    HINSTANCE hKernel = LoadLibrary(_T("Kernel32.dll"));
    if(NULL != hKernel)
    {
        HeapSetInformationProc hsip = (HeapSetInformationProc)GetProcAddress(hKernel, "HeapSetInformation");
        if(NULL != hsip)
        {
            ULONG enable = 2;  // 2 enables the Low Fragmentation Heap
            // Enable the LFH on the process heap...
            BOOL b = hsip(GetProcessHeap(), HeapCompatibilityInformation, &enable, sizeof(enable));
            DWORD gle = GetLastError();
            // ...and on the CRT's heap.
            b = hsip((HANDLE)_get_heap_handle(), HeapCompatibilityInformation, &enable, sizeof(enable));
            gle = GetLastError();
        }
        FreeLibrary(hKernel);
    }
Saturday, February 16, 2008
Some good ideas here, guys, thanks!

To answer a couple of points, it's not a memory leak, I'm sure of it.

Also, I don't want to preallocate the memory if I can help it, because that would impact the performance of the machine when my program doesn't need it.

I'm doing a graphics application that routinely works with 2000x2000-pixel RGBA images, and I need to allocate buffers for them when I have to, and deallocate them when I don't need them.

The brute force option would be to work on smaller tiles of the images, but that would impact performance as well.

I'll try a couple of these suggestions, and let you know how I get on. Any other thoughts based on this new information would be great.
Konfused Koder
Saturday, February 16, 2008
In Linux/Unix I'd recommend doing your own anonymous mmaps to get the image memory.

It looks like Windows can do that too with VirtualAlloc.
Jonathan Briggs
Sunday, February 17, 2008
Using small tiles is actually likely to speed your app up as well as helping with fragmentation. This is because it improves data locality.

One scanline of a 2000-pixel-wide RGBA image uses 8000 bytes, which is almost two pages of virtual memory. That means that stepping down to the pixel below the current one has a good chance of hitting the swap file if you're short on RAM.

If you use, say, 64x64 tiles, then each tile is 16k (4 pages). Stepping down through 64 pixels stays within one tile and touches only those 4 pages, where it would touch 64 pages in the scanline layout.

Also because they are only 16k each they should fit entirely inside the L1 CPU cache. That should speed up most image processing that combines multiple pixels (resizing and rotating for example).

Depending on exactly what you do with the images you may find different tile sizes are quicker, so test a few to see what works best for you.
Monday, February 18, 2008
I don't get it, why do you "have" to keep allocating and deallocating the same size buffer? Why not reuse it?
Monday, February 18, 2008
Well, a 2000x2000 pixel space, with RGBA at 16 bits per pixel, is 8e6 bytes (8 megabytes).

I don't know how many of these he needs to allocate at one time -- I'd expect a minimum of 3 -- A Current, a Next, and a Work space.

Still, with a modern 2 gigabyte PC memory space, that's only 24 megabytes -- I'd think allocating them on first use, then keeping the space allocated and tossing the pointers (or references) around would be quite practical.

Certainly, continually allocating and deallocating 8 Megabytes could be quite time-consuming.
Tuesday, February 19, 2008
My bad, the 2000x2000 is a bit of a red herring, that's the end result of the calculations, which come from processes that require me to scale up the images by more than 3 times. I normally have to allocate 200-300MB.

Tiling into really small tiles would probably cost more time in moving memory around than it would save by fitting in the cache.

And I don't want to leave it allocated permanently, because better use can be made of the RAM, both by my application and by others.
Konfused Koder
Tuesday, February 19, 2008
You could try allocating your big temporary blocks of memory using VirtualAlloc() and the MEM_TOP_DOWN flag.

What that does is put them at the top of your address space. Other non-temporary allocations will start at the bottom of the address space and shouldn't fragment those big blocks.

I've used a similar method before to reduce fragmentation.

By the way I'd also recommend running a profiler like VTune over it. If you're getting lots of cache misses processing the images it might be worth trying the small tiles approach. It's used extensively by 3D graphics cards to speed up texture reads.
Wednesday, February 20, 2008
Why not reserve x blocks of 200-300MB in your address space (using VirtualAlloc with the reserve flag set but not the commit one) and then commit memory to them when needed?

This way you are sure that there will be no problem allocating the space: it is done once and for all.

By not committing memory to the reserved region at once, you make sure that your physical RAM stays available for use elsewhere in your process.

When you need it, and for the time you need it, you can commit memory to your reserved region: physical RAM will get allocated to it and you will be able to do your work.
Once you're done, you can decommit it but keep it reserved.
This way physical RAM becomes available for other tasks, but you are sure that the next allocation will be fast and will not fail.

Due to the size of your images, as said earlier, you may have to lay them out in memory in a way that maximizes locality for the algorithm you are using, e.g. if you scan columns first, then you should arrange columns as rows in memory.
Wednesday, February 20, 2008
You could use two heaps, and after the 100,000th allocation you switch heaps. By the next 100,000th allocation almost all the allocs on the first heap would have been freed, leaving you a pristine heap from which to start allocating again. Automatic memory defragmentation without worrying about creating your own heap manager.
Wednesday, March 12, 2008

This topic is archived. No further replies will be accepted.
