Bug#692053: [PATCH] ia64 Iceweasel 10.0 (and above?) randomly stops responding, eating 100% CPU

Stephan Schreiber info at fs-driver.org
Sun Dec 16 08:20:01 UTC 2012


found 692053 10.0.11esr-1
notfound 692053 10.0~b2-1
# that version is over; nobody wants to patch it anymore, I guess
severity 692053 serious
tags 692053 + patch
thanks


I could reproduce the bug on Wheezy.

Iceweasel also needs a patch for the address range problem on ia64  
again since the upstream development has removed it in the meantime. I  
filed the separate bug#696041 for that.
After applying the patch of bug#696041, Iceweasel still used to hang.

It turned out that the garbage collector has two bugs that can cause  
an endless loop.

The problem is in js/src/jsgc.cpp:2385 in the function DecommitFreePages():

static void
DecommitFreePages(JSContext *cx)
{
     JSRuntime *rt = cx->runtime;

     for (GCChunkSet::Range r(rt->gcChunkSet.all()); !r.empty();  
r.popFront()) {
         Chunk *chunk = r.front();
         while (chunk) {
             ArenaHeader *aheader =  
static_cast<ArenaHeader*>(chunk->info.freeArenasHead);

             /*
              * In the non-failure case, the list will be gone at the end of
              * the loop. In the case where we fail, we relink all failed
              * decommits into a new list on freeArenasHead.
              */
             chunk->info.freeArenasHead = NULL;

             while (aheader) {
                 /* Store aside everything we will need after decommit. */
                 ArenaHeader *next = aheader->next;

                 bool success = DecommitMemory(aheader, ArenaSize);
                 if (!success) {
                     aheader->next = chunk->info.freeArenasHead;
                     chunk->info.freeArenasHead = aheader;
                     continue;
                 }

                 size_t arenaOffset =  
Chunk::arenaIndex(reinterpret_cast<uintptr_t>(aheader));
                 chunk->decommittedArenas.set(arenaOffset);
                 --chunk->info.numArenasFreeCommitted;
                 --rt->gcNumFreeArenas;

                 aheader = next;
             }

             chunk = chunk->info.next;
         }
     }
}


This code calls a function DecommitMemory(); ArenaSize is constant  
4096, defined in js/src/jsgc.h:

const size_t ArenaShift = 12;
const size_t ArenaSize = size_t(1) << ArenaShift;
const size_t ArenaMask = ArenaSize - 1;

The DecommitMemory() is in js/src/jsgcchunk.cpp:

bool
DecommitMemory(void *addr, size_t size)
{
     JS_ASSERT(uintptr_t(addr) % 4096UL == 0);
     int result = madvise(addr, size, MADV_DONTNEED);
     return result != -1;
}




When I debugged the hanging Iceweasel, a thread was in the most inner  
while loop in DecommitFreePages()

             ArenaHeader *aheader =  
static_cast<ArenaHeader*>(chunk->info.freeArenasHead);

             /*
              * In the non-failure case, the list will be gone at the end of
              * the loop. In the case where we fail, we relink all failed
              * decommits into a new list on freeArenasHead.
              */
             chunk->info.freeArenasHead = NULL;

             while (aheader) {
                 /* Store aside everything we will need after decommit. */
                 ArenaHeader *next = aheader->next;

                 bool success = DecommitMemory(aheader, ArenaSize);
                 if (!success) {
                     aheader->next = chunk->info.freeArenasHead;
                     chunk->info.freeArenasHead = aheader;
                     continue;
                 }

                 size_t arenaOffset =  
Chunk::arenaIndex(reinterpret_cast<uintptr_t>(aheader));
                 chunk->decommittedArenas.set(arenaOffset);
                 --chunk->info.numArenasFreeCommitted;
                 --rt->gcNumFreeArenas;

                 aheader = next;
             }

aheader was always 0x70011e0100 on each loop iteration since  
aheader->next was 0x70011e0100, too.
The question was how this state could occur.

Assume that we are in the first iteration of the loop. aheader is  
initialized with the ptr of chunk->info.freeArenasHead.
DecommitMemory is called, for example, with the address 0x70011e0100  
and the size 4096 (which is ArenaSize).
DecommitMemory() will fail since madvise() requires a page aligned  
addr and length

int madvise(void *addr, size_t length, int advice);

(madvise() returns the EINVAL error code.)

Neither the seen address 0x70011e0100 nor the length 4096 is page  
aligned here. Remember that Linux on ia64 can have 4K, 8K, 16K, or 64K  
page dependant on the configuration the Kernel was compiled with.  
Debian uses 16K on ia64.
So the subsequent if statement executes its body:

                 if (!success) {
                     aheader->next = chunk->info.freeArenasHead;
                     chunk->info.freeArenasHead = aheader;
                     continue;
                 }

Here happens what the above comment explains: "...we relink all failed  
decommits into a new list on freeArenasHead.".
aheader is inserted as the new head item in the singly linked list  
which starts with chunk->info.freeArenasHead. Please note that  
chunk->info.freeArenasHead has been cleared above (initialized with  
NULL).
aheader->next is initialized with chunk->info.freeArenasHead, which is  
NULL this time; chunk->info.freeArenasHead is aheader subsequently.  
Not any problem has occurred so far.
The continue statement enters the next iteration of the while loop -  
without running the final
		aheader = next;
DecommitMemory() is called with the same aheader again and of course  
fails again. The body of the if statement runs for the aheader again.  
This time aheader->next is initialized with chunk->info.freeArenasHead  
which isn't NULL this time; it is the former aheader.
aheader->next is equal to aheader after that. The hang up is complete.


So the considered code has two bugs:
- The garbage collector code gives madvise() non-page-aligned  
addresses and lengths since the code assumes 4K page size in  
hard-coded manner. Thus, the problem doesn' occur, for example, on x86  
or amd64.
- The while loop doesn't correctly iterate forward in the case of a  
fail of the memory decommit.

Initially I wanted the garbage collector code use the actual memory  
page size, for example, the constants ...
const size_t ArenaShift = 12;
const size_t ArenaSize = size_t(1) << ArenaShift;
const size_t ArenaMask = ArenaSize - 1;

/*
  * This is the maximum number of arenas we allow in the FreeCommitted state
  * before we trigger a GC_SHRINK to release free arenas to the OS.
  */
const static uint32 MaxFreeCommittedArenas = (32 << 20) / ArenaSize;

/*
  * The mark bitmap has one bit per each GC cell. For multi-cell GC things this
  * wastes space but allows to avoid expensive devisions by thing's size when
  * accessing the bitmap. In addition this allows to use some bits for colored
  * marking during the cycle GC.
  */
const size_t ArenaCellCount = size_t(1) << (ArenaShift - Cell::CellShift);
const size_t ArenaBitmapBits = ArenaCellCount;
const size_t ArenaBitmapBytes = ArenaBitmapBits / 8;
const size_t ArenaBitmapWords = ArenaBitmapBits / JS_BITS_PER_WORD;

... should no longer be constants. I think it would be much work.

Furthermore, I found the following code in js/src/jsgc.h:152

     /*
      * To minimize the size of the arena header the first span is encoded
      * there as offsets from the arena start.
      */
     static size_t encodeOffsets(size_t firstOffset, size_t lastOffset) {
         /* Check that we can pack the offsets into uint16. */
         JS_STATIC_ASSERT(ArenaShift < 16);
         JS_ASSERT(firstOffset <= ArenaSize);
         JS_ASSERT(lastOffset < ArenaSize);
         JS_ASSERT(firstOffset <= ((lastOffset + 1) & ~size_t(1)));
         return firstOffset | (lastOffset << 16);
     }

The
         JS_STATIC_ASSERT(ArenaShift < 16);
means that it wouldn't work with 64K memory page size. I guess my idea  
would mean a complete rewrite of the code.

Thus, a simplier fix:
- The while loop is fixed in order to iterate correctly in the case of  
a fail of the memory decommit.
- The code continues using 4K page size in hard-coded manner. Since  
all memory decommits would fail, the chunk->info.freeArenasHead singly  
linked list would contain all the items it had before. Thus, we insert  
at the begin of the DecommitFreePages() function (the one with the  
while loop) the following code which skips the loops when the memory  
page size is larger than 4K:
     /*
      * If the ArenaSize is smaller than a memory page, we don't attempt to
      * decommit anything because DecommitMemory() would always fail.
      */
     if (ArenaSize < pageSize())
         return;


The drawback is that Iceweasel would keep more memory (just as earlier  
versions of Iceweasel) on ia64. I think it isn't that bad.


You also need the patches of bug#696041.


Stephan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-gc-hang-on-large-pages.patch
Type: application/octet-stream
Size: 2441 bytes
Desc: fix-gc-hang-on-large-pages.patch
URL: <http://lists.alioth.debian.org/pipermail/pkg-mozilla-maintainers/attachments/20121216/16c4dcdb/attachment-0001.obj>


More information about the pkg-mozilla-maintainers mailing list